Preface


Structure in Protein Chemistry is designed for a senior 
undergraduate or graduate course covering the struc- 
tures of proteins and biophysical chemistry. The course 
created by this textbook is intended to bridge the gap 
between the research literature and the courses in intro- 
ductory chemistry and biochemistry that the student has 
already taken. There are suggested readings at the end of 
each section. In these selected publications, the concepts 
just discussed in that section are applied in an experi- 
mental setting. There are also more than 4800 citations 
within the text itself that should direct the student to the 
scientific literature. The format of the book is intended to 
resemble that of a biochemical journal to ease the transi- 
tion. At the completion of the course, the student should 
be equipped to take charge of his own education by crit- 
ically reading the biochemical literature on his own. To 
do this he must be able to understand the experiments 
performed and be able to reach the same conclusions as 
do the authors of each publication or to realize that the 
authors are mistaken in their interpretations. It is my 
intention to develop in the student the ability to draw his 
own conclusions from only the experimental results. To 
this end, there are problems after most of the sections to 
reinforce the concepts that have just been presented in 
the text. These problems are usually based on actual 
experimental results, which are to be evaluated by the 
student, ideally in the absence of assistance or misdirec- 
tion from the authors of the publications from which the 
results were taken. 

Refined crystallographic molecular models provide 
most of our knowledge of the structures of proteins. 
Their importance and validity are self-evident, and they
provide the foundation on which almost all of the other 
experimental observations in the field must rest. They 
also create, in the imagination of the chemist, a reliable 
abstract image of what the structure of a protein consists:
its atomic details, its folded polypeptide backbone,
its α helices and β structure, the packing of its
secondary structure, its globular or elongated shape, its 
irregular surface, its hydration, and the symmetric 
arrangement of its subunits. This abstract image of a 
molecule of generic protein is synthesized by her imagi- 
nation from all of the particular crystallographic molec- 
ular models she has viewed. Its fully developed mental 
existence permits her to understand the molecular basis 
of all of the other physical and chemical observations 
that are made of proteins and thus what these observations
actually mean. The abstract image also permits her
to understand more clearly the evolution of proteins, the 
folding of proteins, and the assembly of oligomeric and 
polymeric proteins. Consequently, crystallographic 
molecular models of proteins must be discussed as soon 
as possible and as comprehensibly as possible in any 
successful presentation of the biophysical chemistry of 
proteins. 

Structure in Protein Chemistry begins with descrip- 
tions of how proteins are purified to provide the student 
with an understanding of where the proteins themselves 
and their crystals come from. To permit him to recognize 
intimately the polypeptide that folds to produce the crys- 
tallographic map of electron density, the electronic and 
atomic details of its covalent bonds are then described, 
and the methods for elucidating its sequence of amino 
acids and defining its posttranslational modifications are 
explained. A comprehensive presentation of the 
methods of crystallography, which permits the student 
to understand critically its strengths and weaknesses, 
and a thermodynamic discussion of the properties of 
noncovalent forces—ionic interactions, hydrogen bond- 
ing, and the hydrophobic effect—as they are expressed in 
aqueous solution are a prelude to an exhaustive descrip- 
tion of the atomic details of the structures of proteins as 
observed in crystallographic molecular models. The 
resulting understanding of their molecular structures at 
the atomic level and the noncovalent forces that produce 
those structures forms the basis for discussions of the 
evolution of proteins, of the symmetry of the oligomeric 
and polymeric associations that produce them, and of 
the chemical, mathematical, and physical basis of the 
techniques used to study their structures such as image 
reconstruction, nuclear magnetic resonance spec- 
troscopy, proton exchange, optical spectroscopy, 
electrophoresis, covalent cross-linking, chemical modifi- 
cation, immunochemistry, hydrodynamics, and the 
scattering of light, X-radiation, and neutrons. The appli- 
cation of these procedures to the study of the folding of 
polypeptides and the assembly of oligomers and helical 
polymers is then described. Finally, biological 
membranes and the structures of their proteins are 
discussed. 

To present a comprehensive view of the biophysical 
chemistry of proteins, this text combines concepts of 
bonding and chemical reactivity, descriptions of macromolecular
structure, principles of thermodynamics, and
explanations of biophysical methods and their results.
The concepts of bonding and chemical reactivity are pre- 
sented in standard structural drawings of individual 
molecules or chemical reactions in which electronic and 
mechanistic aspects are emphasized as they are in 
courses in organic chemistry. The descriptions of macro- 
molecular structure are illustrated with stereo images of 
crystallographic molecular models that are drawn by the 
author so that details appropriate to the particular points 
made in the text are emphasized by choosing the appro- 
priate views of the structures. The principles of chemical 
thermodynamics are applied in relationships among the 
equilibrium constants and fundamental state functions. 
The explanations of biophysical methods rely on the 
mathematical equations defining the physical properties 
being measured. The results of the experiments them- 
selves are found in graphs and tables derived from the 
experimental literature. It is this combination of chemi- 
cal drawings, stereo images, mathematical equations, 
graphs, and tables that makes this book both unique and 
comprehensive. It also places severe demands on the 
student. She must have a firm background in physics, 
mathematics, analytical chemistry, organic chemistry, 
and physical chemistry to understand the material. In 
the broadest sense the intention of the course is to 
educate protein chemists. A protein chemist should be 
able to evaluate critically the results of any of the 
methods applied to the study of proteins. 

The foregoing describes both the First Edition and 
the Second Edition of Structure in Protein Chemistry, but
the Second Edition is a major revision of the first. All of 
the sections in each of the chapters in the Second Edition 
of Structure in Protein Chemistry have been updated 
extensively to include the relevant observations and new 
discoveries in the field that have been made since the 
First Edition was written. 

The significant progress that has been made since 
that time has required that some sections of the book be 
completely rewritten. For example, because of the explo- 
sion of knowledge in the area of protein folding, the 
section on the kinetics of folding has been completely 
redone. Likewise, there has been a dramatic increase in 
the number of crystallographic molecular models of 
oligomeric proteins so that examples are now available of 
all of the point groups for the symmetric assembly of 
asymmetric objects. As a result, because oligomeric pro- 
teins and isometric oligomeric proteins can now be dis- 
cussed more systematically, the sections covering their 
structures have also been completely reorganized and 
rewritten, and stereo drawings of crystallographic molec- 
ular models of proteins representing each point group 
are included. 

Completely new sections have also been added to 
the book. A new section on the structural details of the 
interactions between proteins and nucleic acids has 
been added, in part to recognize the significant progress 
that has been made in this area. The explosion of new
crystallographic molecular models over the last two
decades has included many with heterologous 
oligomeric associations where few were available at the 
time that the First Edition was being written. 
Consequently, a new section discussing oligomeric pro- 
teins that are constructed heterologously has been 
added. As part of this section, the major classes of these 
proteins are discussed, including proteins involved in 
cellular control, motility, the cytoskeleton, the extracel- 
lular matrix, cellular adhesion, and cell-cell interactions. 
There is also a completely new section on the roles of 
metallic cations in the structures of proteins. 

There are other instances in which major advances 
have led to extensive additions to the text. Descriptions 
and drawings of the crystallographic molecular models 
of representatives of the various classes of integral mem- 
brane-bound proteins, which were mostly unavailable 
for the First Edition, have been added. There is now a 
comprehensive description of mass spectrometry and its 
application to the direct sequencing of proteins, the elu- 
cidation of the structures of posttranslational modifica- 
tions, and the determination of the molar masses of 
proteins. The section on sequencing and modifying DNA 
has been extensively expanded to include developments 
in this rapidly advancing area. The number of posttrans- 
lational modifications included in the section covering 
this topic has been significantly increased, a reflection of 
the new discoveries in this area. In particular, the 
recently elucidated role of inteins in the posttransla- 
tional rearrangements of the polypeptide backbone is 
described. There is a new discussion of the results of 
crystallographic molecular models of atomic resolution 
(Bragg spacing less than 0.1 nm) because many of these 
have also become available since the First Edition was 
written. The section on hydrogen bonding in proteins 
has been significantly improved by including the results 
of double mutant cycles, a procedure that has been 
developed since the First Edition was written. How the 
most widely used algorithms for searching data banks of 
amino acid sequences work is described. There is a new, 
detailed discussion on how an icosahedral assembly is 
expanded by incorporating segments of a hexagonal 
array, which is the strategy that viruses have used to 
increase the size of their coats. The use of physical meas- 
urements of a protein in solution to adjust its crystallo- 
graphic molecular model, also a new development, is 
now discussed in the context of comprehensive descrip- 
tions of the techniques that are used to make these 
adjustments. For example, scattering curves from solu- 
tions of a protein are now used to adjust its crystallo- 
graphic molecular model to the structure that it assumes 
when it is in solution. 

In several instances, descriptions of procedures 
have been made more comprehensive to improve the 
student’s understanding. The section on nuclear mag- 
netic resonance has been significantly updated to 
describe the improvements that have been made in this
field since the First Edition, but the physical basis and the
techniques of nuclear magnetic resonance spectroscopy 
itself are now more comprehensively discussed so that a 
more complete understanding of the method is gained. 
The limited description of electron paramagnetic reso- 
nance spectroscopy in the First Edition has been 
expanded to create a new section in which examples of 
its recent use are presented. The use of image recon- 
struction and cryo-electron microscopy to produce 
structures of helical polymeric proteins and membrane- 
bound proteins is more comprehensively discussed than 
it was in the First Edition. 

All of these changes together have created a text 
that is not only an update but also a significant expansion 
of the First Edition. 

It is a pleasure to thank everyone who has helped 
me in the preparation of this book. First and foremost I 
thank my wife Francey. She has entered into the com- 
puter in the proper places and the proper order all of the 
almost impossible to follow changes and insertions that 
were haphazardly written in pencil and red pen over the 
typescript of the First Edition or written out in my hand 
as inserts on sheets of scrap paper, while at the same 
time correcting my spelling, grammar, and punctuation. 
Without her assistance, it would have taken me at least 
an additional year to finish the job less successfully. 
Daniel Louvard was kind enough to provide me with an 
office at the Institut Curie and access to its library in the 
years 1995-1996 and 2000-2001 so that I could pursue 
the project while away from La Jolla. I would also like to 
thank Heather Whirlow Cammarn, my copyeditor, who 
converted the manuscript into the style of the American 
Chemical Society and tied up all of the many loose ends
with acumen. I would again like to thank all of the 
reviewers of the First Edition because much of their 
assistance has been carried into the Second Edition. 
Russell Doolittle and Harvey Itano read large portions of 
the manuscript of the First Edition and provided excel- 
lent suggestions. Individual sections of the manuscript of 
the First Edition were reviewed critically by Frank 
Huennekens, Bruno Zimm, Charles Perrin, Steven 
Clarke, Ajit Varki, David Matthews, John Edsall, Cyrus 
Chothia, Arthur Lesk, David DeRosier, Nigel Unwin, 
Stephen Harrison, Fred Hartman, John Simon, George 
Fortes, Rachel Klevit, Ken Dill, Robert Baldwin, Howard 
Schachman, Dennis Haydon, and Guido Guidotti. I would
like to thank all of the reviewers of the Second Edition. 
Individual sections of the manuscript of the Second 
Edition were reviewed by Larry Cummings, Martin 
Webb, Iain Nicholl, Jeffrey Carbeck, Lloyd Waxman, 
Partho Ghosh, Charles Perrin, Kenneth Walsh, Tama 
Hasson, Steven Clarke, Ajit Varki, Brian Matthews, the 
late Carl-Ivar Brandén, Dave Matthews, Ken Dill, V. 
Adrian Parsegian, Michael Page, Malcolm MacArthur, 
Patrick Argos, Stephen Harrison, William Trogler, Russell 
Doolittle, Henryk Eisenberg, Pierre Goloubinoff, Robert 
Fletterick, Georg Schulz, Michael Rossmann, Ron 
Milligan, Fred Hartman, Donald Engelman, David 
Johnson, Walter Englander, C. Nick Pace, Franz Schmid, 
Arshad Desai, Stephen White, and Douglas Rees. Each of 
them provided detailed criticism, many helpful com- 
ments, and reassurance. 


Jack Kyte 
Stereo Drawings


Almost all of the stereo drawings of crystallographic 
molecular models included in Structure in Protein 
Chemistry were produced by the program Molscript cre- 
ated by Per J. Kraulis. If you have the time and enjoy 
working on a computer, you should learn how to use the 
program, which is described at
http://www.avatar.se/molscript/doc/molscript.html. It is now standard
practice to publish drawings of crystallographic molecular
models in this format. To appreciate the results of crystallographic
studies, one must be able to view these images.
Although a few individuals can view them effortlessly by 
crossing their eyes, the rest of us need a stereo viewer. 
The stereo viewer that I use and have recommended for
my students is the PEAK™ Pocket Stereo Viewer with 2×
magnification (124 mm legs). Suppliers of this viewer can 
be found using Google. It has been my experience that a 
student who has never viewed a stereo drawing before
will usually complain that although everyone else can
learn to use one of these viewers, he cannot. It is also my 
experience that everyone learns to use one. When I have 
put a question on an examination such as Problem 4.5,
where one is asked to write down the sequence of the 
protein by examining a drawing of a crystallographic 
molecular model that she has never seen before, every- 
one in the class gets at least 90% of the sequence correct, 
which would have been impossible unless everyone was 
able to see the image in stereo. It is essential that anyone 
interested in the structures of proteins learn to view 
drawings of crystallographic molecular models in stereo. 
The drawings in this text have been placed vertically 
rather than in their usual horizontal orientation and each 
has been placed on the outside edge of a page. This has 
been done to allow each image to be spread as flat as pos- 
sible for the best viewing. 


Taylor & Francis 
NEWT


There are hundreds of proteins discussed in the text of 
Structure in Protein Chemistry. Each of these proteins is 
present in many different species of organisms, but usu- 
ally the details that are being discussed are specific 
enough that the protein from only one of these many 
species is described, even though what is described 
would fit the protein from any one of these species. 
Furthermore, a particular protein from a particular 
species is always used in a particular experiment. The 
names of both the protein being discussed and the 
species from which it was derived are usually stated in 
the text. It turns out that protein chemists, because they 
realize that the same protein from different species of 
organisms is basically the same, don’t really care from 
what species of organism the protein comes, and a 
remarkably large collection of species of organisms are 
used as sources for proteins. Usually, in a particular
investigation, one particular species is chosen as a source
for a particular protein for a particular reason known 
only to the investigator. The practical result of these 
diverse choices is that the names of hundreds of species 
of organisms are used in this book. I have chosen to 
name each of them with the usual Latin names of their 
genus and species, without explaining what the species 
are because I wanted to make the point that it doesn’t 
matter where a protein comes from. The names 
Escherichia coli and Saccharomyces cerevisiae and the
adjectives murine, equine, bovine, canine, and human 
are probably already familiar to you but very few of the 
other names will be. Even though there is no need to 
know, if you would like to know to what the name of a
genus and species refers, go to
http://www.ebi.ac.uk/newt/display and enter the name of the species.


ExPASy 


Hundreds of thousands of proteins from hundreds of dif- 
ferent species of organisms have been sequenced. The 
sequences of their amino acids are tabulated in large 
data banks. The most easily used of the data banks is the 
Swiss-Prot/TrEMBL at http://www.expasy.org/. You
should become familiar with this site on the web, not
only for the sequences it makes available but also for the 
free programs that are available at the site to analyze 
those sequences. 


Protein Data Bank 


The Protein Data Bank at
http://betastaging.rcsb.org/pbd/Welcome.do contains the atomic coordinates of
most of the crystallographic molecular models that have 
been constructed. At the moment there are 36,000 sepa- 
rate molecular models entered in the data bank. You 
should look at some of the lists of the full coordinates to 
get a feeling for what such a file contains. Enter the name 
of a protein for which there is a stereo drawing in the text 
of this book, click on the name of one of the molecular
models that are then listed, choose “Download Files”,
and then choose “PDB File”. The atoms are listed by the 
name of the amino acid, the position of that amino acid 
in the sequence of the protein, and their locations within 
that amino acid by using the abbreviations given in 
Figure 4.14. The list gives the x, y, and z coordinates
of each atom in ångströms. Each file constitutes the raw
data on which the Molscript program operates.
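
To get a feeling for what one line of such a file contains, it may help to read an entry by hand. The sketch below, in Python, assumes the fixed-column layout of the ATOM record in the published PDB file format; the names Atom, parse_atom_line, and read_atoms are hypothetical, and the example record is made up for illustration rather than taken from a deposited model.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    atom_name: str  # location of the atom within the amino acid, e.g. "CA"
    residue: str    # three-letter name of the amino acid
    position: int   # position of the amino acid in the sequence
    x: float        # coordinates in angstroms
    y: float
    z: float


def parse_atom_line(line: str) -> Atom:
    """Read one fixed-column ATOM record (column numbers are 1-based in the format)."""
    return Atom(
        atom_name=line[12:16].strip(),  # columns 13-16
        residue=line[17:20].strip(),    # columns 18-20
        position=int(line[22:26]),      # columns 23-26
        x=float(line[30:38]),           # columns 31-38
        y=float(line[38:46]),           # columns 39-46
        z=float(line[46:54]),           # columns 47-54
    )


def read_atoms(lines):
    """Keep only the ATOM records from the lines of a file."""
    return [parse_atom_line(line) for line in lines if line.startswith("ATOM")]


# A made-up record, assembled field by field so that the columns line up.
example = (
    "ATOM  "        # columns 1-6: record name
    + f"{2:>5}"     # columns 7-11: serial number of the atom
    + " "           # column 12
    + " CA "        # columns 13-16: atom name (the alpha carbon)
    + " "           # column 17: alternate location
    + "ALA"         # columns 18-20: name of the amino acid
    + " A"          # column 21 blank, column 22 chain identifier
    + f"{1:>4}"     # columns 23-26: position in the sequence
    + "    "        # columns 27-30
    + f"{11.104:8.3f}{6.134:8.3f}{-6.504:8.3f}"  # columns 31-54: x, y, z
)
atom = parse_atom_line(example)
print(atom)
```

Passing the lines of a downloaded file to read_atoms would return one such entry for every atom of the polypeptide in the model.
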
Chapter 1


Purification 


The living world that teems around us, the world of 
species, individual organisms, organs, tissues, and cells, 
can be viewed as the manifestation of a vast fluid array of 
protein molecules, each appearing and disappearing in 
the proper place at the proper time. This array of protein 
molecules is the outcome of a long history. Each protein 
within the array is itself the product of evolution by nat- 
ural selection, which has had more than two billion years 
and much of the surface of the earth to explore, by 
random, irrational trial and error, strategies with which 
to accomplish the function of that protein. There are sev- 
eral consequences of this fact. First, chemical principles 
in addition to those of which we are aware have been dis- 
covered and exploited. Second, completely different 
chemical mechanisms often have been applied haphaz- 
ardly to achieve similar purposes. Third, there are puz- 
zling features that are inefficient, useless, or meaningless. 
Fourth, the result of this process does not resemble any- 
thing the human mind would have designed, even if it 
were aware of all of the available chemical strategies. One 
consequence of these facts is that argument by exclusion 
is useless because it cannot be assumed that the mecha- 
nism by which a biological problem was solved is only 
one or more of the mechanisms of which we can con- 
ceive. 

One fruitful approach in our attempt to understand 
life has been to study, individually or in small groups, the 
proteins that produce it to gain insight into the role of 
each one in the overall scheme. An argument could be 
made that a cell does seem to be no more than the sum 
of its parts and that a significant understanding of how it 
accomplishes its purpose can be gained by studying 
those parts individually. Because the proteins are the 
parts of a cell that perform almost all of the chemical and 
structural transformations that occur within it, they have 
attracted the most attention. 

The most dynamic region in a living organism is the 
cytoplasm of the cells or cell from which it is made. About 
20-30% of the total mass of cytoplasm is protein dissolved
in a solution, the solvent of which is water. The
cytoplasm is enclosed within a thin, fragile, continuous 
membrane. About 60-80% of the dry weight of this mem- 
brane is protein dissolved in a solution, the solvent of 
which is lipid. This membrane is surrounded and sup- 
ported by a tough protective integument of polysaccha- 
ride; polysaccharide and protein; or polysaccharide, 
lipid, and protein. Organelles, enclosed within their own
membranes, are often scattered through the cytoplasm.
In a eukaryotic cell the largest of these is the nucleus, 
containing most of the nucleic acid in the cell. 

The strategy that has been applied most frequently 
to the study of proteins is to identify a particular biologi- 
cal feature of a living organism and then purify the pro- 
tein or proteins responsible for it. Typically, when a 
complex, beautiful, intricately organized biological spec- 
imen, such as a tissue or a suspension of cells, is submit- 
ted to the first step in any purification procedure, it is 
immediately sundered beyond recognition and becomes 
a nondescript jumble of its organelles and broken frag- 
ments of its membranes and their integuments sus- 
pended in an aqueous solution of proteins, nucleic acids, 
metabolites, and salts. This event is referred to as 
homogenization. It is usually accompanied by the dilu- 
tion of the proteins in the initial specimen by addition of 
a buffered aqueous solution. Following the homogeniza- 
tion, insoluble fragments are removed by centrifugation 
to produce a clear solution, the protein concentration of 
which is 1-10%. This solution contains most of the pro- 
teins that were once the living cytoplasm of the speci- 
men. It is from this solution that particular proteins can 
be isolated. The purification of a protein is the separation
of that protein from all of the others in a homogenate. A 
particular protein must be purified before its molecular 
structure can be studied. 

Usually, the only interest that one has in a particu- 
lar protein arises from its participation in some process 
of biological importance. It might be an enzyme respon- 
sible for catalyzing a particular reaction; it might be a 
structural protein creating the macroscopic shape of the 
cell; it might be a protein that binds a hormone or neu- 
rotransmitter; or it might be a protein that binds to DNA 
and controls its transcription. To distinguish one protein 
from the others in a complex mixture, an assay for the 
protein of interest, based on its particular function, is 
required. 

The most widely used procedure for purifying pro- 
teins is chromatography. This technique separates mol- 
ecules of protein by differences in the rate at which they 
move along a cylinder of a porous solid phase as a liquid 
phase percolates through it. If the solid phase is properly 
chosen, each protein travels through the cylinder at a dif- 
ferent rate and each emerges in the solution coming out 
of the cylinder at a different time. In this way, one protein can be
separated from the others. In order to distinguish the
protein of interest from the others as they emerge from
the chromatographic column, the assay for that protein 
is used. As the protein becomes purified, the preparation 
displays greater and greater activity in the specific assay 
for a given amount of total protein. 
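
The activity displayed for a given amount of total protein is the ratio customarily called the specific activity, and its increase relative to the homogenate is customarily called the fold purification; neither term is used in the paragraph above. The following is a minimal sketch in Python of how these ratios might be tabulated. The function names, the units, and the numbers are all hypothetical and are chosen only to make the arithmetic easy to follow.

```python
def specific_activity(activity_units: float, protein_mg: float) -> float:
    """Activity in the specific assay per milligram of total protein."""
    return activity_units / protein_mg


def fold_purification(steps):
    """For each (name, activity_units, protein_mg) step, compare its
    specific activity with that of the first step (the homogenate)."""
    _, first_activity, first_protein = steps[0]
    start = specific_activity(first_activity, first_protein)
    return [
        (name, specific_activity(activity, protein) / start)
        for name, activity, protein in steps
    ]


# Made-up numbers: total activity falls a little at each step,
# but total protein falls much faster.
steps = [
    ("homogenate", 1000.0, 500.0),
    ("first column", 800.0, 40.0),
    ("second column", 600.0, 3.0),
]
table = fold_purification(steps)
for name, fold in table:
    print(f"{name}: {fold:.0f}-fold")
```

In this invented example some activity is lost at every step, yet the specific activity rises because the other proteins are removed more rapidly than the protein of interest.
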

Once the protein has been purified, analytical 
methods must be used to demonstrate that only one pro- 
tein is the major component in the final preparation and 
that this protein is responsible for the biological function 
of interest. The analytical procedure most suited to this 
demonstration is electrophoresis. Electrophoresis sepa- 
rates proteins by both their charge and their shape, and 
if used with discontinuous stable boundaries, elec- 
trophoresis can have high resolution. 

Once a protein of known function has been purified 
to homogeneity, it can be crystallized. As in organic 
chemistry, crystallization is a way of harvesting a partic- 
ular substance in a highly purified form. Ideally, every 
protein that was purified would be crystallized and 
stored in this form, as are organic molecules. In this 
form, each suspension of crystals would represent a pure 
chemical compound. In practice, because crystals are 
often difficult to make and yields in crystallizations are 
poor, purified proteins are usually left in solution or pre- 
cipitated for storage. It is these solutions, precipitates, or 
suspensions of crystals that are the raw material for stud- 
ies of the structures and functions of the proteins they 
contain. The purpose of this chapter is to describe how a 
particular protein is purified from a complex mixture of 
proteins such as the homogenate of either a tissue or a 
suspension of cells. 

Adsorption to stationary phases and chromatogra- 
phy are the bases for both the purification of proteins 
and many of the assays used to identify particular pro- 
teins, so these processes will be considered first. 


Partition into Stationary Phases and 
Chromatography 


The goal of any procedure used to purify a particular sub- 
stance from a complex mixture is to separate that sub- 
stance from all of the other components in the mixture. 
When adsorption or chromatography is used for this pur- 
pose, differences in the preferences of solutes in a solu- 
tion for another phase are exploited. The simplest 
example of such a strategy is an affinity adsorbent. 
Suppose a small molecule that could be tightly bound by 
only one particular protein in a solution was covalently 
attached to a solid surface. By binding them specifically, 
this affinity adsorbent would collect molecules of only 
that one protein on its surface. The rest of the molecules 
of protein in the solution could be washed away, and the 
molecules of the desired protein could then be released. 
Unfortunately, such highly specific adsorbents are not 
usually available, so small differences in affinity among 
proteins or among other molecules in a solution for a
separate phase are amplified by the process of chromatography.

When a chemical substance A, which will be 
referred to as the solute, is added to a vessel containing 
two immiscible phases and the system is allowed to 
come to equilibrium, the solute A will distribute between 
the two phases in a characteristic manner. The solute can 
be an inorganic ion, a small organic molecule, a protein, 
a nucleic acid, a polysaccharide, or any other similar
substance. The two phases can be, for example, two
immiscible liquids, a liquid and a solid, or a gas and a liquid; the
only requirement is that those two phases be brought 
into sufficient contact to permit the distribution of 
solute A between them to reach equilibrium and that 
they then be separated in some way that does not redis- 
tribute the solute. The simplest examples are a two- 
phase, solvent-solvent extraction or the suspension of 
some finely divided solid in a liquid followed by its 
removal from the liquid by filtration. 

After the equilibration and separation of the two 
phases, the moles of solute A in each of them can be 
determined. In the cases that are generally encountered, 
at least one of the phases is a fluid that can be freed 
entirely of the other phase. This fluid will be arbitrarily 
called the mobile phase. In the special case when a pro- 
tein is solute A, the mobile phase is invariably an aque- 
ous solution of moderate ionic strength buffered at a 
specific pH. In any situation, however, the molar con- 
centration of solute A in the mobile phase can be readily 
measured. The second phase, arbitrarily referred to as 
the stationary phase, can be an immiscible liquid, a 
solid, or a solid in which a liquid is entrapped. Because of 
the peculiarities of this stationary phase, the best way to 
express the concentration of solute A that has become 
physically associated with the stationary phase, $[\mathrm{A}]'_{\mathrm{S}}$, is in
moles (liter of bed)$^{-1}$, where the volume of the bed is the
volume filled by the stationary phase when it has
settled.*

* Concentrations in moles per liter of bed are indicated by primed
notation. Concentrations in moles per liter of stationary phase or
moles per liter of mobile phase are in the usual unprimed notation.

Figure 1-1: Partition of a solute between two phases referred to as
the stationary phase, S, and the mobile phase, M. The concentration
of solute A at equilibrium in the stationary phase in units
of moles (liter of bed)$^{-1}$, $[\mathrm{A}]'_{\mathrm{S}}$, is presented as a function of its
molar concentration in the mobile phase in units of moles (liter of
fluid)$^{-1}$, $[\mathrm{A}]_{\mathrm{M}}$, for three types of behavior designated A, B, and C.

Three general types of behavior have been
observed in such a partition (Figure 1-1). The simplest
behavior, type A, occurs when the concentration of the
solute A in the stationary phase increases in direct
proportion to its concentration in the mobile phase. This
type of behavior is encountered in solvent-solvent
extractions or in chromatography by molecular exclusion.
In the latter example, it results from the fact that the
stationary phase is nothing more than trapped, and
thereby immobilized, mobile phase. Behavior of type B
(Figure 1-1) is encountered when the stationary phase
saturates with solute A. It results from the presence of
only a finite number of sites on the stationary phase that
are all equivalent in their individual affinities for solute A
and that are distributed over the stationary phase so that
they do not interact with each other at saturation.
Behavior of this type is encouraged by choosing micro- 
scopically uniform stationary phases. It is advantageous 
because at low concentrations of solute A the partition of 
solute closely approximates the direct proportionality of 
behavior of type A. Stationary phases showing this type 
of behavior are highly uniform ion-exchange resins or 
uniform, inert matrices to which molecules of a small 
organic compound displaying an affinity for solute A 
have been randomly and sparsely attached. In more het- 
erogeneous stationary phases, specific sites with which 
molecules of solute A associate are composed and dis- 
tributed in such a way that they have an array of different 
affinities. This means that the small number of sites with 
high affinities for solute A are occupied first, followed by 
those with lower and lower affinities sequentially. This 
produces behavior of type C (Figure 1-1), which is unpre- 
dictable and not uniform. Examples of a stationary phase 
of this type are crude hydroxylapatite or a matrix to 
which a polyclonal immunoglobulin has been attached. 
All three of these examples are to some extent ide- 
alized descriptions of actual behavior. The deviation 
from ideal behavior that is the most important to the 
present discussion, however, is that observed during the 
physical adsorption of molecules of protein onto the sur- 
faces of a solid phase.’ In this circumstance, although 
apparently ideal behavior of type B is observed at short 
times, the fraction of the adsorbed protein actually in 
equilibrium with the protein in solution decreases with 
time as the amount of irreversibly bound protein 
increases. It is believed that in such instances molecules 
of protein are denatured at the interface and that the 
interactions of these denatured molecules with the sur- 
face are much stronger than those of the undenatured 


molecules. This essentially irreversible adsorption of 
protein to the surfaces of a solid phase probably occurs 
in any chromatographic separation and is experienced as 
a less than theoretical yield of the protein collected from 
the chromatographic system. Often this loss is inconse- 
quential or tolerable. Because this process is a slow one,” 
a consequential loss of protein due to irreversible adsorp- 
tion upon chromatography can be decreased by decreas- 
ing the time during which the protein is in contact with 
the solid phase. Such loss of protein can also be avoided 
by choosing a solid phase such as agarose or polyacry- 
lamide that is less prone to producing interfacial denat- 
uration. It also helps to use the same solid phase 
repeatedly because the sites at which irreversible adsorp-
tion occurs become saturated over several uses. This
strategy is inappropriate, however, if the sites at which 
irreversible adsorption occurs are the very sites upon 
which the desired reversible adsorption occurs, and 
repeated use gradually poisons the system. 
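
The three kinds of partition behavior in Figure 1-1 can be sketched numerically. The functional forms below are assumptions made for illustration and are not given in the text: a straight line for type A, a single-site saturation curve (of the Langmuir type) for type B, and a sum of such curves with different affinities for type C. The parameter names `capacity` and `k_half` are hypothetical.

```python
def type_a(c_mobile, alpha):
    """Type A: bound concentration is directly proportional to mobile concentration."""
    return alpha * c_mobile


def type_b(c_mobile, capacity, k_half):
    """Type B: identical, noninteracting sites that saturate at `capacity`.

    Assumed single-site saturation form; k_half is the mobile concentration
    at which half of the sites are occupied.
    """
    return capacity * c_mobile / (k_half + c_mobile)


def type_c(c_mobile, sites):
    """Type C: heterogeneous sites, given as (capacity, k_half) pairs."""
    return sum(type_b(c_mobile, cap, k) for cap, k in sites)


if __name__ == "__main__":
    # Hypothetical site populations: a few tight sites, many weak ones.
    sites = [(0.1, 1e-6), (0.3, 1e-4), (0.6, 1e-2)]
    for c in (1e-7, 1e-5, 1e-3, 1e-1):
        print(c, type_a(c, 100.0), type_b(c, 1.0, 1e-3), type_c(c, sites))
```

In this sketch, at concentrations well below `k_half` the type B curve reduces to a line of slope `capacity / k_half`, which is the direct proportionality that makes behavior of type B advantageous. The type C curve has no such extended linear region because its tightest sites fill first.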

The earliest use of distributions of solutes between 
two phases was selective adsorption. Selective adsorp- 
tion is a technique in which conditions are sought that 
promote the almost complete confinement of the sub- 
stance of interest to one phase while other, unwanted 
substances distribute into the other phase and can thus 
be discarded. When a protein is being isolated in this 
way, the ionic strength, pH, temperature, and choice of
stationary phase are varied until conditions are found that
permit the protein of interest to distribute almost com- 
pletely into one of the two phases while the maximum 
amount of the other, undesired proteins distribute into 
the other. An example of this strategy is one of the steps 
in the purification of the protein fumarate hydratase.” 
Calcium phosphate gel was added to a crude mixture of 
proteins containing fumarate hydratase dissolved in 
0.1 M sodium acetate, pH 5.2. All of the fumarate
hydratase (>95%) associated tightly with the calcium 
phosphate gel. After the gel was washed, the adsorbed 
fumarate hydratase was then eluted in 97% yield with 5% 
(NH$_4$)$_2$SO$_4$ and 0.1 M sodium phosphate, pH 7.3, even
though only 20% of the original protein remained in the 
final solution. 

Selective adsorption is a rather unsophisticated use 
of the distribution of a solute between a mobile and a sta- 
tionary phase. It can be remarkably improved upon by 
operating at concentrations of solutes low enough that 
$[\mathrm{A}]'_{\mathrm{S}}$ is directly proportional to the molar concentration of
A in the mobile phase (behavior of type A) and by caus- 
ing the mobile phase to move slowly through or across 
the stationary phase. This process is known as chro- 
matography. 

Chromatography is the process by which solutes 
are separated from one another on the basis of differ- 
ences in the rate at which they pass across a bed of 
stationary phase through which a liquid mobile phase 
is continuously flowing. A chromatographic system is 
designed so that the mobile phase passes by the
stationary phase in such a way that the contact between
the two phases is maximized and equilibration of the 
solute between them is encouraged. Examples are paper 
chromatography, in which the liquid mobile phase 
moves down the paper while flowing among the cellulose 
fibers that form the stationary phase; thin-layer chro- 
matography, in which the liquid mobile phase creeps up 
a thin layer of the solid, dry stationary phase drawn by 
the capillary force arising from its movement between 
finely divided particles; column chromatography, in 
which the fluid mobile phase percolates through a finely 
divided, solid stationary phase compacted in a cylinder; 
and gas-liquid chromatography, a type of column chro- 
matography in which a gas containing the solutes is 
passed through a finely divided solid phase coated with a 
liquid of low volatility. All of these are examples of zonal 
chromatography. 

Zonal chromatography is chromatography in which 
the mixture of solutes to be separated is introduced in a 
thin zone at one end of the bed of stationary phase and 
the mobile phase is then set in motion. The molecules of 
solute in the mixture meander through the system, drawn 
forward by the movement of the mobile phase but 
retarded by the stationary phase in which each spends a 
certain fraction of its time. The fraction of the time each 
solute spends in the stationary phase is determined by its 
affinity for the stationary phase, and this is determined by 
its bulk distribution behavior (Figure 1-1). Since the mol- 
ecules of each solute spend a different fraction of their 
time in the immobility of the stationary phase, each solute 
moves through the system at a different rate and the com- 
ponents of the mixture are isolated one from the other into 
separate zones, which are also referred to as peaks or 
bands. The separated solutes are collected either by divid- 
ing the stationary phase itself and extracting them, as in 
paper chromatography, thin-layer chromatography, or 
countercurrent distribution chromatography, or by con- 
tinuously collecting the mobile phase as it emerges at 
the opposite end of the bed of the chromatographic 
system, as in column chromatography or gas-liquid 
chromatography. Any visual display of the distribution 
over the field of the chromatographic system of one or 
more of the substances being separated is referred to as a 
chromatogram. 

The important properties of the chromatogram are 
the relative mobilities of the solutes, the widths of the 
peaks of the concentrations of the solutes at their half 
heights, and the resolution of those peaks one from the 
other. The relative mobility, $R_{f,\mathrm{A}}$, of a particular solute A
is either (1) the distance that the peak of its distribution
has traveled through the system divided by the distance
traveled by the mobile phase or (2) the total volume of
the mobile phase in the bed of the system, referred to as
the void volume, $V_0$, divided by the total volume that has
passed through the system before the peak of the distri-
bution of solute A emerges, referred to as its elution
volume, $V_{e,\mathrm{A}}$. Definitions 1 and 2 are two different ways to


define the same parameter. The width of the distribution
of solute A at half height, $w_{1/2,\mathrm{A}}$, is the width, in units of
distance for definition 1 or volume for definition 2,
between the two points at which the concentration of
solute A is half its maximum concentration at the peak.
The resolution, $R_{\mathrm{AB}}$, between two solutes is a measure of
the completeness with which they are separated, a prop-
erty that increases as the difference in their relative
mobility increases and decreases as their widths at half
height increase. The larger the differences in the various
$R_{f,i}$ and the smaller the various $w_{1/2,i}$, the more successful
will be the separation of the different solutes i.
Expressions for $R_{f,\mathrm{A}}$, $w_{1/2,\mathrm{A}}$, and $R_{\mathrm{AB}}$ as functions of param-
eters that can be manipulated are of value in the under-
standing and design of chromatographic separations.

There are two approaches to describing the 
phenomenon of chromatography in theoretical terms.* 
It can be treated as the continuous process that it is, and 
differential equations can be formulated to describe the 
differential changes in solute positions and concentra- 
tions with time. These differential equations, however, 
do not have simple solutions, nor do they lead to an intu- 
itive understanding of the process. The alternative 
approach is based on the concept of the theoretical plate, 
which was developed originally to describe the separa-
tion performed by a fractional distillation column.
Although this is a discontinuous model for a continuous 
process, the treatment is formulated in terms of an easily 
understood mechanism and does provide, in at least one 
case, that of countercurrent distribution chromatogra- 
phy, an exact solution to the problem. Martin and Synge’ 
were the first to apply this model to the process of chro- 
matography. 

Suppose that a chromatographic separation always 
operates at concentrations of solute A such that the 
amount associated with the stationary phase and the 
mobile phase in the chromatographic system is a linear 
function of its concentration in the mobile phase (behav- 
ior of type A, Figure 1-1). If so, at equilibrium 


$$[\mathrm{A}]'_{\mathrm{S}} = \alpha'_{\mathrm{A}}\,[\mathrm{A}]'_{\mathrm{M}} \qquad (1\text{-}1)$$


where $[\mathrm{A}]'_{\mathrm{M}}$ is the concentration of solute A in the mobile
phase in units of moles (liter of bed)$^{-1}$, where the volume
of the bed, $V_{\mathrm{B}}$, is the volume filled by the stationary and
mobile phases together as they are packed into the chro-
matographic system; and where $\alpha'_{\mathrm{A}}$ is a partition coeffi-
cient. The units for $[\mathrm{A}]'_{\mathrm{S}}$ are, as defined earlier, moles (liter
of bed)$^{-1}$.

The bed of the chromatographic system is formally 
divided into a series of equivalent theoretical plates. A set 
of theoretical plates is a set of contiguous compartments 
of equal volume formed by a set of evenly spaced planes 
passing through the bed normal to the direction in which 
the mobile phase flows. The height equivalent to a the-
oretical plate, h, is the distance the mobile phase must
move, at the rate of normal flow, past the stationary
phase until the concentration of solute in the fluid
emerging from the theoretical plate is equal to the con-
centration the solute would have had if the fluid entering
the theoretical plate had come into equilibrium with the
stationary phase that fills the theoretical plate. For exam-
ple, if the fluid entering the upstream boundary of the
theoretical plate had a concentration of solute A equal to
$[\mathrm{A}]'_{\mathrm{M,ent}}$ and the stationary phase already had solute A
immobilized within it at a concentration of $[\mathrm{A}]'_{\mathrm{S,im}}$, the
formal downstream boundary of the theoretical plate
would occur at the point where the concentration of the
solute in the mobile phase, $[\mathrm{A}]'_{\mathrm{M,eq}}$, had reached a value


$$[\mathrm{A}]'_{\mathrm{M,eq}} = \frac{[\mathrm{A}]'_{\mathrm{M,ent}} + [\mathrm{A}]'_{\mathrm{S,im}}}{1 + \alpha'_{\mathrm{A}}} \qquad (1\text{-}2)$$


where all concentrations are expressed in moles (liter of
bed)$^{-1}$.

With this definition, the continuous process of 
zonal chromatography is equivalent to the following dis- 
continuous sequence of events. A number of moles of 
solute A equal to $m_{0,\mathrm{A}}$ is added to the first theoretical
plate and allowed to come to equilibrium between the 
stationary and mobile phases. The entire mobile phase of 
each plate in the system is then moved to its neighbor 
downstream, and mobile phase containing no solute is 
added to the first plate. After the new situation is allowed 
to come to equilibrium, the same transfers of mobile 
phases are made. The cycle of equilibrium and transfer is 
repeated n times. A machine that performs chromatog-
raphy by countercurrent distribution mechanically
proceeds through this exact sequence of transfers. The 
theoretical plates in the countercurrent machine are 
individual glass vials, the mobile phases are equal 
volumes of an aqueous solution, the stationary phases 
are equal volumes of an immiscible organic solvent, and 
the steps of equilibration and transfer are discrete. 
Normally, however, this sequence of events is a theoreti- 
cal simplification only formally equivalent to what 
actually happens. 
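
The cycle of equilibration and transfer described above can be simulated directly. The following is a minimal sketch, assuming linear partition (behavior of type A) so that at equilibrium a fraction 1/(1 + alpha) of the solute in each plate is in the mobile phase; the function names are hypothetical and chosen only for illustration.

```python
import math


def countercurrent(alpha, n_transfers, moles=1.0):
    """Moles of solute in each plate after n_transfers equilibrate-and-transfer cycles."""
    frac_mobile = 1.0 / (1.0 + alpha)  # fraction in the mobile phase at equilibrium
    plates = [0.0] * (n_transfers + 1)
    plates[0] = moles  # all solute starts in the first plate
    for _ in range(n_transfers):
        moved = [0.0] * (n_transfers + 1)
        for i, total in enumerate(plates):
            if total == 0.0:
                continue
            moved[i] += total * (1.0 - frac_mobile)  # stays with the stationary phase
            moved[i + 1] += total * frac_mobile      # carried to the next plate downstream
        plates = moved
    return plates


def mean_and_sd(plates):
    """Mean position and standard deviation of the distribution, in units of plates."""
    total = sum(plates)
    mean = sum(i * m for i, m in enumerate(plates)) / total
    var = sum((i - mean) ** 2 * m for i, m in enumerate(plates)) / total
    return mean, math.sqrt(var)


if __name__ == "__main__":
    mean, sd = mean_and_sd(countercurrent(alpha=3.0, n_transfers=200))
    print(mean, sd)
```

In this model the distribution over the plates is binomial, so the mean position is n/(1 + alpha) plates and the standard deviation is (n alpha)^(1/2)/(1 + alpha) plates. Multiplying the mean by the height of a plate gives the distance moved by the peak.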

At the conclusion of n steps, the downstream
boundary of the mobile phase will have moved through
the chromatographic system a distance $d_{\mathrm{M}}$ where


$$d_{\mathrm{M}} = nh \qquad (1\text{-}3)$$


the number of steps times the height of a theoretical
plate. It can be shown that after n steps the mean of the
distribution of solute A, and hence the peak of its distri-
bution, will have moved a distance $d_{\mathrm{A}}$ where


$$d_{\mathrm{A}} = \frac{1}{1 + \alpha'_{\mathrm{A}}}\, d_{\mathrm{M}} \qquad (1\text{-}4)$$


while the width of its distribution at half height will be


$$w_{1/2,\mathrm{A}} = \frac{h\sqrt{n}\,\left[8(\ln 2)\,\alpha'_{\mathrm{A}}\right]^{1/2}}{1 + \alpha'_{\mathrm{A}}} \qquad (1\text{-}5)$$


In thin-layer chromatography and paper chro- 
matography, the flow of mobile phase up the thin layer 
or down the paper is stopped before the downstream 
boundary of the mobile phase reaches the end of the sta- 
tionary phase. The number of steps n that have occurred 
is defined by the fact that the boundary has moved a 
distance nh from the origin at which the solutes were 
applied. If the relative mobility of solute A is defined as 
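the distance solute A has moved, $d_{\mathrm{A}}$, divided by the dis-
tance the boundary has moved, $d_{\mathrm{M}}$, then


$$R_{f,\mathrm{A}} = \frac{d_{\mathrm{A}}}{d_{\mathrm{M}}} = \frac{nh}{nh\,(1 + \alpha'_{\mathrm{A}})} = \frac{1}{1 + \alpha'_{\mathrm{A}}} \qquad (1\text{-}6)$$


By combining Equations 1-3, 1-5, and 1-6


$$w_{1/2,\mathrm{A}} = (d_{\mathrm{M}} h)^{1/2}\left[8(\ln 2)\,R_{f,\mathrm{A}}\,(1 - R_{f,\mathrm{A}})\right]^{1/2} \qquad (1\text{-}7)$$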
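
The algebra that connects Equations 1-3, 1-5, and 1-6 to Equation 1-7 is short. The sketch below takes Equation 1-5 in the form $w_{1/2,\mathrm{A}} = h\sqrt{n}\,[8(\ln 2)\,\alpha'_{\mathrm{A}}]^{1/2}/(1+\alpha'_{\mathrm{A}})$, which is the width expected for the binomial distribution produced by the plate model and is stated here as an assumption about the notation.

```latex
\begin{align*}
R_{f,\mathrm{A}} = \frac{1}{1+\alpha'_{\mathrm{A}}}
  &\;\Longrightarrow\;
  \alpha'_{\mathrm{A}} = \frac{1-R_{f,\mathrm{A}}}{R_{f,\mathrm{A}}}, \\
\frac{(\alpha'_{\mathrm{A}})^{1/2}}{1+\alpha'_{\mathrm{A}}}
  &= R_{f,\mathrm{A}}\left(\frac{1-R_{f,\mathrm{A}}}{R_{f,\mathrm{A}}}\right)^{1/2}
   = \left[R_{f,\mathrm{A}}\,(1-R_{f,\mathrm{A}})\right]^{1/2}, \\
h\sqrt{n} &= (nh \cdot h)^{1/2} = (d_{\mathrm{M}}\,h)^{1/2}, \\
w_{1/2,\mathrm{A}} &= (d_{\mathrm{M}}\,h)^{1/2}
   \left[8(\ln 2)\,R_{f,\mathrm{A}}\,(1-R_{f,\mathrm{A}})\right]^{1/2}.
\end{align*}
```

One consequence of this form is that, for a fixed distance of travel of the mobile phase, the width is greatest for a solute with a relative mobility of 0.5.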


The distance that solute A has moved through a 
given chromatographic system, $d_{\mathrm{A}}$, is directly propor-
tional to the number of theoretical plates through which
the mobile phase has moved (Equation 1-4), but the
width of its distribution, $w_{1/2,\mathrm{A}}$, is proportional to the
square root of the number of theoretical plates through 
which the mobile phase has moved (Equation 1-5). As a 
result, as the chromatography progresses, solutes sepa- 
rate from each other more rapidly than they spread, and 
it is this property that permits chromatography to per- 
form separations. This property can be quantified as the 
resolution between any two solutes. 

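If the resolution, $R_{\mathrm{AB}}$, between the distribution of
solute A and the distribution of solute B is defined as


$$R_{\mathrm{AB}} = \frac{2\,|d_{\mathrm{A}} - d_{\mathrm{B}}|}{w_{1/2,\mathrm{A}} + w_{1/2,\mathrm{B}}} \qquad (1\text{-}8)$$


then by assuming that h is the same for both solutes and
combining Equations 1-3 through 1-5


$$R_{\mathrm{AB}} = \left(\frac{d_{\mathrm{M}}}{h}\right)^{1/2} \frac{2\,|\alpha'_{\mathrm{A}} - \alpha'_{\mathrm{B}}|}{\left[8(\ln 2)\right]^{1/2}\left[(\alpha'_{\mathrm{A}})^{1/2}(1 + \alpha'_{\mathrm{B}}) + (\alpha'_{\mathrm{B}})^{1/2}(1 + \alpha'_{\mathrm{A}})\right]} \qquad (1\text{-}9)$$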


Because $\alpha'_{\mathrm{A}}$ and $\alpha'_{\mathrm{B}}$ are fixed properties of the stationary
phase, the solvent, and the solutes, this equation demon-
strates that resolution is increased either by decreasing
the height of a theoretical plate or by running the chro- 
matography over a greater distance. 
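
The dependence of resolution on distance and on the height of a theoretical plate can be checked numerically. This is a minimal sketch that evaluates the definition of resolution (twice the separation of the peaks divided by the sum of their widths at half height), assuming that the peak moves a distance d_M/(1 + alpha) and that its width at half height is h n^(1/2) [8 (ln 2) alpha]^(1/2)/(1 + alpha) with n = d_M/h. The function names and the numerical values are hypothetical.

```python
import math


def peak_distance(d_mobile, alpha):
    """Distance moved by the peak of a solute when the mobile phase has moved d_mobile."""
    return d_mobile / (1.0 + alpha)


def half_height_width(d_mobile, h, alpha):
    """Width at half height after n = d_mobile / h theoretical plates."""
    n = d_mobile / h
    return h * math.sqrt(n) * math.sqrt(8.0 * math.log(2.0) * alpha) / (1.0 + alpha)


def resolution(d_mobile, h, alpha_a, alpha_b):
    """Twice the peak separation divided by the sum of the widths at half height."""
    separation = abs(peak_distance(d_mobile, alpha_a) - peak_distance(d_mobile, alpha_b))
    widths = half_height_width(d_mobile, h, alpha_a) + half_height_width(d_mobile, h, alpha_b)
    return 2.0 * separation / widths


if __name__ == "__main__":
    base = resolution(10.0, 0.01, 1.0, 1.5)          # hypothetical: 10 cm run, 0.01 cm plates
    print(resolution(40.0, 0.01, 1.0, 1.5) / base)   # four times the distance
    print(resolution(10.0, 0.0025, 1.0, 1.5) / base) # one quarter the plate height
```

Because the separation grows in proportion to n while the widths grow as the square root of n, either a fourfold increase in distance or a fourfold decrease in plate height doubles the resolution in this model.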

In column chromatography the effluent emerging 
from the end of the column is collected and the concen- 
tration of solute A in this effluent is monitored as a func- 
tion of the total volume that has emerged since the 
chromatogram was begun. If the column of stationary 
phase contains p theoretical plates, the effluent collected 
and monitored is, by definition, the mobile phase enter- 
ing plate p + 1. As mobile phase emerges from the end of 
the system, the concentration of solute A that it contains 
increases, reaches a maximum, and then declines. This 
results from the approach of the peak of the distribution 
of solute A to plate p + 1, its arrival at plate p + 1, and its
passage beyond plate p + 1. The volume at which the
maximum passes through plate p + 1 is the elution
volume of solute A, $V_{e,\mathrm{A}}$. It corresponds to the volume of
mobile phase that must pass through the system to bring
the maximum of the distribution of solute A into
plate p + 1. Because it takes p steps for a volume equal to
the void volume $V_0$ to emerge from the column but the
peak of the distribution of solute A will have entered only
theoretical plate $p(1 + \alpha'_{\mathrm{A}})^{-1}$ after p steps, the peak of the
distribution of solute A will enter plate p + 1 only after a
volume equal to $V_0(1 + \alpha'_{\mathrm{A}})$ has passed through the
system. It follows that


$$V_{e,\mathrm{A}} = V_0\,(1 + \alpha'_{\mathrm{A}}) \qquad (1\text{-}10)$$


and


$$R_{f,\mathrm{A}} = \frac{V_0}{V_{e,\mathrm{A}}} = \frac{1}{1 + \alpha'_{\mathrm{A}}} \qquad (1\text{-}11)$$


This is the fundamental equation governing column 
chromatography. It connects the volume at which the 
solute A emerges from the end of the chromatographic 
column with its bulk partition coefficient for the material 
composing the stationary phase. The relationship 
between the relative mobility $R_{f,\mathrm{A}}$ and the partition coef-
ficient $\alpha'_{\mathrm{A}}$ is identical to that governing thin-layer chro-
matography and paper chromatography (Equation 1-6).
This is reassuring because it is reasonable that the same 
process occurs in all types of chromatography. The valid- 
ity of this equation was verified experimentally by Martin 
and Synge.’ 
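
The relation between elution volume and partition coefficient is simple enough to use in both directions. This is a minimal sketch, assuming the relation stated in the text that the peak emerges after a volume equal to the void volume multiplied by (1 + alpha); the function names and the volumes used are hypothetical.

```python
def elution_volume(v0, alpha):
    """Elution volume for void volume v0 and partition coefficient alpha."""
    return v0 * (1.0 + alpha)


def relative_mobility(v0, ve):
    """Relative mobility by definition 2: void volume divided by elution volume."""
    return v0 / ve


def partition_coefficient(v0, ve):
    """Partition coefficient recovered from a measured elution volume."""
    return ve / v0 - 1.0


if __name__ == "__main__":
    ve = elution_volume(10.0, 3.0)          # hypothetical 10 mL void volume
    print(ve, relative_mobility(10.0, ve), partition_coefficient(10.0, ve))
```

Used in the reverse direction, a measured elution volume and a known void volume give an estimate of the bulk partition coefficient, provided the column is operating in the linear region of the partition curve.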

The width of the peak of concentration at half 
height, in units of eluted volume, can be shown to be a 
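function of the number of theoretical plates:


$$w_{1/2,\mathrm{A}} = \left[\frac{8(\ln 2)}{p}\right]^{1/2} V_{e,\mathrm{A}} \qquad (1\text{-}12)$$


If the resolution between the distribution of solute A and
the distribution of solute B is defined as


$$R_{\mathrm{AB}} = \frac{2\,|V_{e,\mathrm{A}} - V_{e,\mathrm{B}}|}{w_{1/2,\mathrm{A}} + w_{1/2,\mathrm{B}}} \qquad (1\text{-}13)$$


then


$$R_{\mathrm{AB}} = \left[\frac{p}{8(\ln 2)}\right]^{1/2} \frac{2\,|\alpha'_{\mathrm{A}} - \alpha'_{\mathrm{B}}|}{2 + \alpha'_{\mathrm{A}} + \alpha'_{\mathrm{B}}} \qquad (1\text{-}14)$$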


If the solvent, ionic strength, temperature, and pH of the 
mobile phase and the volume and chemical structure of 
the stationary phase remain the same so that the values 
of $\alpha'$ are unchanged, the resolution of the separation can
be improved by increasing the number of theoretical 
plates, p, that the column contains. The most obvious 
way to accomplish this is to increase the length of the 
chromatographic column, but this can become both 
cumbersome and expensive. 
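
How many theoretical plates a given separation requires can be estimated from the partition coefficients. This is a minimal sketch, assuming the standard plate-theory width at half height, [8 (ln 2)/p]^(1/2) times the elution volume, and an elution volume of V_0 (1 + alpha); under those assumptions the void volume cancels from the resolution. The function names and the numerical values are hypothetical.

```python
import math


def column_resolution(p, alpha_a, alpha_b):
    """Resolution of two solutes on a column of p theoretical plates."""
    spread = 2.0 * abs(alpha_a - alpha_b) / (2.0 + alpha_a + alpha_b)
    return math.sqrt(p / (8.0 * math.log(2.0))) * spread


def plates_for_resolution(target, alpha_a, alpha_b):
    """Number of plates needed to reach a target resolution (inverse of the above)."""
    spread = 2.0 * abs(alpha_a - alpha_b) / (2.0 + alpha_a + alpha_b)
    return 8.0 * math.log(2.0) * (target / spread) ** 2


if __name__ == "__main__":
    p = plates_for_resolution(1.5, 2.0, 2.2)
    print(p, column_resolution(p, 2.0, 2.2))
```

Because the resolution varies as the square root of p, doubling the resolution requires four times as many plates, which is why lengthening the column quickly becomes cumbersome.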

Because the height of a theoretical plate, h, is 
defined as the distance of passage required for equilib- 
rium to be reached, h decreases and p increases as the 
flow rate of the chromatographic column is decreased, at 
least until diffusion between the plates becomes a signif- 
icant factor. In most cases, however, diffusion is severely 
hindered by the structure of the stationary phase itself 
and almost never becomes important, and the slower the 
flow, the better the resolution. This is particularly impor- 
tant in the chromatography of proteins, especially when 
they are unfolded, because their slow rates of diffusion 
significantly decrease rates of equilibration with the sta- 
tionary phase. 

The height of the theoretical plate decreases as the 
diameter of the particles in a solid stationary phase 
decreases,’ and it is advantageous to use particles of solid 
phase that are as small as possible. The small size of the 
particles increases the surface area available for equili- 
bration and decreases the distances over which the 
solute molecules must diffuse. The realization of this fact 
has led to the recent development of the high-pressure 
liquid chromatography foreseen by Martin and Synge.’ 
In such systems, the high pressure is inconsequential to 
the process of separation but is required to force the 
liquid mobile phase through the small, finely divided 
solid particles of the stationary phase at a realistic rate. 
The particles themselves are spherical in shape and of 
uniform diameter to promote uniform flow of as rapid a 
rate as possible over the bed. Because the smaller parti- 
cles of the solid phase decrease the height of a theoreti- 
cal plate, more theoretical plates can exist in a given 
length of bed. This advantage can be exploited either to 
increase the resolution or to decrease the length of the 
chromatographic column or both.
Because high pressures are used to increase the rate
of flow through a shorter column, the major advantage of 
high-pressure liquid chromatography is the speed with 
which the chromatograms can be run. For example, 
when peptides are separated by cation-exchange
chromatography on sulfonated polystyrene at low
pressure (<500 psi), the chromatography takes about
25 h; when peptides are separated by reverse-phase
adsorption chromatography at high pressure
(>1000 psi), the chromatography takes only 1 h, even
though the resolution in each case is about the same.
With reverse-phase adsorption chromatography, the sol- 
vents used are also more transparent to ultraviolet light, 
so peptides can be followed simply by their absorbance 
with a continuous-flow spectrophotometer. 

Improvements in the size, uniformity, and rigidity 
of the particles of the stationary phase have permitted 
similar increases in the rate at which chromatography of 
proteins can be performed. These developments are 
referred to commercially as fast protein liquid chro- 
matography. In both high-pressure liquid chromatogra- 
phy and fast protein liquid chromatography, the 
principles remain the same as before, often the solid 
phases remain the same as before, and the technological 
improvements of the original techniques are based on 
previously noted predictions of the original theory. 

The discontinuous model presented here for chro- 
matography has been developed for regions of the parti- 
tion curves (Figure 1-1) where solute A distributes with a 
constant partition coefficient, $\alpha'_{\mathrm{A}}$. It turns out that the
most usual deviation from such ideal behavior is for the 
stationary phase to display saturation (curves B and C, 
Figure 1-1). The more prominent this behavior becomes, 
the poorer the resolution of the chromatogram 
becomes.’ As a rule, uniform stationary phases of high 
capacity, by promoting the linearity of the partition func- 
tion, provide the highest resolution. 

The fact that, unless the number of theoretical 
plates is increased, peak height decreases in almost 
inverse proportion to $\alpha'_{\mathrm{A}}$ (Equations 1-11 and 1-12) pre-
cludes the use of conditions where the solute has a high
affinity for the stationary phase. Usually, conditions
such as solvent, temperature, ionic strength, and pH of
the mobile phase and the chemical structure of the sta-
tionary phase are manipulated to bring the values of $\alpha'$
for the solutes to be separated into a useful range, usually 
between 1 and 10. A variation in one of these properties 
of the mobile phase, however, can also be incorporated 
into the chromatography itself. 
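
The loss of peak height at high affinity can be made concrete. This is a minimal sketch, assuming a Gaussian peak whose width at half height is [8 (ln 2)/p]^(1/2) times the elution volume and whose area equals the moles of solute applied; the Gaussian shape and the function name are assumptions made for illustration.

```python
import math


def peak_height(moles, v0, alpha, p):
    """Maximum concentration in the effluent (moles per unit volume) for a Gaussian peak."""
    ve = v0 * (1.0 + alpha)                            # elution volume
    w_half = math.sqrt(8.0 * math.log(2.0) / p) * ve   # width at half height
    sigma = w_half / math.sqrt(8.0 * math.log(2.0))    # Gaussian standard deviation
    return moles / (sigma * math.sqrt(2.0 * math.pi))


if __name__ == "__main__":
    for alpha in (1.0, 3.0, 9.0, 30.0):
        print(alpha, peak_height(1e-6, 10.0, alpha, 1000))
```

Under these assumptions the height is proportional to p^(1/2)/(1 + alpha), so raising the partition coefficient from 1 to 9 lowers the peak fivefold unless the number of plates is increased.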

To this point, only isocratic zonal chromatography 
has been described. Isocratic zonal chromatography is 
chromatography in which the mobile phase introduced 
continuously into the chromatographic system remains 
of constant composition. It is possible, however, to vary 
continuously and monotonically the composition of the 
mobile phase entering a column. This systematic varia- 
tion produces a gradient of one or more properties of the 


mobile phase. For example, the ionic strength of the 
entering mobile phase can be increased continuously 
over time so that it is a linear function of the volume 
introduced into the system. Mechanical devices are 
available to produce linear gradients or gradients that are 
exponential or logarithmic or some other function of the 
volume by mixing two or more solutions that differ in the 
property to be varied. When a gradient of pH is required, 
the situation becomes somewhat more complicated 
because the pH of a solution is usually controlled with a 
buffer. Not only is the pH a logarithmic function of the 
concentrations of the conjugate acid and base of the 
buffer, but changing the concentrations of conjugate 
acid and base often affects the ionic strength. There is no 
requirement, however, that the gradient be some 
particular function of a particular property; the only 
requirement is that the property be varied continuously 
and monotonically. 

The method of gradient chromatography is an 
important tool because it permits the partition coeffi-
cient of solute A, $\alpha'_{\mathrm{A}}$, to be decreased during the chro-
matographic run. This often is essential because if the
partition coefficient for a particular solute is too large, it
emerges from the system with such a large elution
volume, $V_{e,\mathrm{A}}$, that the width of its band is unacceptably
large. To produce satisfactory chromatography, the par- 
tition coefficient must be less than 10 in most situations, 
but frequently the values of the partition coefficient of 
solutes in a complicated mixture can spread over a large 
range for one particular mobile phase of constant com- 
position. By using a gradient formulated so that all of the 
partition coefficients for the solutes decrease continu- 
ously, even those solutes with the highest affinity for the 
stationary phase eventually have low enough partition 
coefficients to emerge from the system within a reason- 
able time. Usually, a gradient of ionic strength, cosolvent, 
or pH is employed. It is constructed in such a way that 
the chosen property continuously changes in a direction 
that will cause the solutes to have smaller and smaller 
affinities for the stationary phase and elute earlier than 
they would under isocratic conditions. For example, if a 
solute is being adsorbed to a nonpolar stationary phase, 
a gradient that increases in the concentration of a misci- 
ble nonpolar solvent in water is used to decrease gradu- 
ally the affinity of the solute for the stationary phase. 
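
The effect of a gradient on elution volume can be illustrated with a simple numerical integration. This is a rough sketch under several assumptions that are not from the text: the gradient itself is not retained by the stationary phase, the peak moves at a fraction 1/(1 + alpha) of the velocity of the mobile phase, and the partition coefficient falls exponentially with the gradient variable x. The function names, the exponential form, and all numerical values are hypothetical.

```python
import math


def gradient_elution_volume(alpha_of_x, x_of_volume, v0=1.0, dv=1e-3, v_max=1e3):
    """Volume at which the peak reaches the end of the bed.

    alpha_of_x:  partition coefficient as a function of the gradient variable x
    x_of_volume: value of x in the mobile phase entering the column, as a
                 function of the volume introduced so far
    """
    z = 0.0  # fractional position of the peak along the bed (0 inlet, 1 outlet)
    v = 0.0  # total volume introduced so far
    while z < 1.0 and v < v_max:
        # Mobile phase now at position z entered the column a volume z * v0 earlier.
        x = x_of_volume(max(v - z * v0, 0.0))
        z += dv / (v0 * (1.0 + alpha_of_x(x)))
        v += dv
    return v


def alpha_of_x(x):
    """Hypothetical dependence: alpha falls from 50 as x rises."""
    return 50.0 * math.exp(-5.0 * x)


def isocratic(volume):
    """Constant composition, x = 0 throughout."""
    return 0.0


def linear_gradient(volume):
    """x rises linearly from 0 to 1 over 20 void volumes, then stays at 1."""
    return min(volume / 20.0, 1.0)


if __name__ == "__main__":
    print(gradient_elution_volume(alpha_of_x, isocratic))
    print(gradient_elution_volume(alpha_of_x, linear_gradient))
```

With constant composition the sketch reproduces an elution volume of V_0 (1 + alpha), here about 51 void volumes; with the linear gradient the same solute emerges much earlier because its partition coefficient falls as it travels.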

The stationary phase in a chromatographic system 
is the chromatographic medium. The solid matrix com- 
posing a chromatographic medium is almost always a 
polymer. Both natural polymers, for example, cellulose, 
and synthetic polymers, for example, polystyrene
cross-linked with divinylbenzene, are used. The
basic polymer is often cross-linked appropriately to 
increase its rigidity and manufactured in the form of 
small spherical beads of uniform size to improve flow 
rates. For chromatography of small molecules such as 
metabolites or peptides, beads of polystyrene or silica gel 
are used; for chromatography of proteins, cellulose or
beads of dextran, allyldextran, polyethers, agarose, or
polymethacrylate are used.* Each of these intentionally 
inert polymeric matrices is then modified chemically. 
The type of modification performed determines the 
molecular property used by the chromatographic system 
to separate the solutes. 

Media for chromatography by adsorption are solid 
phases with which the solutes physically associate by 
noncovalent forces. Certain amorphous or heteroge- 
neous solids such as hydroxylapatite and silica gel have 
long been used as chromatographic media for chro- 
matography by adsorption. Amorphous hydroxylapatite 
has been used extensively in protein purification. 
Unfortunately, it is prone to significant irreversible 
adsorption,” and it is heterogeneous and saturates read- 
ily, which causes it to have nonlinear distribution behav- 
ior. All of these properties limit its resolution. Although it 
separates nonpolar solutes successfully, silica gel has the 
unfortunate property of strongly adsorbing hydrogen- 
bonding solutes, which precludes its use with most bio- 
logical substances. Reverse-phase chromatographic 
media, however, have found wide use in protein chem- 
istry in the separation of small molecules such as pep- 
tides and metabolites. Such media are composed of 
spherical beads of silica gel that have been heavily alky- 
lated with hydrocarbons of uniform length, for example, 
octadecyl or octyl groups. This blocks the sites of hydro- 
gen bonding and creates an apolar surface on the beads 
that adsorbs apolar functional groups on the otherwise 
polar solutes. Such a chromatographic medium, how- 
ever, shows little affinity for completely polar solutes 
unless a significant portion of the silica gel has lost its 
apolar coating. 

Beaded, cross-linked dextran, agarose, or poly- 
methacrylate are covalently modified to produce chro- 
matographic media for the chromatography of proteins 
by adsorption. The functional groups that are attached to 
these hydrophilic matrices during the covalent modifica- 
tion are hydrophobic groups such as phenyl, methyl, 
butyl, propyl, or tert-butyl groups. These hydrophobic 
groups associate directly with hydrophobic groups on 
the surface of a molecule of protein that are the side 
chains of the amino acids valine, leucine, isoleucine, and 
phenylalanine. 

In media for chromatography by adsorption, the 
affinity of the molecules of the solute for the stationary 
phase arises from their direct physical attachment to the 
molecular surface of the stationary phase. These tran- 
sient associations are noncovalent in nature and can be 
considered as hydrophobic contacts or hydrogen bond- 
ing—designations that imply direct molecular contact 


* The commercial forms of these beaded, cross-linked polymers 
each have their own uninformative names, but it is possible to 
learn their compositions if one is perseverant. Although one or the 
other chromatographic medium may have the same composition, 
each manufacturer claims unique benefits for his product. 


between solid phase and solute. It is this molecular con- 
tact that distinguishes chromatography by adsorption 
from chromatography by ion exchange. 

Media for chromatography by ion exchange are 
solids formed from all of the usual neutral polymers to 
which charged organic functional groups have been 
covalently attached (Figure 1-2). Anion-exchange 
media, or basic media, are solid phases to which func- 
tional groups of positive charge at neutral pH have been 
covalently attached, and cation-exchange media, or 
acidic media, are solid phases to which functional groups 
of negative charge at neutral pH have been attached. A 
distinction can be made between weakly basic or acidic 
and strongly basic or acidic ion-exchange media based 
on whether the fixed charges can or cannot be neutral- 
ized, respectively, by variation of the pH within the 
ranges normally employed for chromatography. This is 
an important distinction because the density of charge, 
and hence the capacity of the medium, can be changed 
by changing the pH when weakly basic or weakly acidic 
ion-exchange media are used but not when strongly 
basic or strongly acidic ion-exchange media are used. 
Examples of weakly basic functional groups are tertiary 
amines such as those on [2-(diethylamino)ethyl]cellu-
lose (DEAE-cellulose); examples of strongly basic func-
tional groups are quaternary ammonium cations such
as those on N,N-diethyl-N-(2-hydroxypropyl)ammo-
nioethyl agarose (QAE agarose) or trimethylammo-
nioethyl polymethacrylate; examples of weakly acidic
functional groups are carboxylates, such as those on car- 
boxymethyl cellulose, or phosphates, such as those on 
phosphocellulose; and examples of strongly acidic func- 
tional groups are sulfonates, such as those on sulfonated 
polystyrene (Figure 1-2) or sulfonated polymethacrylate. 

The fixed charges on the stationary phase are 
responsible for the tendency of ionic solutes of an oppo- 
site charge to associate with it. A counterion is a mobile 
ion that is dissolved in the surrounding solution and has 
a charge opposite in sign to the fixed charges on the sta- 
tionary phase; a co-ion is a mobile ion that is dissolved in 
the surrounding solution and has a charge of like sign to 
the fixed charges on the stationary phase. Solutes con- 
taining simple univalent ionic functional groups do not 
form physical contacts with the isolated univalent fixed 
charges of opposite sign that are attached to the station- 
ary phase when chromatography by ion exchange is per- 
formed in aqueous solution. Rather, such charged 
solutes (for example, nucleotides, amino acids, or pro- 
teins) can be considered to be trapped as mobile counte- 
rions surrounding the covalently fixed charges in an 
ionic double layer.’* The two layers in an ionic double 
layer are a layer of covalently fixed charges on the surface 
of the polymer forming the stationary phase and a layer 
of solution, adjacent to that surface, that is enriched in 
counterions and depleted of co-ions. The molecular sur- 
face of the layer of fixed charge is considered to be the 
boundary between the layers of the double layer.

Figure 1-2: Covalent modifications that produce media for
ion exchange. Media derived from polystyrene and cellulose
are presented.

The enrichment of counterions, which in this case
are the solutes being separated, in the layer of solution
results from the requirement for maintaining electroneutrality.
The layer of solution contains solutes of
both net positive and net negative charge but has an
excess of solutes of net charge opposite to the charge of
the functional groups in the layer of covalently fixed
charges and is depleted in solutes of like charge.
The layer of covalently fixed charges is usually consid- 
ered to be localized in a geometric surface representing 
the molecular surface of the polymer, and the layer of 
solution enriched in the respective counterions is con- 
sidered to have the properties of a space charge extend- 
ing into the surrounding solvent.” 

The reason that the diffuse space charge extends a 
significant distance into the solution beyond this bound- 
ary is that the positive and negative charges in the solu- 
tion are on mobile, dissolved cations and anions, and the 
enthalpic tendency of the counterions to gather at the 
charged surface of the boundary and the tendency of the 
co-ions to avoid the charged surface of the boundary is 
counterbalanced by the entropic tendency for each of 
them to diffuse randomly throughout the surrounding
solution. Because the imbalance in charge that defines
the ionic double layer falls off exponentially, the layer of 
solution in which the imbalance in charge occurs theo- 
retically has no outer boundary. It is, however, arbitrarily 
assigned a thickness that is approximately that distance, 
from the surface of fixed charges, at which the space 
charge has decreased by a factor of exp (-1). Under the 
normal conditions of chromatography, the thickness of 
the layer of solution in the double layer would be less than 
10 nm." It can be assumed that the boundary that sepa- 
rates the stationary phase from the mobile phase during 
the chromatography, namely the outside surface of the 
bead, lies at a much greater distance than this from the 
molecular surface of the charged strands of polymer 
within the bead because flow occurs around beads of 
dimensions at least a thousand times larger. Therefore, 
the entire ionic double layer must be within the chro- 
matographic stationary phase. 
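
The thickness assigned to the layer of solution can be estimated numerically. The following is a minimal sketch, not taken from the text, that assumes the space charge behaves as an ideal Debye-Hückel diffuse layer in a solution of univalent salt in water at 25 °C; the relative permittivity of 78.4 and the function names are assumptions made only for illustration.

```python
import math

# Physical constants in SI units.
EPSILON_0 = 8.8541878128e-12   # vacuum permittivity, F m^-1
K_B = 1.380649e-23             # Boltzmann constant, J K^-1
N_A = 6.02214076e23            # Avogadro constant, mol^-1
E_CHARGE = 1.602176634e-19     # elementary charge, C


def debye_length_nm(ionic_strength_molar, temperature_k=298.15,
                    relative_permittivity=78.4):
    """Distance (nm) over which the space charge falls by a factor of exp(-1).

    Assumes an ideal solution of univalent ions; the default permittivity
    is an assumed value for water at 25 degrees C.
    """
    ionic_strength_si = ionic_strength_molar * 1000.0  # mol m^-3
    numerator = relative_permittivity * EPSILON_0 * K_B * temperature_k
    denominator = 2.0 * N_A * E_CHARGE ** 2 * ionic_strength_si
    return math.sqrt(numerator / denominator) * 1e9


def relative_space_charge(distance_nm, thickness_nm):
    """Exponential decay of the charge imbalance with distance from the surface."""
    return math.exp(-distance_nm / thickness_nm)


if __name__ == "__main__":
    for molar in (0.001, 0.01, 0.1, 1.0):
        print(f"{molar:6.3f} M  ->  {debye_length_nm(molar):5.2f} nm")
```

Under these assumptions the thickness is about 1 nm at 0.1 M univalent salt and remains below 10 nm down to about 1 mM, which is consistent with the statement that the layer of solution is far thinner than the dimensions of a bead.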

If this assumption is made, the distribution of 
counterions between the stationary phase and the 
mobile phase becomes formally equivalent to the distri-
bution of permeant counterions across a semipermeable
membrane when a charged, impermeant macromole-
cule is present on only one side of the membrane. In the
case of chromatography by ion exchange, the charged 
polymer of the bead is formally equivalent to the 
trapped, charged macromolecule. If this is the case, the 
sum of the fixed charges and the dissolved mobile 
charges of the same sign in the stationary phase must 
equal the sum of the dissolved mobile charges of the 
opposite sign in the stationary phase. It follows that 
the concentration within the stationary phase of any 
solute of charge opposite to the fixed charges must 
always be greater than its concentration within the 
mobile phase, and it is this bias that can produce signifi-
cant values of $\alpha_A$. This bias can be treated by the Donnan
formalism.” 

Consider the situation of an anion-exchange
medium of univalent fixed positive charges, N⁺ [an example
would be [N,N-diethyl-N-(2-hydroxypropyl)aminoethyl]cellulose,
Figure 1-2], and a univalent
anionic solute, A⁻ (an example would be AMP⁻), in the
presence of a dissolved univalent salt, K⁺Cl⁻, referred to
as the electrolyte. Assume that the original stationary
phase was the chloride salt of N⁺ and that the solute
before it was added to the stationary phase was the
potassium salt of A⁻. All concentrations are expressed in
terms of moles (liter of phase)⁻¹, hence the unprimed
values. From the requirement for electroneutrality


$$[\mathrm{K}^+]_S + [\mathrm{N}^+]_S = [\mathrm{Cl}^-]_S + [\mathrm{A}^-]_S \tag{1-15}$$

$$[\mathrm{K}^+]_M = [\mathrm{Cl}^-]_M + [\mathrm{A}^-]_M \tag{1-16}$$


where the subscripts refer to the stationary and mobile 
phases. Since the electrolytes are at equilibrium within 
the theoretical plate 


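$$[\mathrm{K}^+]_M\,[\mathrm{Cl}^-]_M = [\mathrm{K}^+]_S\,[\mathrm{Cl}^-]_S \tag{1-17}$$

$$[\mathrm{K}^+]_M\,[\mathrm{A}^-]_M = [\mathrm{K}^+]_S\,[\mathrm{A}^-]_S \tag{1-18}$$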


In the particular circumstance where the concentration
of solute A⁻ is significantly less than the concentration of
Cl⁻ so that $[\mathrm{A}^-]$ becomes negligible in both Equations
1-15 and 1-16 and the concentration of fixed charges in
the stationary phase, $[\mathrm{N}^+]_S$, is so large that $[\mathrm{K}^+]_S$ in
Equation 1-15 becomes negligible, then

$$\alpha_{A^-} = \frac{[\mathrm{A}^-]_S}{[\mathrm{A}^-]_M} = \frac{[\mathrm{N}^+]_S}{[\mathrm{K}^+]_M} \tag{1-19}$$

where $\alpha_{A^-}$ is defined somewhat differently from the partition
coefficient described so far. Instead of units of concentration
in moles (liter of bed)⁻¹, the units of
concentration are moles (liter of stationary phase)⁻¹ and
moles (liter of mobile phase)⁻¹.

Equation 1-19 predicts that the partition coefficient,
$\alpha_{A^-}$, for solute A⁻ should be inversely proportional to
the concentration of K⁺ in the mobile phase. Because the
internal volumes of the stationary phases in chromatography
by ion exchange are fairly small and the capacities
of most media are large even in terms of equivalents (liter
of bed)⁻¹, the situation in which $[\mathrm{K}^+]_S = [\mathrm{N}^+]_S$ is probably
rarely approached, and Equation 1-19 should govern
most concentrations of salt employed. The effect of
adding a univalent salt to the mobile phase is to decrease
the value of the partition coefficient for the anion A⁻
between the cationic stationary phase and the aqueous
mobile phase. In this way, the value of $\alpha_{A^-}$ can be adjusted
by varying the concentration of electrolyte to optimize an
isocratic separation, or a gradient of the electrolyte can
be used to vary $\alpha_{A^-}$ continuously. If the concentration of
electrolyte is low, $\alpha_{A^-}$ will be large and the mobility of A⁻
will be negligible. Therefore, a charged solute can be
gathered tightly at the origin of the chromatographic
system from a large volume of a dilute solution at low
ionic strength, and chromatography can then be initiated
by increasing the concentration of the electrolyte.
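
The dependence of the partition coefficient on the concentration of electrolyte can be explored without making the two simplifying assumptions. The sketch below is an illustration, not part of the original treatment. It uses the fact that Equations 1-17 and 1-18 require the ratio r = [K⁺]M/[K⁺]S to equal both [Cl⁻]S/[Cl⁻]M and [A⁻]S/[A⁻]M; substituting these ratios into Equations 1-15 and 1-16 gives r² − (x)r − 1 = 0, where x is the concentration of fixed charge divided by [K⁺]M. Ideal behavior (concentrations in place of activities) is assumed, and the function names and the concentration of fixed charge of 1 M are hypothetical.

```python
import math


def donnan_partition_coefficient(fixed_charge_s, potassium_m):
    """Partition coefficient of a univalent anion from Equations 1-15 to 1-18.

    fixed_charge_s: fixed positive charge, moles per liter of stationary phase.
    potassium_m:    K+ in the mobile phase, moles per liter of mobile phase.
    Returns the positive root of r**2 - x*r - 1 = 0 with x = fixed/potassium.
    """
    if potassium_m <= 0:
        raise ValueError("mobile-phase electrolyte concentration must be positive")
    x = fixed_charge_s / potassium_m
    return (x + math.sqrt(x * x + 4.0)) / 2.0


def limiting_partition_coefficient(fixed_charge_s, potassium_m):
    """Limiting form (fixed charge much greater than electrolyte): ratio only."""
    return fixed_charge_s / potassium_m


if __name__ == "__main__":
    fixed_charge = 1.0  # hypothetical value, mol per liter of stationary phase
    for salt in (0.01, 0.05, 0.2, 1.0):
        exact = donnan_partition_coefficient(fixed_charge, salt)
        limit = limiting_partition_coefficient(fixed_charge, salt)
        print(f"[K+]M = {salt:5.2f} M   exact = {exact:8.3f}   limiting = {limit:8.3f}")
```

In this sketch the two values agree closely while the electrolyte is dilute relative to the fixed charges and diverge as the two concentrations become comparable, at which point the partition coefficient approaches 1 rather than falling below it.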

In a weakly basic or acidic ion-exchange medium, 
the titration of the charges that occurs upon adding acid 
or base, respectively, occurs over a broad range of pH 
because of electrostatic repulsion among the fixed 
cations or anions. This permits the density of charge on
the medium ($[\mathrm{N}^+]_S$ or $[\mathrm{O}^-]_S$) to be continuously decreased
by incorporating a gradient of pH into the entering
mobile phase. For example, if the stationary phase has
fixed, protonated tertiary ammonium cations, a gradient
of increasing pH would decrease $[\mathrm{R_3NH}^+]_S$ as it progresses.
The decrease in the density of charge ($[\mathrm{N}^+]_S$ or
$[\mathrm{O}^-]_S$) produces a decrease in the partition coefficient of an
anionic or cationic solute, respectively (Equation 1-19),
causing the solutes to emerge sooner than they would 
under isocratic conditions. When the solutes themselves 
are weak acids or bases, however, their ionization may 
also vary as the gradient of pH progresses, but in the 
opposite sense to the stationary phase; their effective 
charge will be increasing as the gradient progresses. 
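
The effect of a gradient of pH on a weakly basic medium can be sketched under a deliberately simple assumption: that the fixed tertiary amines titrate as independent groups with a single acid dissociation constant. As noted above, real media titrate over a broader range because of electrostatic repulsion, so this is an idealization. The pKa of 9.5, the concentrations, and the function names are hypothetical, and the partition coefficient is taken as the ratio of fixed charge to mobile-phase counterion, the limiting form discussed for Equation 1-19.

```python
def fraction_protonated(ph, pka):
    """Fraction of a weak base present as the protonated cation (ideal titration)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def anion_partition_coefficient(ph, pka, total_amine_s, potassium_m):
    """Limiting partition coefficient of a univalent anion on a weakly basic medium.

    total_amine_s: total fixed amine, moles per liter of stationary phase.
    potassium_m:   K+ in the mobile phase, moles per liter of mobile phase.
    """
    charged_amine_s = fraction_protonated(ph, pka) * total_amine_s
    return charged_amine_s / potassium_m


if __name__ == "__main__":
    assumed_pka = 9.5      # hypothetical value for the fixed tertiary amine
    total_amine = 1.0      # hypothetical, mol per liter of stationary phase
    electrolyte = 0.05     # hypothetical, mol per liter of mobile phase
    for ph in (7.0, 8.0, 9.0, 9.5, 10.0, 11.0):
        alpha = anion_partition_coefficient(ph, assumed_pka, total_amine, electrolyte)
        print(f"pH {ph:4.1f}   alpha = {alpha:6.2f}")
```

The sketch holds the charge on the solute constant; a solute that is itself a weak acid would gain charge over the same gradient, which is the opposing effect described in the paragraph above.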

There is no question that Equation 1-19, although 
intuitively informative, does not describe real ion-
exchange processes. At face value it predicts that $\alpha_{A^-}$
should be a function only of the charge density on the 
stationary phase and the concentration of electrolyte, 
and this is often not the case. Even simple solutes upon 
ion exchange display affinities for the supporting poly- 
meric matrix or the functional groups on the fixed 
charges that sometimes differ greatly from this expectation.
These deviations almost certainly arise because
solutes, brought to high concentration within the double
layer by ion exchange, adsorb physically to these constituents,
and as a result chromatography by adsorption is
superimposed upon the basic process of chromatography
by ion exchange. The
clearest example of this is found in the separation of
amino acids on sulfonated polystyrene (Figure 1-3).¹⁶,¹⁷
Even though the solutes in the series alanine, valine, 
leucine, and phenylalanine have almost identical acid 
dissociation constants, and hence ionic charge, they are 
cleanly separated. There is little doubt that the separa- 
tion observed in this series is due to chromatography by 
adsorption performed by the styrene-divinylbenzene 
copolymer of the matrix." An ion-exchange medium can 
also participate in adsorbing simple cations or anions by 
chelation, such as occurs in the binding of alkali metal 
cations to polygalacturonic acid.'? 

Figure 1-3: Separation of amino acids on chromatography by
cation exchange.¹⁶ A mixture of amino acids in the ratios typical of
those found in a protein was submitted to chromatography on a
column (0.90 cm x 100 cm) of sulfonated polystyrene (Figure 1-2)
in the sodium form. The values of the pH and temperatures of the
buffered mobile phases are noted below the horizontal axes, which
register the volume of the mobile phase that has passed through
the column (in centimeters³) since initiation of the chromatography.
Changes from one mobile phase to the next were made
discontinuously at the times noted. Individual fractions of the effluent
emerging from the bottom of the column were collected and
assayed for their concentration of amino acid (millimolar). The
relative mobility, $R_f$, of each amino acid in the initial isocratic
separation at pH 3.41 would be the void volume of the column divided by
the volume at which its peak of concentration emerged from the
column. The width at half height, $w_{1/2}$, of each peak is its width in
milliliters at a level of concentration half that of the concentration
at its peak. Reprinted with permission from ref 16. Copyright 1951
Journal of Biological Chemistry.

A molecule of protein is a macromolecular polyelectrolyte,
the effective charge of which is a function of
pH and varies over a wide range. If the pH is changed, the
partition coefficients of proteins upon ion exchange vary, 
and gradients of pH as well as gradients of ionic strength 
are used in their chromatography. In the case of poly- 
electrolytes of this type, interactions between the solute 
and the stationary phase may also lead to direct adsorp- 
tion. Although simple univalent ions when they are at 
normal concentrations almost certainly do not physi- 
cally associate with each other in aqueous solution, poly- 
electrolytes of opposite charge, such as proteins and 
ion-exchange media, sometimes do. This results from a 
cooperative association of the opposite charges on the 
two polymers that arises from the fact that the charges on 
the ion-exchange medium are covalently fixed and those 
of the opposite sign on the protein are also covalently 
fixed. It is always possible that there is a population of 
sites on the ion-exchange medium where the distribu- 
tion of charge complements the distribution of charge on 
the protein, a possibility that will produce physical 
adsorption. This, however, is probably a rare phenome- 
non; most of the time the molecules of protein are simply 
trapped inside the ion-exchange medium as mobile 
counterions in the ionic double layer. 

Media for chromatography by molecular exclu- 
sion* separate molecules on the basis of differences in 
their size and shape. The beaded solids used as station- 
ary phases are tangled webs of hydrophilic, linear poly- 
mers—dextran, agarose, polyacrylamide, polyether, or 
polymethacrylate—cross-linked among themselves 
randomly along their length. These matrices can be 
produced in two ways. First, polysaccharides such as 
agarose and dextran spontaneously imbibe water and 
swell when the dry solid is exposed to an aqueous solu- 
tion. The degree to which the linear polymers are cross- 
linked among themselves determines how much water 
they will imbibe at saturation. This is designated as their
water regain, $W_r$, in milliliters (gram of polysaccharide)⁻¹.
This in turn determines the fraction of the volume of the
stationary phase occupied by solid polymer, $f_{\mathrm{poly}}$:

$$f_{\mathrm{poly}} = \frac{V_{\mathrm{poly}}}{V_{\mathrm{H_2O}} + V_{\mathrm{poly}}} = \frac{\bar{v}_{\mathrm{poly}}}{W_r + \bar{v}_{\mathrm{poly}}} \tag{1-20}$$

where $V_{\mathrm{poly}}$ is the volume occupied by polysaccharide,
$V_{\mathrm{H_2O}}$ is the volume occupied by water, and $\bar{v}_{\mathrm{poly}}$ is the
partial specific volume of the polysaccharide in milliliters
gram⁻¹. Second, polyacrylamide does not swell
readily but can be polymerized from acrylamide 
monomers and a small amount of the cross-linker 
N,N’-methylenebis(acrylamide), both dissolved at a 
certain concentration in an aqueous solution. This produces
a rigid gel that can be fragmented. The majority of
the volume inside the beads of any of these stationary
phases for chromatography by molecular exclusion is
occupied by water. When water is within the tangled web
of the bead, however, it is no longer mobile but stationary.
The mobile phase percolates around the beads and
flow occurs only in the interstices among the beads. The
void volume, $V_0$, is the volume of this space outside of the
beads.

* This method is also called size exclusion, gel filtration, and gel
permeation.

The larger the molecule of solute, the less of the 
open space inside the beads of the stationary phase is 
available to it. If solute A is too large, it cannot enter the
beads at all, and its peak emerges from the system at the
void volume, $V_0$. Therefore, the elution position on the
chromatogram of the completely excluded molecules
marks the position of $V_0$. A small molecule (in theory,
water itself or something equivalent to it) can enter the
entire open space in each bead, and its elution position
marks $V_i$, the included volume. Unlike with most other
chromatographic separations, there is an end to a molecular
exclusion chromatogram because no solute can see
a larger volume than $V_i$. The only useful separation that
occurs in such a system is of those solutes that emerge
between $V_0$ and $V_i$ because all solutes larger than a certain
size travel together at $V_0$ and all solutes smaller than
a certain size travel together at $V_i$ (Figure 1-4). Between
$V_0$ and $V_i$ on the chromatogram, the larger solutes are the
first to emerge.

Figure 1-4: Chromatography by molecular exclusion. As in Figure
1-3, the concentration of solute in the fluid emerging from the
chromatographic system is plotted as a function of the volume that
has emerged. All molecules larger than a certain size move at the
void volume, $V_0$; all molecules below a certain size travel at the
included volume, $V_i$; and solutes A and B travel at their elution
volumes, $V_{e,A}$ and $V_{e,B}$, respectively, and are separated from each other
because a molecule of solute A is larger than a molecule of solute B.

Because the fluid contained within the beads is
identical in composition to the mobile phase percolating
around the beads and because a polymer that theoretically
has no affinity for the solutes being separated has
been chosen, the partition coefficient for solute A, $\alpha_A$,
between stationary and mobile phases is the ratio of the
volume within the stationary phase that solute A can
enter, which is its elution volume minus the void volume,
divided by the volume of the mobile phase within the
bed, which is, by definition, the void volume $V_0$.
Parameters other than the partition coefficient, however,
are usually used to define the behavior of a solute on
chromatography by molecular exclusion. If $V_T$ is the total
volume of the bed of the chromatographic system, then
the volume of the bed occupied by the stationary phase
is $V_T - V_0$. The fraction of the volume of the stationary
phase that is available to solute A is designated $K_{av,A}$:


$$K_{av,A} = \frac{V_{e,A} - V_0}{V_T - V_0} = \frac{\alpha_A}{\dfrac{V_T}{V_0} - 1} \tag{1-21}$$


Another parameter is often used to describe the elution
during chromatography by molecular exclusion. This is
the fraction of the volume within the stationary phase
available to a small reference solute, solute R, that is also
available to solute A, and it is designated $K_{D,A}$, so that

$$K_{D,A} = \frac{V_{e,A} - V_0}{V_{e,R} - V_0} \tag{1-22}$$

where $V_{e,R}$ is the volume at which solute R elutes. If the
reference solute were able to enter the entire aqueous
phase within the stationary phase, then $V_{e,R}$ would
be equal to $V_i$ and $K_{D,A}$ would equal $K_{av,A}(1 - f_{\mathrm{poly}})^{-1}$. The
difficulty with this definition is that it depends on
the identity of the reference solute.
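
The relationships among the parameters of Equations 1-20 through 1-22 can be checked with a short calculation. The sketch below is illustrative only: the volumes, the water regain, and the partial specific volume are hypothetical numbers, and the function names are not from the text. It assumes, as in the limiting case just described, that the reference solute enters all of the water within the beads, so that the included volume exceeds the void volume by the aqueous fraction of the stationary phase.

```python
import math


def polymer_fraction(water_regain, partial_specific_volume):
    """Equation 1-20: fraction of the stationary phase occupied by polymer."""
    return partial_specific_volume / (water_regain + partial_specific_volume)


def partition_coefficient(v_elution, v_void):
    """Volume of stationary phase entered by the solute divided by the void volume."""
    return (v_elution - v_void) / v_void


def k_av(v_elution, v_void, v_total):
    """Equation 1-21: fraction of the stationary phase available to the solute."""
    return (v_elution - v_void) / (v_total - v_void)


def k_d(v_elution, v_void, v_reference):
    """Equation 1-22: fraction of the volume open to the reference solute."""
    return (v_elution - v_void) / (v_reference - v_void)


def included_volume(v_void, v_total, f_poly):
    """Elution volume of a solute that enters all of the water within the beads."""
    return v_void + (1.0 - f_poly) * (v_total - v_void)


if __name__ == "__main__":
    # Hypothetical column and medium, chosen only to illustrate the arithmetic.
    v_0, v_t, v_e_a = 35.0, 100.0, 60.0          # milliliters
    f_poly = polymer_fraction(water_regain=10.0, partial_specific_volume=0.6)
    v_i = included_volume(v_0, v_t, f_poly)
    alpha_a = partition_coefficient(v_e_a, v_0)
    print(f"f_poly = {f_poly:.4f}, V_i = {v_i:.2f} mL, alpha_A = {alpha_a:.3f}")
    print(f"K_av = {k_av(v_e_a, v_0, v_t):.4f}, K_D = {k_d(v_e_a, v_0, v_i):.4f}")
    # The two forms of Equation 1-21 agree, and K_D = K_av / (1 - f_poly).
    assert math.isclose(k_av(v_e_a, v_0, v_t), alpha_a / (v_t / v_0 - 1.0))
    assert math.isclose(k_d(v_e_a, v_0, v_i), k_av(v_e_a, v_0, v_t) / (1.0 - f_poly))
```

Because the medium in this example is mostly water, the two parameters differ by only a few percent; the difference grows as the fraction of the bead occupied by polymer increases.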
Problem 1-1: Assume that the total volume of mobile
phase in the column described in Figure 1-3 is 45 cm³.
Calculate $\alpha_A$ for aspartic acid, threonine, glutamic acid,
proline, glycine, alanine, and valine. Calculate the 
number of theoretical plates in the column used for the 
separation shown in Figure 1-3 from the peaks for thre- 
onine, serine, proline, glycine, alanine, and valine. 


Problem 1-2: List the following amino acids in order of 
their elution from a column of sulfonated polystyrene. 


homoserine
[(aminocarbonyl)methyl]-methionine
norleucine
homocysteine
tryptophan
N-(carboxymethyl)-histidine

[Structural drawings of the side chains omitted.]


Assay 


Homogenization of a biological specimen produces a 
complex mixture of proteins. Before any one of these 
proteins can be purified, there must be a way to identify 
it; an assay serves this purpose. An assay is any connec- 
tion between a specific biological phenomenon and a 
solution containing the protein responsible for this phe- 
nomenon. During a purification, separations of high res- 
olution are performed that produce large numbers of 
separate samples, and the need to locate the protein of 
interest within these separated fractions requires that 
they be individually assayed (see points in Figure 1-3). 
This fact puts a premium on the speed and efficiency of 
the assay used. 

One of the most common types of assay is one that 
monitors a chemical reaction catalyzed by an enzyme. 
One of the phenomena that occur in living organisms is
the conversion of fumarate to malate


fumarate + H₂O → malate
(1-23)


catalyzed by the enzyme fumarate hydratase.* 

When the homogenate from a porcine heart, which 
contains fumarate hydratase, is added to a solution of 
fumarate, which is otherwise quite stable, the fumarate 
begins to disappear and malate appears.” Because 
fumarate has significant absorbance at 300 nm (A₃₀₀), the
decrease in A₃₀₀ with time can be followed to assay the
enzyme.

* In this section and in the rest of the book, enzymes are named
according to the recommendations of the Nomenclature
Committee of the International Union of Biochemistry and
Molecular Biology (us.expasy.org/enzyme/).

The goal of any assay is to produce a signal that is 
directly proportional to the molar concentration of the 
particular protein being assayed. A necessary and suffi- 
cient condition for this to be the case is that the quantity 
measured, for example, the rate of decrease in A₃₀₀ in the
first few minutes, be directly proportional to the amount 
of sample added. This proportionality must be demon- 
strated directly by examining the magnitude of the signal 
used in the assay as a function of the amount of sample 
added.'**! The range over which this direct proportion- 
ality between the measurement and the added sample 
occurs must be known. Because the assay should always 
be performed within this range, it must be fairly broad to 
avoid the problem of having to assay every sample at a 
series of different dilutions to find the range. It is also 
helpful if the quantity measured, such as A₃₀₀ in the case
of fumarate hydratase, changes as a linear function of 
time within the interval chosen to monitor the reaction. 
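
The demonstration of proportionality described above amounts to a simple numerical check. The following sketch is one way it might be done and is not a procedure taken from the text: the slope relating signal to amount of sample is estimated from the smallest samples by least squares through the origin, and the range of proportionality is taken as the largest amount whose signal stays within a chosen tolerance of that line. The data, the tolerance of 5%, and the function names are all hypothetical.

```python
def slope_through_origin(amounts, signals):
    """Least-squares slope of signal against amount for a line through the origin."""
    sum_xy = sum(a * s for a, s in zip(amounts, signals))
    sum_xx = sum(a * a for a in amounts)
    return sum_xy / sum_xx


def proportional_range(amounts, signals, n_reference=2, tolerance=0.05):
    """Return (slope, largest amount still proportional to the signal).

    amounts must be sorted in increasing order. The slope is estimated from
    the first n_reference points; the scan stops at the first point whose
    signal deviates from slope * amount by more than the tolerance.
    """
    slope = slope_through_origin(amounts[:n_reference], signals[:n_reference])
    upper = None
    for amount, signal in zip(amounts, signals):
        expected = slope * amount
        if abs(signal - expected) > tolerance * expected:
            break
        upper = amount
    return slope, upper


if __name__ == "__main__":
    # Hypothetical assay: microliters of sample and rate of decrease in absorbance.
    microliters = [5, 10, 20, 40, 80]
    rates = [0.010, 0.020, 0.040, 0.078, 0.120]   # absorbance units per minute
    slope, upper = proportional_range(microliters, rates)
    print(f"slope = {slope:.4f} per microliter; proportional up to {upper} microliters")
```

In this invented set of data the largest sample gives a rate well below the line defined by the smaller samples, so routine assays would be kept at or below the amount returned by the function.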

The rate at which an enzyme converts a reactant 
into a product usually changes as the pH of the solution 
changes. It is always a good idea to measure the enzy- 
matic activity as a function of the pH of the solution to 
find the pH at which the rate of the reaction is at its max- 
imum and then use that pH in the routine assay.” 

A coenzyme is a molecule that is not a protein but 
nevertheless must be added to the assay of an enzyme for 
the reaction it catalyzes to occur. Because the coenzyme 
is not converted during the enzymatic reaction into a 
product, it is not a reactant. Coenzymes are used by 
proteins to provide chemical capabilities that cannot be 
provided by the side chains of their amino acids alone.
Examples of coenzymes are pyridoxal phosphate, thiamin 
pyrophosphate, flavin adenine dinucleotide, biotin, lipoic 
acid, heme, chlorophyll, and ubiquinone. If an enzyme 
catalyzes a reaction requiring a coenzyme, that coenzyme 
usually must be added to the assay. Often the nature of 
the reaction is such that the coenzyme required is obvi- 
ous. For example, most enzymes that catalyze transami- 
nations require pyridoxal phosphate. Often, however, the 
requirement for a coenzyme is not obvious. 

2-Hydroxyphytanoyl-CoA lyase catalyzes the reaction 


2-hydroxy-3-methylhexadecanoyl-SCoA ==> 
2-methylpentadecanal + formyl-SCoA 
(1-24) 


The enzymatic activity could be readily assayed in the 
homogenate from a rat liver but disappeared as the 
purification proceeded. It was found that if thiamin 
pyrophosphate was added to the assays, however, the 
activity did not disappear.” This result demonstrates 
that thiamin pyrophosphate is a coenzyme for 2-hydroxy- 
phytanoyl-CoA lyase. There was enough of it in the initial 
homogenate to satisfy the enzyme, but it was lost as the 
purification proceeded. It might have been argued that
this coenzymatic requirement should have been
expected because the enzyme catalyzes a cleavage 
immediately adjacent to an acyl carbon, but such argu- 
ments are usually after the fact. Often the requirement 
for a coenzyme is not obvious and is both difficult and 
frustrating to discover. 

It also often happens that, as with a coenzyme, a 
metallic cation, such as Mg²⁺, Ca²⁺, Zn²⁺, Cu²⁺, Fe²⁺, or K⁺,
is required by a protein to perform its function and must
be added to the assay. Although there are often obvious
choices, such as Mg²⁺ for enzymes having phosphoesters
as reactants, the requirement for a particular metal is 
often unpredicted. 

The most unambiguous assay of the reaction cat- 
alyzed by an enzyme is one in which the reactants and 
products are chromatographically separated after the 
reaction and the quantities of each are determined. The 
introduction of rapid, automated, high-pressure liquid 
chromatographic systems with associated monitoring 
systems of high sensitivity has made this approach con- 
venient and efficient. If radioactive reactants are avail- 
able that can be turned into radioactive products, 
reactants and products from a large number of assays 
can be separated in arrays of simple, inexpensive chro- 
matographic systems and their respective quantities can 
be determined by scintillation counting. 

Examples of assays in which reactants and products 
are chromatographically separated have been used for 
the purifications of the proteins geranyltranstransferase, 
cyclosporin synthase, methylamine-glutamate 
N-methyltransferase, and lysine N-methyltransferase. 
Geranyltranstransferase catalyzes the reaction 


geranyl diphosphate + isopentenyl diphosphate ==> 
(E,E)-farnesyl diphosphate + pyrophosphate 
(1-25) 


A sample of protein to be assayed for this enzymatic 
activity can be mixed with geranyl diphosphate and 
[1-¹⁴C]isopentenyl diphosphate and incubated for a set
time. The reaction can then be terminated by adding
alkaline phosphatase to hydrolyze rapidly the various
diphosphates. After extraction, the resulting [1-¹⁴C]farnesol
and [1-¹⁴C]isopentenol in each sample can be separated
on small plates by thin-layer chromatography and
separately quantified.” That the product was entirely the 
expected (E,E) isomer of [1-¹⁴C]farnesol was demonstrated
by gas-liquid chromatography.

Cyclosporin synthase catalyzes the reaction


L-glycine + 4 L-leucine + 2 L-valine + L-alanine +
D-alanine + (2S,3R,4R,6E)-2-amino-
3-hydroxy-4-methyl-6-octenoic acid +
L-2-aminobutanoic acid + 11 MgATP +
7 S-adenosylmethionine → cyclosporin + 11 MgADP
+ 11 HOPO₃²⁻ + 7 S-adenosylhomocysteine
(1-26)


If the reaction is run with S-adenosyl[methyl-¹⁴C]methionine,
the [¹⁴C]cyclosporin produced can be isolated, after
extraction, by thin-layer chromatography.” 

Methylamine-glutamate N-methyltransferase cat- 
alyzes the reaction 


L-glutamate + [¹⁴C]methylamine ==>
ammonia + N-[¹⁴C]methyl-L-glutamate
(1-27)


The [¹⁴C]methylammonium cation and the [¹⁴C]methyl-
L-glutamate can be separated by isocratic chromatogra-
phy by cation exchange.” L-Lysine, N⁶-methyl-L-lysine,
and N⁶,N⁶-dimethyl-L-lysine are converted in the presence
of S-adenosyl[methyl-³H]methionine into mixtures
of N⁶-[³H]methyl-L-lysine, N⁶,N⁶-[³H]dimethyl-L-lysine,
and N⁶,N⁶,N⁶-[³H]trimethyl-L-lysine by lysine N-methyltransferase.
After removal of unreacted
S-adenosyl[methyl-³H]methionine with activated charcoal, the
three radioactive products can be separated by thin-layer
chromatography and quantified individually.” 

As in the previous example, where unreacted
S-adenosyl[methyl-³H]methionine was removed by
adsorption to activated charcoal, the chemical transfor- 
mation performed by an enzyme often produces a prod- 
uct that can be exclusively transferred to a separable 
phase. For example, tryptophan-tRNA ligase catalyzes 
the reaction 


MgATP + L-[¹⁴C]tryptophan + tRNAᵀʳᵖ →
AMP + Mg-pyrophosphate + L-[¹⁴C]tryptophan-tRNAᵀʳᵖ
(1-28)


The L-[¹⁴C]tryptophan-tRNAᵀʳᵖ can be isolated from the
assay solution as a precipitate, free of L-[¹⁴C]tryptophan,
by treatment with acid and filtration through filters of
glass fiber. The [¹⁴C]CO₂ produced from L-[1-¹⁴C]glutamate
by glutamate decarboxylase” or from
4-hydroxyphenyl[1-¹⁴C]pyruvate by 4-hydroxyphenylpyruvate
dioxygenase” can be released as a gas from the assay
solutions by treatment with acid and collected in a sepa- 
rate well containing a strong base. The enzyme encoded 
by the murG gene of Escherichia coli catalyzes the addi-
tion of the N-acetylglucosamine from UDP-N-acetylglu-
cosamine to the 4′ position of the muramoyl group in
1′-O-β-[3(R)-3,7-dimethylhept-6-enyl]-1′-diphospho-
2′-N-acetylmuramoyl-L-alanyl-D-γ-glutamyl-6-carboxy-
L-lysyl-D-alanyl-D-alanine. A derivative of the
heptenyldiphospho-N-acetylmuramoyl pentapeptide to
which a molecule of biotin has been covalently attached
can be used in an assay for this enzyme! along with
UDP-N-[¹⁴C]acetylglucosamine. The resulting biotinylated
β(1,4)-N-[¹⁴C]acetylglucosaminylheptenyldiphospho-
N-acetylmuramoyl pentapeptide can be separated
cleanly and quantitatively from the remaining
UDP-N-[¹⁴C]acetylglucosamine by adsorbing it to a solid phase
on which has been attached covalently the protein
avidin, which binds the biotin in the product with high
affinity.

A special case of assays that depend on transferring 
a product or a reactant to a separate phase are those used 
to monitor the binding of a small molecule to a protein. 
Certain proteins, known loosely as receptors, often do 
not catalyze a chemical reaction but respond to specific 
small molecules, referred to as agonists, by binding them 
and then undergoing a change in structure. Receptors are 
assayed by their ability to bind either these agonists or 
similar molecules that also bind but do not elicit the 
response, referred to as antagonists. In such binding 
assays, the receptor and a suitable radioactive agonist or 
antagonist are mixed together, the binding is allowed to 
come to equilibrium, and the receptor-agonist or recep- 
tor-antagonist complex is separated from unbound ago- 
nist or antagonist, respectively. Because receptors are 
usually proteins dissolved in membranes, the separation 
of bound from unbound ligand often takes advantage of 
the large size of the fragments of membrane produced by 
homogenization, which can be separated from the rest of 
the solution by filtration or centrifugation. After the sep- 
aration, the amount of bound radioactivity is then deter- 
mined by scintillation counting. 

Chemically stable agonists or antagonists of high 
affinity for a receptor are required to ensure that the 
binding is at saturation so that all receptors are counted 
and to prevent dissociation of receptor and agonist or 
receptor and antagonist during the separation of bound 
and free radioactivity. These reagents are often produced 
by the synthesis of analogues of the natural compounds. 
For example, [³H]dihydroalprenolol is a radioactive
synthetic compound that binds tightly (dissociation
constant = 2 nM)” to the β-adrenergic receptor, which
physiologically responds to epinephrine. Its binding has 
been used as an assay during the purification of this 
receptor.” Often a synthetic compound the binding of 
which to a receptor is strong has been obtained during a 
search for pharmaceutically useful agents. An example of 
this kind of product is prazosin, which was developed as 
a drug specific for α-adrenergic receptors and the binding
of which (dissociation constant = 1 nM) could be
used as an assay during the purification of the α₁-adrenergic
receptor.” Often the naturally occurring agonist
has an affinity great enough that it can be used in an 
assay during the purification of the receptor. For this 
purpose, it is synthesized in a radioactive form. Examples 
would be the use of the binding of ¹²⁵I-epidermal growth
factor” (dissociation constant = 20 nM) and the binding
of [1,2-³H₂]progesterone” (dissociation constant = 1 nM)
as assays for their receptors. 
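
The requirement that binding be at saturation can be made quantitative with the dissociation constants quoted above. The sketch below is illustrative and assumes the simplest possible case: one independent site per receptor, binding at equilibrium, and a known concentration of free (unbound) ligand. The function names and the target of 95% occupancy are assumptions made for the example.

```python
def fraction_occupied(free_ligand_nm, dissociation_constant_nm):
    """Fraction of receptor sites occupied at equilibrium (single independent site)."""
    return free_ligand_nm / (free_ligand_nm + dissociation_constant_nm)


def free_ligand_for_occupancy(target_fraction, dissociation_constant_nm):
    """Concentration of free ligand (nM) needed to reach a given fractional occupancy."""
    if not 0.0 < target_fraction < 1.0:
        raise ValueError("target occupancy must lie strictly between 0 and 1")
    return dissociation_constant_nm * target_fraction / (1.0 - target_fraction)


if __name__ == "__main__":
    # Dissociation constants (nM) as quoted in the text.
    ligands = {
        "dihydroalprenolol": 2.0,
        "prazosin": 1.0,
        "epidermal growth factor": 20.0,
        "progesterone": 1.0,
    }
    for name, kd in ligands.items():
        needed = free_ligand_for_occupancy(0.95, kd)
        print(f"{name:24s} Kd = {kd:5.1f} nM   free ligand for 95% occupancy = {needed:6.1f} nM")
```

Under this simple model a concentration of free ligand equal to the dissociation constant occupies only half of the sites, and about 19 times the dissociation constant is needed to occupy 95% of them, which is why ligands of high affinity are preferred for counting receptors.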

In all binding assays for receptors, the difficulty is to 
separate the complex between the receptor and the ago- 
nist from the unbound agonist without losing the bound 
agonist through the dissociation of the complex. It is 
often possible to sediment the complex in a preparative 
ultracentrifuge.” This strategy is particularly useful for
weakly bound agonists that dissociate rapidly after the
unbound agonist is removed, because during sedimenta- 
tion the concentration of unbound agonist does not 
change so the amount of bound agonist does not either. 
The small amount of unbound agonist in the pellet can 
be estimated and a correction made to obtain an accu- 
rate measurement of the bound agonist. With agonists 
and antagonists that bind tightly, the complex can be 
separated with little loss of bound radioactivity
by rapid chromatography by molecular exclusion on
small, disposable columns.

Binding assays have also been developed for pro- 
teins that associate with specific nucleotide sequences in 
DNA, such as promoters or other regulatory elements.
A short fragment of DNA labeled with [³²P]phosphate at
one end and containing the sequence of interest is used 
as a reagent. When such a fragment is digested with 
deoxyribonuclease I and the products are then separated 
by electrophoresis, a characteristic pattern of shorter 
segments of DNA of various lengths is obtained as a 
result of random cleavage by the nuclease of the phos- 
phodiesters along the double-stranded DNA. The pres- 
ence of a protein that binds specifically to a particular 
nucleotide sequence in a short fragment of end-labeled 
DNA results in prevention of cleavage of the DNA by the 
nuclease at that site. The fragments resulting from cleav- 
ages in this region disappear from the display, and this 
footprint demonstrates that the DNA-binding protein is 
present. Such an assay can be used to determine the rel- 
ative concentration of the DNA-binding protein by 
examining the patterns produced as a series of dilutions 
is performed in the solution of the protein added to the 
end-labeled DNA. 

An enzyme that catalyzes a physical or chemical 
transformation of DNA can often be assayed by separat- 
ing the product of the transformation from the reactant 
by electrophoresis. Deoxyribonucleic acid primase/heli- 
case from T7 bacteriophage catalyzes the unwinding of 
double-stranded DNA. Double-stranded DNA, one of the 
strands of which has been labeled with [³²P]phosphate at
its 5′ end, is mixed with a sample of protein to be assayed
for this activity, and after a few seconds the reaction is
quenched with dodecyl sulfate. The ³²P-labeled
single-stranded DNA produced by the unwinding can be
separated from the ³²P-labeled double-stranded DNA by
electrophoresis.

Up to this point, with the exception of that for 
fumarate hydratase, the assays described have been dis- 
continuous ones. The reaction is allowed to proceed for 
a certain interval, it is quenched in some way, and the 
amount of product formed is then measured, usually by 
dissecting the final, quenched solution. Because less 
manipulation is required and because the result is imme- 
diate, continuous assays in which the product of the live 
enzyme is monitored as it is formed are more con- 
venient. As in the assay for fumarate hydratase, the 
continuous change in absorbance of a reactant or 
product is often followed. The reaction catalyzed by
2-methyleneglutarate mutase 


2-methyleneglutarate ⇌ 2-methylene-3-methylsuccinate
(1-29) 


can be assayed by monitoring the change in
absorbance of the solution if the 2-methylene-3-methylsuccinate
is converted immediately as it is produced into
2,3-dimethylmaleate with the enzyme methylitaconate
Δ-isomerase. 2,3-Dimethylmaleate absorbs most
strongly at 230 nm, but in the assay for the enzyme,
absorbance changes between 240 and 256 nm were
monitored in order to avoid interference from the 
absorbance of nucleic acids and protein in the samples. 

Because the enzymatic activity in the initial 
homogenate and at early steps in the purification of a 
protein is usually quite low, large amounts of these het- 
erogeneous mixtures of protein and nucleic acid often 
must be added to the assay and their intrinsic 
absorbance can be appreciable. This problem precludes 
the use of absorbance changes at wavelengths below 
240 nm for continuous assays based on absorbance 
because all proteins absorb too strongly in this range. A 
crude mixture of protein and nucleic acid also has a sig- 
nificant and rather uniform absorbance between 240 and 
290 nm so the change in absorbance being monitored in 
the assay must be great enough to overcome this interference,
as it is in the assay for 2-methyleneglutarate
mutase (between Δε₂₄₀ = 3700 M⁻¹ cm⁻¹ and Δε₂₅₆ =
660 M⁻¹ cm⁻¹ for the production of 2,3-dimethylmaleate).

In any type of enzymatic assay, the proteins and 
nucleic acids in the early, crude mixtures can interfere in 
other, unexpected ways with many otherwise useful 
assays. In particular, contaminating proteins may cat- 
alyze the transformation of either the reactants or the 
products of the enzyme being assayed. The reaction cat- 
alyzed by protocatechuate 3,4-dioxygenase 


3,4-dihydroxybenzoate + O₂ ⇌ 3-carboxy-cis,cis-muconate
(1-30) 


could be followed by the increase in absorbance as the reaction
proceeded. In crude homogenates, however, the
3-carboxy-cis,cis-muconate was converted by a contaminating
enzyme into 3-carboxy-cis,cis-muconolactone, and
absorbances had to be corrected for this further transfor- 
mation until the contaminating enzyme had been lost at 
an intermediate step in the purification. 

Even though an enzyme normally produces a prod- 
uct the absorbance of which is no different from that of 
the reactant, it is sometimes possible to design a syn- 
thetic reactant in such a way that its absorbance does 
change upon its conversion in a continuous assay. 
Medium-chain acyl-CoA dehydrogenase catalyzes the 
reaction 


octanoyl-SCoA ⇌ trans-2-octenoyl-SCoA + 2H⁺ + 2e⁻
(1-31) 


involving no change in absorbance. If synthetic 4-thiaoc- 
tanoyl-SCoA is used as a reactant instead of octanoyl- 
SCoA, the 4-thia-trans-2-octenoyl-SCoA produced, 
because it is a vinylthioether, absorbs strongly at 312 nm
(ε₃₁₂ = 22,000 M⁻¹ cm⁻¹).

Enzymes that operate at lipid-water interfaces are 
often difficult to monitor with a continuous assay 
because of the requirement that lipid and water be pres- 
ent together, which can lead to cloudy suspensions that 
interfere with spectrometry. Triacylglycerol lipase cat- 
alyzes the reaction 


triacylglycerol + H₂O ⇌ diacylglycerol + fatty acid
(1-32) 


Both the triacylglycerol and the diacylglycerol are fats or 
oils. If the triacylglycerol is presented to the enzyme on the 
surface of a droplet of oil in an oil drop tensiometer, the 
change in surface tension of the droplet resulting from the 
hydrolysis of the triacylglycerol can be monitored
continuously. It is also possible, however, to produce a clear
emulsion of water-filled micelles suspended in an organic 
solvent. If the lipase is incorporated into the aqueous 
phase of these “reverse” micelles and the triacylglycerol is 
dissolved in the organic solvent, the shift in the
wavenumber of the absorbance of the C=O stretch from 1751 to
1715 cm⁻¹ occurring upon hydrolysis of the triacylglycerol
at the interface between water and oil can be monitored in
an infrared spectrophotometer. Infrared spectrometry is
useful in this instance because it is much less affected by 
light scattering than is visible or ultraviolet spectrometry. 

In the assay for 2-methyleneglutarate mutase 
(Equation 1-29), the 2-methylene-3-methylsuccinate is 
converted immediately and continuously upon its pro- 
duction into 2,3-dimethylmaleate by another enzyme, 
methylitaconate Δ-isomerase, that has been added
intentionally to the solution. This is an example of a 
coupled continuous assay. A coupled assay is an assay in 
which one or more purified enzymes, usually commer- 
cially available, and any reactants required by those 
additional enzymes are added to transform the immedi- 
ate product of the enzyme of interest by a subsequent 
enzymatic reaction or reactions that produce a change in 
absorbance or fluorescence, the rate of which is directly 
proportional to the rate at which the product is pro- 
duced. The other enzymes and their reactants must be 
added at high enough concentrations that the complete 
transformation of the initial product is effectively imme- 
diate, and the initial product is converted continuously 
as it is formed. Another example of a coupled assay is that 
used to follow the ATPase reaction 


MgATP + H₂O → MgADP + HOPO₃²⁻
(1-33) 


catalyzed by myosin subfragment 1. Both the reactant
2-amino-6-mercapto-7-methylpurine ribonucleoside 
and the enzyme purine-nucleoside phosphorylase are 
added to the solution in addition to MgATP and the 
ATPase. The inorganic phosphate produced is immedi- 
ately and continuously used by the phosphorylase to 
cleave the purine ribonucleoside to ribose-1-phosphate 
and 2-amino-6-mercapto-7-methylpurine that, unlike 
the ribonucleoside, absorbs strongly at 360 nm (Δε₃₆₀ =
11,000 M⁻¹ cm⁻¹). This coupled continuous assay is
useful for monitoring any one of the many enzymes that 
have inorganic phosphate as one of their products. 
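
The requirement that the coupling enzymes be present in excess can be illustrated with a minimal kinetic sketch. It assumes, as a simplification not stated in the text, that the enzyme of interest forms its product at a constant rate and that the coupling enzyme consumes this intermediate with first-order kinetics (intermediate concentration well below its Michaelis constant). The observed rate then approaches the true rate as 1 − exp(−kt). All rate constants and the function names are hypothetical.

```python
import math


def observed_rate(v_primary: float, k_coupling: float, t: float) -> float:
    """Rate of the monitored (coupling) reaction at time t.

    Assumes the intermediate I is formed at constant rate v_primary and
    removed with first-order rate constant k_coupling, so that
    d[I]/dt = v_primary - k_coupling * [I] with [I] = 0 at t = 0.
    """
    return v_primary * (1.0 - math.exp(-k_coupling * t))


def lag_time(k_coupling: float, fraction: float = 0.99) -> float:
    """Time needed for the observed rate to reach `fraction` of the true rate."""
    return -math.log(1.0 - fraction) / k_coupling


# Hypothetical coupling rate constants (min^-1); larger values correspond
# to more coupling enzyme added to the assay.
for k in (1.0, 10.0, 100.0):
    print(f"k = {k:6.1f} min^-1: 99% of true rate after {lag_time(k):.3f} min")
```

In this sketch a hundredfold increase in the amount of coupling enzyme shortens the lag a hundredfold, which is the sense in which conversion of the initial product becomes "effectively immediate".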

Of all of the changes of absorbance that are 
employed in continuous enzymatic assays, none is more 
heavily used than the decrease in A₃₄₀ of dihydronicotinamide
adenine dinucleotide (NADH; ε₃₄₀ = 6220 M⁻¹ cm⁻¹)
or its phosphate (NADPH; ε₃₄₀ = 6100 M⁻¹ cm⁻¹),
when it is oxidized to nicotinamide adenine dinucleotide
(NAD⁺) or to its phosphate (NADP⁺), respectively, or the
increase in A₃₄₀ that occurs in the reverse reaction. There
is a large class of enzymes, known as dehydrogenases,
that use the oxidation-reduction pairs of either NAD⁺ and
NADH or NADP⁺ and NADPH, and they can be assayed
directly and continuously. For example, 3-hydroxyacyl- 
CoA dehydrogenase catalyzes the reaction 


S-acetoacetylpantetheine + NADH ⇌
(S)-S-(3-hydroxybutyryl)pantetheine + NAD⁺
(1-34) 


The loss of NADH can be followed at 340 nm. To ensure
that only 3-hydroxyacyl-CoA dehydrogenase is being 
assayed, a control in the absence of S-acetoacetylpan- 
tetheine is run. 
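
A minimal sketch of how such a recording is converted into an enzymatic rate, assuming the Beer-Lambert law, a 1 cm light path, and the value of ε₃₄₀ for NADH quoted above. The absorbance slopes in the example and the function names are hypothetical; the subtraction of the control lacking substrate follows the procedure just described.

```python
EPSILON_NADH_340 = 6220.0  # M^-1 cm^-1, value quoted in the text


def rate_from_absorbance(dA_per_min: float,
                         epsilon: float = EPSILON_NADH_340,
                         path_cm: float = 1.0) -> float:
    """Convert a slope in absorbance (min^-1) into µmol min^-1 mL^-1."""
    molar_per_min = abs(dA_per_min) / (epsilon * path_cm)  # mol L^-1 min^-1
    return molar_per_min * 1.0e3  # 1 mol L^-1 equals 1000 µmol mL^-1


def net_rate(dA_with_substrate: float, dA_control: float) -> float:
    """Rate attributable to the enzyme after subtracting the control."""
    return (rate_from_absorbance(dA_with_substrate)
            - rate_from_absorbance(dA_control))


# Hypothetical slopes: complete assay and control lacking the substrate.
print(f"{net_rate(-0.0700, -0.0078):.4f} µmol min^-1 mL^-1")
```

With these assumptions a slope of 0.0622 min⁻¹ in A₃₄₀ corresponds to 0.01 µmol min⁻¹ mL⁻¹, that is, 10 nmol min⁻¹ mL⁻¹.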

In the opposite sense, the increase in A₃₄₀ can be
used to follow the reaction catalyzed by glyceraldehyde-
3-phosphate dehydrogenase (phosphorylating):


D-glyceraldehyde 3-phosphate + HOPO₃²⁻ + NAD⁺ →
3-phospho-D-glyceroyl phosphate + NADH
(1-35) 


The absorbance change produced by a particular 
dehydrogenase can also be used in a coupled, continu- 
ous assay to monitor enzymatically catalyzed reactions 
that do not involve NADH. Examples of such coupled 
assays have been used for the purifications of pyruvate 
carboxylase, phosphomevalonate kinase, and imida- 
zoleglycerol-phosphate dehydratase. The oxaloacetate 
produced by pyruvate carboxylase 


MgATP + pyruvate + HCO₃⁻ ⇌
MgADP + HOPO₃²⁻ + oxaloacetate
(1-36) 


can be monitored by adding NADH and excess malate
dehydrogenase:


NADH + oxaloacetate ⇌ NAD⁺ + (S)-malate
(1-37) 


Phosphomevalonate kinase catalyzes the reaction 


MgATP + (R)-5-phosphomevalonate ⇌
MgADP + (R)-5-diphosphomevalonate 
(1-38) 


The MgADP can be monitored continuously as it is produced
by adding phosphoenolpyruvate, NADH, and an
excess of both pyruvate kinase and L-lactate dehydrogenase:


MgADP + phosphoenolpyruvate ⇌ MgATP + pyruvate
(1-39)


pyruvate + NADH ⇌ (S)-lactate + NAD⁺
(1-40) 


This coupled assay is widely used for enzymes that 
produce MgADP. 

The 3-(imidazol-4-yl)-2-oxopropyl phosphate pro- 
duced by imidazoleglycerol-phosphate dehydratase 


D-erythro-1-(imidazol-4-yl)glycerol 3-phosphate ⇌
3-(imidazol-4-yl)-2-oxopropyl phosphate + H₂O
(1-41) 


during its assay is consumed as it is produced by
histidinol-phosphate transaminase: 


3-(imidazol-4-yl)-2-oxopropyl phosphate + L-glutamate ⇌
L-histidinol phosphate + 2-oxoglutarate 
(1-42) 


The unfavorable equilibrium of Reaction 1-42 is pulled 
in the direction written by the exergonic hydrolysis 
catalyzed by histidinol-phosphatase 


L-histidinol phosphate + H₂O ⇌ L-histidinol + HOPO₃²⁻
(1-43) 


The 2-oxoglutarate produced by the transaminase 
(Equation 1-42) is converted back to L-glutamate by 
glutamate dehydrogenase: 


2-oxoglutarate + NH₄⁺ + NADH ⇌
H⁺ + NAD⁺ + H₂O + L-glutamate
(1-44) 


to produce the decrease in A₃₄₀. Similar coupled assays
are used to monitor other transaminases. 

One of the more subtle uses of a coupled assay 
based on the A₃₄₀ of NADH is the one devised for
hydroxymethylglutaryl-CoA lyase:


(S)-3-hydroxy-3-methylglutaryl-SCoA ⇌
acetyl-SCoA + acetoacetate 
(1-45) 


This coupled assay takes advantage of the fact that the 
equilibrium of the malate dehydrogenase reaction 
(Equation 1-37) lies in the direction of NAD⁺ and
(S)-malate so that if NAD⁺, malate, and malate dehydrogenase
are mixed together, little oxaloacetate and NADH
are formed. With this in mind, it can be seen that if
(S)-malate, NAD⁺, and excesses of citrate (si) synthase
and malate dehydrogenase are present during the 
progress of the reaction catalyzed by hydroxymethyl- 
glutaryl-CoA lyase, the conversion of the acetyl-SCoA 
into citrate by citrate (si) synthase 


acetyl-SCoA + H₂O + oxaloacetate → citrate + HSCoA
(1-46) 


consumes oxaloacetate and pulls the unfavorable equi- 
librium of the malate dehydrogenase reaction in the 
direction of NADH production, and hence an increase in 
the A₃₄₀ of the solution is observed.

The two or more enzymatic steps in a coupled assay 
are sometimes disconnected rather than allowed to pro- 
ceed simultaneously. An example would be an assay for
ribose-phosphate diphosphokinase: 


MgATP + D-ribose 5-phosphate ⇌
MgAMP + 5-phospho-α-D-ribose 1-diphosphate
(1-47) 


The reaction is quenched by boiling, and the amount 
of 5-phospho-α-D-ribose 1-diphosphate that has
accumulated is determined by adding orotate, orotate 
phosphoribosyltransferase, and orotidine-5’-phosphate 
decarboxylase: 


5-phospho-α-D-ribose 1-diphosphate + orotate ⇌
orotidine 5′-phosphate + pyrophosphate
(1-48)


orotidine 5′-phosphate ⇌ UMP + CO₂
(1-49) 


The decrease in A₂₉₅ due to the loss of orotate is proportional
to the 5-phospho-α-D-ribose 1-diphosphate originally
present in the quenched samples. The
decarboxylation has been incorporated in the assay to 
draw the reactions to completion. 

Colorimetric assays are assays in which a reagent is 
added that reacts chemically rather than enzymatically 
with a product of the enzymatic reaction being 
monitored to produce a change in absorbance, often 
observed visually as a dramatic change in the color of the 
solution. Monophenol monooxygenase catalyzes the 
reaction 


L-tyrosine + L-3-hydroxytyrosine + O₂ ⇌
L-3-hydroxytyrosine + L-dopaquinone + H₂O
(1-50) 


When 3-methyl-2-benzothiazolinonehydrazone has 
been added to the solution of the assay, the 
L-dopaquinone reacts rapidly and quantitatively with it 
to produce a dark pink color (ε = 29,000 M⁻¹ cm⁻¹), the
appearance of which can be monitored continuously. As 
is medium-chain acyl-CoA dehydrogenase (Equation 
1-31), (S)-pantolactone dehydrogenase, which catalyzes 
the oxidation 


(S)-pantolactone → 2-dehydropantolactone + 2H⁺ + 2e⁻
(1-51) 


is a member of a large class of enzymes that catalyze 
oxidation-reduction reactions and then transfer the 
electrons involved either to or from small proteins or 
natural compounds the role of which is to receive or 
provide electrons. These natural donors or acceptors can 
often be replaced by synthetic donors or acceptors. 
(S)-Pantolactone dehydrogenase accepts phenazine 
methosulfate as an oxidant in place of the acceptor it uses 
naturally, and reduced phenazine methosulfate readily
reduces nitrotetrazolium blue. When both of these compounds
are present in the assay, the appearance of diformazan,
which is the product of the reduction of
nitrotetrazolium blue, can be followed by its strong
absorbance (ε = 40,200 M⁻¹ cm⁻¹). It is possible to
monitor the production of coenzyme A by citrate (si) 
synthase (Equation 1-46) continuously by the addition
of 5,5’-dithiobis(2-nitrobenzoate). This reagent reacts 
with the thiol of the coenzyme A as it is formed to release 
the bright yellow 2-nitro-5-thiolatobenzoate dianion. 
This assay is useful for monitoring any enzyme that pro- 
duces coenzyme A. 

The colorimetric assays described so far are contin- 
uous assays in which the chemistry of the colorimetric 
reagent is compatible with the aqueous solution and 
neutral pH required to avoid denaturation and inactiva- 
tion of the protein being assayed. This is usually not the 
case with colorimetric reagents. In the instances in which 
it is not, the assay must be quenched after a convenient 
interval before the colorimetry is performed. 2-Hydroxy- 
6-ketonona-2,4-diene-1,9-dioic acid 5,6-hydrolase pro- 
duces succinate as one of its two products. The 
production of succinate is coupled in the assay to the
reaction catalyzed by succinate-CoA ligase (ADP-form- 
ing): 


MgATP + succinate + coenzyme A ⇌
MgADP + succinyl-SCoA + HOPO₃²⁻
(1-52) 


After 15 min, the solution is heated to 100 °C to quench
the enzymatic reaction and the inorganic phosphate is
assayed by its reaction with Malachite green in the presence
of citrate, which produces strong absorbance at
600 nm. Phosphate produced during an enzymatic reaction
can also be determined colorimetrically by the addition
of ammonium molybdate in dilute sulfuric acid and
a strong reductant, which together produce a blue color
proportional in magnitude to the phosphate present.
Glutamine-pyruvate transaminase will also catalyze the 
reaction 


L-glutamine + glyoxylate → 2-oxoglutarate + glycine
(1-53) 


The glycine produced and the L-glutamine remaining 
will react with o-phthalaldehyde and a thiol, after the 
enzymatic conversion has been terminated, to produce 
complexes that absorb in the near ultraviolet. The
glycine complex, however, absorbs at a higher wavelength
(λmax = 330 nm). Galactonate dehydratase
catalyzes the reaction


D-galactonate ⇌ 2-oxo-3-deoxy-D-galactonate + H₂O
(1-54) 


After the reaction is quenched, the ketonic product is 
reacted with semicarbazide to produce a semicarbazone
that absorbs at 250 nm. Selenocysteine lyase
catalyzes the reaction 


selenocysteine + 2RSH ⇌ L-alanine + H₂Se + RSSR
(1-55) 


where RSH is a mercaptan such as 2-mercaptoethanol. 
After the enzymatic reaction is stopped, the H₂Se can be
assayed colorimetrically by its reaction with lead acetate,
a reaction that yields a yellow color.

Biological assays are assays in which the ability to 
evoke a complex biological response by samples added 
to cells or whole organisms is determined. For example, 
the assay for a protein referred to as the Hurler corrective 
factor measures the ability of this protein to prevent the 
accumulation of sulfated mucopolysaccharide in lyso- 
somes of intact cells. It is this accumulation that causes 
Hurler’s syndrome. Samples are added to a series of petri 
dishes on which fibroblasts from a patient with Hurler’s 
syndrome have been grown and [³⁵S]SO₄²⁻ is added. After
several days, the accumulation of ³⁵S-sulfated
mucopolysaccharide is assessed by washing the cells and
submitting them to scintillation counting. In this
particular assay, the decrease in accumulation of 
radioactivity was not directly proportional to the amount 
of sample added, and this problem was overcome by 
constructing a dose-response curve. 

A biological assay was also used for the maturation- 
promoting factor, which is a protein involved in controlling
the cell cycle. Samples containing this protein
could be assayed for its activity by injecting sequentially
diluted aliquots into individual oocytes from the frog
Xenopus laevis and scoring the cells for the disappearance
of germinal vesicles. With the use of this assay, the
protein could be followed during a purification procedure
and the remarkable fluctuation of its concentration
during the cell cycle could be documented.

The success of a particular assay is usually judged 
on the bases of its accuracy, sensitivity, and selectivity. 
For following the distribution of a protein during its 
purification, the accuracy of an assay is not critical—all 
that is needed is a way to decide whether or not it is 
present in a particular fraction—but for kinetic studies of 
the reaction catalyzed by an enzyme, accuracy is often 
critical. If only small amounts of a protein are present,
the sensitivity of an assay is also often critical. It is 
usually to increase the sensitivity of an assay that 
radioactive reactants are used so that the small amounts 
of product produced or ligand bound can be identified. 
Fluorescence is often used for the same purpose. For 
example, continuous assays monitoring the absorbance 
of NADH can detect its production at 10 nmol min⁻¹
mL⁻¹, but those monitoring its fluorescence can detect its
production at 0.1 nmol min⁻¹ mL⁻¹. When following the
increase in a particular product produced from a partic- 
ular reactant or the binding of a particular ligand, an 
assay is usually selective for a particular protein, but suf- 
ficient selectivity is often difficult to achieve. It was only 
when agonists and antagonists of high affinity and high 
selectivity were synthesized that the various receptors for 
epinephrine could be separately identified and purified. 
A suspension of cellular membranes displays a rather 
high level of adenosine triphosphatase activity arising 
from a number of different proteins. It was only when that 
portion of this activity for which sodium/potassium- 
exchanging ATPase was responsible could be clearly
distinguished that it became possible to purify the
enzyme.

Problem 1-3: Design a coupled assay, based on the
release of [¹⁴C]CO₂, for the enzyme cis-aconitase, which
catalyzes the reaction 


citrate → isocitrate


Problem 1-4: Design a coupled assay based on the 
reduction of NAD* for the enzyme fumarate hydratase. 


Problem 1-5: Design a coupled assay for phospho- 
fructokinase, the enzyme that catalyzes the reaction 


fructose 6-phosphate + MgATP ⇌
MgADP + HOPO;” + fructose 1,6-diphosphate 


Problem 1-6: Design a coupled assay for 5-aminopen- 
tanamidase: 


5-aminopentanamide + H₂O ⇌ 5-aminopentanoate + NH₄⁺


Problem 1-7: Design a coupled assay for succinyl- 
diaminopimelate transaminase: 


N-succinyl-L-2,6-diaminoheptanedioate + 2-oxoglutarate ⇌
N-succinyl-2-L-amino-6-oxoheptanedioate + L-glutamate 


Purification of a Protein 


The homogenization of a biological specimen produces 
a clarified solution of protein and nucleic acid. Animal 
tissues can be diced, blended, and then processed with a 
homogenizer to produce a turbid suspension that is then 
clarified by centrifugation. Plant tissues, because their 
cells are surrounded by strong cell walls, must be homogenized
more forcefully. If the protein of interest is
located in one of the organelles within a plant or animal 
cell, that organelle is often isolated from the homogenate 
and then separately fragmented. For example, a protein 
required for the activation of transcription was purified 
from an extract of nuclei that had been isolated from a 
homogenate of HeLa cells, and 2-hydroxyphytanoyl- 
CoA lyase was purified from sonicated peroxisomes that 
had been isolated from a homogenate of rat liver.
Bacteria, because they are small, single cells surrounded 
by a tough integument, are particularly difficult to 
homogenize. Sonication or passage through a French 
pressure cell is usually required. 

If the source that has been chosen is a plant or an 
animal, different organs and different species are 
scanned with the assay to find a source in which the par- 
ticular protein is present at the highest relative concen- 
tration. If the source is a bacterium, various strategies are 
employed to increase the concentration of the protein of 
interest. For example, the Xanthobacter from which 
cyclohexane monooxygenase was purified were grown on
cyclohexane as the sole carbon source because this 
enzyme is one of those in the pathway that catabolizes 
cyclohexane.

The goal of the purification of a protein from the 
clear solution produced by centrifugation of a 
homogenate is to isolate that protein, whose presence 
and relative molar concentration can be followed by a 
specific assay, from all of the other proteins present. To 
do this, advantage is taken of the properties that distin- 
guish a molecule of one protein from a molecule of 
another. Proteins are macromolecules of molar mass 
10,000-10,000,000 g mol⁻¹. Unless the molecules of a
protein have been posttranslationally modified heterogeneously
by processes such as glycosylation, phosphorylation,
endopeptidolytic digestion, or acetylation, each
molecule of a given protein has the same covalent struc- 
ture, the same distribution of polar and nonpolar func- 
tional groups over its surface, and the same shape as 
every other molecule of the same protein. Different pro- 
teins are distinguished from each other by differences in 
these properties. 

A molecule of protein can be a globular macro- 
molecule, the shape of which resembles a hollow metal 
sphere that has been dented at random, or it can be a 
fibrous macromolecule, the shape of which is elon- 
gated, often dramatically, in one dimension, irregular, 
and either rigid or flexible. The diameters of globular 
proteins vary from 2 to 10 nm; the lengths of fibrous pro- 
teins can be as great as 300 nm. Positively and negatively 
charged functional groups are distributed in a character- 
istic array over the surface of each molecule of a particu- 
lar protein. In addition to guanidinium ions from the 
arginines, these charged functional groups are carboxy- 
late ions from the glutamates and aspartates, ammo- 
nium ions from the lysines, and imidazolium cations 
from the histidines, each of which can be neutralized by 
lowering or raising the pH. As a result, the net charge on 
a molecule of protein varies with the pH and can be neg- 
ative or positive within normal physiological ranges. 
Patches of nonpolar functional groups are distributed in 
a characteristic array over the surface of each molecule 
of protein. The affinity of these patches for nonpolar 
solid phases can be exploited to separate molecules of 
one protein from those of another. Chromatography is 
used to separate molecules of protein by differences in 
their size, their shape, their charge as a function of pH, 
and the unique distribution of polar and nonpolar 
groups on their surfaces. 

The strategy for the purification of a protein is tai- 
lored to the particular problems faced in each instance. 
Usually it includes a series of steps, each involving a frac- 
tionation of the solution by chromatography or adsorp- 
tion. Each step produces a series of fractions, from two to 
several hundred, each contained in a volume of aqueous 
solution. Those fractions containing the protein of inter- 
est are identified by the assay, they are pooled together, 
and the protein in the pool is submitted to the next step 
of fractionation. The three requirements for the success- 
ful purification of a protein are an assay for the protein, 
the ability to minimize or prevent the loss of the protein 
through endopeptidolytic degradation or denaturation, 
and a source of the protein of sufficient abundance. 

The progress of the purification of a protein is usu- 
ally evaluated by examination of both the total activity 
recovered, which is a measure of the yield of the particu- 
lar protein being purified at each step, and the specific 
activity, which is a measure of the enrichment of the pro- 
tein of interest relative to the other proteins present 
(Table 1-1). The assay provides a numerical value for the 
amount of biological activity in a milliliter of the solution 

Table 1-1: Purification of Aryl-acylamidase from Nocardia globerula

purification step    total protein   total activity   specific activity    yield of activity   enrichment
                     (mg)            (µmol min⁻¹)     (µmol min⁻¹ mg⁻¹)    (%)                 (x-fold)
cell-free extract    4400            560              0.13                 100                 1
ammonium sulfate     4100            540              0.13                 97                  1
phenyl-agaroseᵃ      460             450              0.97                 80                  7
DEAE-Sephacelᵇ       38              210              5.4                  37                  41
Sephadex G-150ᶜ      18              140              8.1                  26                  60
anion exchangeᵈ      4.7             90               19                   16                  150
ammonium sulfate     2.4             55               23                   10                  180
Superoseᵉ            1.2             45               37                   8                   290

ᵃBeaded, cross-linked agarose (Figure 1-7) to which phenyl groups are attached in ether linkage. ᵇFigure 1-3. ᶜBeaded, cross-linked dextran for chromatography by molecular exclusion. ᵈ(CH₃)₃N⁺CH₂– groups covalently linked to a beaded hydrophilic polyether. ᵉBeaded, cross-linked agarose for chromatography by molecular exclusion.


being assayed. For an enzymatic assay, this value is the 
number of micromoles of reactant that would be con- 
verted to product every minute if one milliliter of the 
solution had been added to the assay. The total activity 
present after any step in the purification is the activity 
milliliter⁻¹ multiplied by the total number of milliliters in
the pool of fractions. The yield of activity is the percent- 
age of the initial total activity remaining after each step. 
Although the yield of activity usually decreases as the 
purification proceeds, sometimes it increases, for example
if an inhibitor of the activity is removed during a
step.

The concentration of protein, in units of milligrams
milliliter⁻¹, in the pool of fractions is also assayed.
The most accurate method for making this determination
is quantitative amino acid analysis (Figure 1-3),
but this procedure is too tedious and time-consuming
for routine assays. The Biuret colorimetric assay is the
most accurate rapid method, but its low sensitivity often
requires that an unreasonable portion of a precious
sample be sacrificed. The Lowry colorimetric method,
because of its sensitivity, is the most widely used method 
for determining the concentration of protein in a sample, 
but it suffers from the drawbacks that many solutes other 
than protein also produce color and that different pro- 
teins give different yields of color. For example, it was 
shown that the concentration of protein in samples of 
purified hydrogenase I from Clostridium pasteurianum,
which had been accurately quantified by quantitative
amino acid analysis, was overestimated by the Lowry
procedure by a factor of 1.37 ± 0.03. The least quantitative
but most convenient and rapid methods for assessing
the concentration of protein are the colorimetric
method of Bradford and the absorbance of the solution
at 280 nm. The specific activity of a pool of fractions 
from a step in the procedure for purifying the protein is 
the amount of biological activity displayed by a milligram
of the proteins in that solution, that is, the activity
milliliter⁻¹ divided by the amount of protein milliliter⁻¹. For
an enzyme, the units of specific activity are (micromoles
of reactant converted) minute⁻¹ (milligram of protein)⁻¹.
The enrichment in the protein of interest during a par- 
ticular series of steps is the increase in its specific activ- 
ity relative to its initial specific activity in the 
homogenate. 
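
The definitions of specific activity, yield of activity, and enrichment can be checked against Table 1-1 with a minimal sketch. The amounts of total protein and total activity are those of the table; the function name `purification_summary` and the layout of the rows are hypothetical. Because the published entries are rounded, the recomputed values agree with them only approximately.

```python
# (step, total protein in mg, total activity in µmol min^-1) from Table 1-1
steps = [
    ("cell-free extract", 4400.0, 560.0),
    ("ammonium sulfate", 4100.0, 540.0),
    ("phenyl-agarose", 460.0, 450.0),
    ("DEAE-Sephacel", 38.0, 210.0),
    ("Sephadex G-150", 18.0, 140.0),
    ("anion exchange", 4.7, 90.0),
    ("ammonium sulfate", 2.4, 55.0),
    ("Superose", 1.2, 45.0),
]


def purification_summary(rows):
    """Return specific activity, yield (%), and enrichment for each step."""
    _, first_protein, first_activity = rows[0]
    initial_specific = first_activity / first_protein
    summary = []
    for name, protein_mg, activity in rows:
        specific = activity / protein_mg  # µmol min^-1 mg^-1
        summary.append({
            "step": name,
            "specific_activity": specific,
            "yield_percent": 100.0 * activity / first_activity,
            "enrichment": specific / initial_specific,
        })
    return summary


for row in purification_summary(steps):
    print(f"{row['step']:<18} {row['specific_activity']:8.2f} "
          f"{row['yield_percent']:6.1f} {row['enrichment']:7.1f}")
```

Dividing each specific activity by that of the cell-free extract gives the enrichment, and dividing each total activity by the initial total activity gives the yield.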

There is a conventional order in which the various 
steps of the purification are carried out. This order is usu- 
ally determined by the amount of material a certain pro- 
cedure can accommodate, because the amounts that 
must be processed, if the samples have been concen- 
trated after each step, always decrease as the purification 
proceeds because of the decrease in the total amount of 
protein. Precipitations can be carried out on large vol- 
umes and are usually the first step in a purification. If 
appropriate, selective adsorption is used in the next step 
because it is an efficient method for handling large 
samples and the media are usually inexpensive. 
Chromatography by ion exchange is usually used before 
chromatography by adsorption because the media used 
for the former are usually less expensive and have higher 
capacity. Chromatography by molecular exclusion is 
usually used as a late step because it is most successful 
when the samples, and hence the amount of protein, are 
as small as possible. 

The purification of aryl-acylamidase 


N-acetyl-o-toluidine + H₂O → acetate + o-toluidine
(1-56) 


from Nocardia globerula (Table 1-1) illustrates this sys- 
tematic strategy. In each step of the purification the spe- 
cific enzymatic activity increases as extraneous proteins 
are separated from the desired protein, and the yield of 
enzymatic activity after each step is high. Nevertheless, 
because there are so many steps, the overall yield is only 
8%, but an 8% yield is high for the purification of a pro- 
tein. In this example, chromatography by molecular 
exclusion on Superose is used in the last step when total 
amounts of protein are small so that the samples can be 
concentrated to the small volumes required by this
procedure. Chromatography by anion exchange
(DEAE-Sephacel), however, can be used early in the
purification
because large volumes at low concentration of elec- 
trolyte can be passed through the ion-exchange 
medium to concentrate the protein on the top of the 
column. The chromatography itself is then initiated by 
increasing the concentration of electrolyte. 
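
Because the overall yield is the product of the yields of the individual steps, a high yield at every step can still compound to a low overall yield. The sketch below, in Python, asks what average yield per step multiplies out to the 8% quoted above. The step counts are hypothetical, since Table 1-1 is not reproduced here, and the function name is an illustrative assumption.

```python
def mean_step_yield(overall_yield, n_steps):
    """Geometric-mean yield per step that compounds to overall_yield."""
    if not 0.0 < overall_yield <= 1.0:
        raise ValueError("overall_yield must lie in (0, 1]")
    if n_steps < 1:
        raise ValueError("n_steps must be at least 1")
    return overall_yield ** (1.0 / n_steps)


# Hypothetical numbers of steps; 0.08 is the overall yield quoted in the text.
for n_steps in (4, 6, 8):
    per_step = mean_step_yield(0.08, n_steps)
    print(f"{n_steps} steps: mean yield per step = {per_step:.0%}")
```

With six steps, for instance, each step would need to retain roughly two-thirds of the enzymatic activity to finish at 8%.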

The precipitation of proteins from an aqueous 
solution that is effected by the addition of a high con- 
centration of another solute has a long history. 
Originally, such precipitations were observed upon the 
addition of certain salts to solutions of proteins. This 
observation led to the terms salting out, to describe a 
precipitation caused by a salt, and salting in, to 
describe the dissolution of a precipitate caused by a 
salt. For example, sulfate ion salts out, and thiocyanate 
ion and guanidinium ion salt in. A systematic study of 
the effect of salts on the solubility of proteins led to the 
Hofmeister series, an ordering of various ions on
the basis of their ability to salt out or salt in.* Similar 
effects, however, are observed with nonionic solutes as 
well, somewhat confounding the words chosen. Urea 
salts in, and poly(ethylene glycol) salts out. 

It has been shown that these capacities of solutes,
both ionic and nonionic, to affect the solubility of a
protein can be ascribed to differences in preferential
solvation. The preferential solvation of a particular
protein by a particular solute can be defined by the
equation

$$
\text{preferential solvation} = \frac{1}{\gamma_\mathrm{s}} \left( \frac{\partial m_\mathrm{s}}{\partial m_\mathrm{p}} \right)_{T,\,\mu_{\mathrm{H_2O}},\,\mu_\mathrm{s}} \tag{1-57}
$$

where $m_\mathrm{s}$ is the grams of that solute in the solution for
every gram of water, $m_\mathrm{p}$ is the grams of that protein in the
solution for every gram of water, $\gamma_\mathrm{s}$ is the concentration
of the solute in the solution in grams milliliter⁻¹, $T$ is the
temperature, and $\mu_{\mathrm{H_2O}}$ and $\mu_\mathrm{s}$ indicate that both the
chemical potential of the water and the chemical potential
of the solute must remain constant as the grams of
protein, $\mathrm{d}m_\mathrm{p}$, change. Solutes that display negative
values of preferential solvation salt out and solutes that 
display positive values of preferential solvation salt in, 
and the magnitude of their values of preferential solva- 
tion correlates with the potency of their ability to salt out 
or salt in. 
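
The sign convention can be made concrete with a short Python sketch. It uses the four values of preferential solvation quoted elsewhere in this section for bovine serum albumin at 1 M solute; the function names are illustrative assumptions, and the concentration of 1 M sodium sulfate in grams per milliliter (0.142) is computed here from its molar mass rather than taken from the text. The second function simply rearranges the defining equation, treating the derivative as a ratio of small finite changes.

```python
def classify(preferential_solvation_ml_per_g):
    """Negative values indicate salting out; positive values, salting in."""
    if preferential_solvation_ml_per_g < 0:
        return "salting out"
    if preferential_solvation_ml_per_g > 0:
        return "salting in"
    return "neutral"


def solute_change_g(preferential_solvation_ml_per_g, gamma_s_g_per_ml, dm_p_g):
    """Grams of solute to add (positive) or remove (negative) when dm_p_g
    grams of protein are added at constant chemical potentials:
    dm_s = (preferential solvation) * gamma_s * dm_p."""
    return preferential_solvation_ml_per_g * gamma_s_g_per_ml * dm_p_g


# Values quoted in this section for bovine serum albumin, 1 M solute (mL per g).
values = {
    "sodium sulfate": -0.52,
    "sodium chloride": -0.26,
    "potassium thiocyanate": +0.07,
    "guanidinium chloride": +0.26,
}
for solute, value in values.items():
    print(f"{solute:22s} {value:+.2f} mL/g  {classify(value)}")

# 1 M sodium sulfate is about 0.142 g/mL (molar mass 142 g/mol; an assumption
# added for this example).
dm_s = solute_change_g(values["sodium sulfate"], 0.142, dm_p_g=1.0)
print(f"sodium sulfate: {dm_s:+.4f} g of solute per g of protein added")
```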

A negative value for preferential solvation, indicating
salting out, states that grams of solute, $\mathrm{d}m_\mathrm{s}$, must be
removed from the solution whenever grams of anhydrous
protein, $\mathrm{d}m_\mathrm{p}$, are added to maintain constant
chemical potential. The usual reason given for the
observation of negative preferential solvation is that, in
an aqueous solution, the layer of water surrounding the
protein has properties distinct from those of the rest of
the water in the solution and a salting-out solute is
preferentially excluded from that layer of solvation. The
reason grams of solute must be removed to maintain a
constant chemical potential is that water is removed
from the bulk solution to form this layer of hydration and
solute must be removed from the overall solution to keep
its concentration the same in the bulk solution
surrounding the hydrated protein.

* The effects of salts on many properties of proteins, such as their
enzymatic activity and their specific associations with each
other, are often governed by the Hofmeister series.

A positive value for the preferential solvation of a 
particular solute states that the grams of that solute in 
the solution must be increased when the grams of pro- 
tein are increased in order to maintain constant chemi- 
cal potential. Therefore, the solute prefers to interact 
with the protein rather than with water; for example, it 
has a higher solubility in the layer of water around the 
protein or it simply binds to the protein. Positive prefer- 
ential solvations mean that the protein becomes more 
soluble as the solute is added to the solution. Such salt- 
ing-in is displayed by urea, potassium thiocyanate, and 
guanidinium chloride. At concentrations of 1 M, the
value of the preferential solvation of bovine serum
albumin by potassium thiocyanate is +0.07 mL g⁻¹ and that
of bovine serum albumin by guanidinium chloride is
+0.26 mL g⁻¹. The ability of urea to increase the solubility
of proteins is frequently used during their purification. 
For example, the proteins that form intermediate fila- 
ments, which are naturally occurring, insoluble poly- 
meric aggregates of protein, are purified from the 
solution that is obtained by dissolving the filaments in 
7 M urea. The advantage of using urea is that, because it
is a neutral molecule, it has no effect on chromatography
by ion exchange. 

It is for precipitation, however, that preferential
solvation is usually exploited during purification of a
protein. Assume that a solution is at saturation in the
concentration of a particular protein; in other words, the 
chemical potential of that protein in the saturated solu- 
tion is equal to the chemical potential of that protein in 
its precipitate. If a solute with negative value of preferen- 
tial solvation is added to the saturated solution of pro- 
tein, some of the protein must precipitate to maintain a 
constant chemical potential. In reality, what happens is
that as more and more of the solute is added, the
chemical potential of the dissolved protein increases
until it equals that of its precipitate, and then it
begins to precipitate. The
more negative the value of preferential solvation for the 
solute being added, the more rapidly does the concen- 
tration of protein reach and then surpass saturation. 
At 1 M concentration, the value for the preferential
solvation of bovine serum albumin by sodium sulfate is
−0.52 mL g⁻¹. As a comparison, the preferential solvation
of bovine serum albumin by NaCl, a salt that shows weak
salting-out, is −0.26 mL g⁻¹ at a concentration of 1 M.
Although sodium sulfate has been used to precipitate 
proteins during purifications, ammonium sulfate is pre- 
ferred because it is more soluble than sodium sulfate and 
it is also lethal to fungi or bacteria that would otherwise 
be happy to use the precipitated protein as a source of 
food. A protein as a precipitate in a concentrated solu- 
tion of ammonium sulfate at 4 °C is usually stable for 
decades. Traditionally, the concentration of ammonium 
sulfate used to precipitate a protein is expressed as the 
percentage that the final concentration in the solution is 
of the concentration of ammonium sulfate at saturation 
(0.52 g mL⁻¹ at 4 °C).
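
The percentage of saturation converts directly into a final concentration. The Python sketch below uses the saturation value given above (0.52 g mL⁻¹ at 4 °C); the molar mass of ammonium sulfate (132.14 g mol⁻¹) and the function names are additions made for this example. It gives the final concentration in the solution only. It does not give the mass of solid salt that must be added to reach it, because the volume of the solution changes as the salt dissolves.

```python
SATURATION_G_PER_ML = 0.52      # ammonium sulfate at saturation, 4 °C (from the text)
MOLAR_MASS_G_PER_MOL = 132.14   # (NH4)2SO4; added for this example


def concentration_g_per_ml(percent_saturation):
    """Final concentration of ammonium sulfate at a given percent of saturation."""
    if not 0 <= percent_saturation <= 100:
        raise ValueError("percent_saturation must lie between 0 and 100")
    return percent_saturation / 100.0 * SATURATION_G_PER_ML


def concentration_molar(percent_saturation):
    """The same concentration expressed in moles per liter."""
    return concentration_g_per_ml(percent_saturation) * 1000.0 / MOLAR_MASS_G_PER_MOL


for percent in (25, 50, 60, 70, 100):
    print(f"{percent:3d}% of saturation: "
          f"{concentration_g_per_ml(percent):.3f} g/mL, "
          f"{concentration_molar(percent):.2f} M")
```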

Ammonium sulfate at high concentrations causes 
most proteins to precipitate from solution. In the exam- 
ple of aryl-acylamidase (Table 1-1), the enzyme was pre- 
cipitated between 25% and 60% ammonium sulfate. No 
purification was observed in this instance; the step was 
used to concentrate the protein and rapidly remove it 
from all of the other metabolites in the clarified 
homogenate. Usually, however, an attempt is made to 
obtain some purification. Each protein precipitates in a 
given range of ammonium sulfate concentration. 
Extraneous proteins that precipitate at lower concentra- 
tions can be removed first, and then the protein being 
purified can be precipitated by raising the concentration 
of ammonium sulfate and thus be separated from pro- 
teins that remain soluble at the higher concentration. For 
example, formate-tetrahydrofolate ligase was purified 
10-fold by bringing the solution of ammonium sulfate to 
50% of saturation to precipitate other proteins, then 
increasing the concentration of ammonium sulfate to 
70% of saturation to precipitate the ligase while leaving
yet other proteins in the supernatant. Purification
by ammonium sulfate precipitation is usually not so 
large as in this example, but the procedure is a mild one, 
usually of high yield. Precipitation with ammonium sul- 
fate can be used to concentrate rapidly and gently a solu- 
tion of protein between later steps in a purification 
(Table 1-1). 
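
The logic of a two-cut fractionation can be sketched in Python as follows. This is a deliberately simplified model, assumed for illustration: each protein is treated as precipitating completely at a single percentage of saturation, whereas real proteins precipitate over a range. The protein names and precipitation points are hypothetical; the cuts at 50% and 70% are those of the example above.

```python
def fractionate(precipitation_points, low_cut, high_cut):
    """Sort proteins into the discarded first pellet, the kept second pellet,
    and the final supernatant for cuts at low_cut and high_cut (% saturation)."""
    if not 0 <= low_cut < high_cut <= 100:
        raise ValueError("require 0 <= low_cut < high_cut <= 100")
    discarded_pellet, kept_pellet, supernatant = [], [], []
    for name, point in sorted(precipitation_points.items()):
        if point <= low_cut:
            discarded_pellet.append(name)   # precipitates in the first cut
        elif point <= high_cut:
            kept_pellet.append(name)        # precipitates between the cuts
        else:
            supernatant.append(name)        # still soluble at the higher cut
    return discarded_pellet, kept_pellet, supernatant


# Hypothetical precipitation points, in percent of saturation.
proteins = {
    "contaminant A": 35,
    "contaminant B": 45,
    "protein of interest": 60,
    "contaminant C": 85,
}
discarded, kept, remaining = fractionate(proteins, low_cut=50, high_cut=70)
print("first pellet (discarded):", discarded)
print("second pellet (kept):    ", kept)
print("final supernatant:       ", remaining)
```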

Poly(ethylene glycol) has also been used to precip- 
itate proteins selectively and reversibly. It is easy to imag- 
ine why a large hydrophilic polymer such as 
poly(ethylene glycol) would be excluded from the layer of 
water surrounding a protein and thus have a negative 
value for preferential solvation. Tryptophan 5-mono- 
oxygenase can be purified 5-fold after precipitation with 
poly(ethylene glycol) and redissolution in aqueous 
buffer. Trimethylamine oxide, a naturally occurring
solute in the serum of fish, is also able to precipitate
proteins.

Several other types of precipitation are used during 
the purification of a protein. At the pH at which a given 
protein bears no net charge, known as its isoelectric pH, 
it is least soluble in water. If the pH is adjusted to this 
value and the salts in the solution are removed by 


Purification of a Protein 23 


dialysis, the protein will often precipitate, while other 
proteins, which have different isoelectric points, do not. 
Such an isoelectric precipitation has been used in the 
purification of aspartate carbamoyltransferase and in
the purification of fibrinogen. One traditional method
of concentrating protein and removing it from other
molecules in a homogenate is to precipitate it by adding
acetone. The resulting dry acetone powder can be
extracted with a buffered aqueous solution, and if one is 
lucky, the protein of interest will dissolve. Because their 
DNA is not contained in nuclei, when bacterial cells are 
fragmented by homogenization, the DNA is released as 
an intractable, gelatinous mass. Before the solution can 
be processed further, the DNA must be precipitated with 
streptomycin sulfate or the DNA must be hydrolyzed
to small fragments that are not gelatinous by adding 
nucleases to the solution. 

Isoelectric precipitation and precipitations with 
poly(ethylene glycol) and ammonium sulfate are 
reversible, and the protein is readily redissolved by 
decreasing the concentration of precipitant or changing 
the pH. In contrast, precipitation by acid or heat is usu- 
ally not reversible. In these situations advantage is taken 
of the ability of the protein of interest to remain in solu- 
tion while other proteins precipitate irreversibly. An 
example of the use of precipitation with heat occurs in 
the purification of 6-phosphofructokinase, and during 
this step a 2.5-fold increase in specific activity was 
recorded. These techniques are quite harsh and can
lead to degradation of the protein being purified by
endopeptidases or to chemical alterations such as
deamidation of glutamine and asparagine side chains
even though little loss of enzymatic activity is recorded. 

Proteins are separated chromatographically by 
exploiting differences among them in particular proper- 
ties. Different proteins have different sizes and shapes 
and can be separated on chromatography by molecular 
exclusion. Different proteins also have different charges 
at a given pH and can be separated on chromatography 
by ion exchange. In the case of chromatography by ion 
exchange, a pH is chosen at which the protein to be puri- 
fied has a net charge opposite to the fixed charge on the 
chromatographic medium so that it will participate in 
ion exchange with the stationary phase as the chro- 
matography progresses. The elution of bound protein is 
usually performed with a gradient of increasing concen- 
tration of a simple monovalent salt such as KCl. If a gra- 
dient of pH is used, the change in pH is usually in the 
direction that would decrease the magnitude of the net 
charge on the protein. Because the value of α is
changing continuously, the use of a gradient always produces
chromatographic separations of much lower resolution
than those performed isocratically without a gradient.
The advantage of a gradient, however, is that it bypasses
the problem of finding conditions of pH and ionic
strength at which the value of α for the protein being
purified is in a usable range. Because molecules of
protein are multivalent ions, their values of α change
rapidly as the ionic strength is varied, and such a search is
often tedious and fruitless.
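
The choice of pH and medium described above follows from the sign of the net charge on the protein: above its isoelectric pH a protein bears a net negative charge and can exchange on a positively charged medium (an anion exchanger such as DEAE-cellulose), and below its isoelectric pH the reverse holds. A minimal Python sketch, with hypothetical isoelectric points and illustrative function names:

```python
def net_charge_sign(pH, isoelectric_pH):
    """Sign of the net charge on a protein: -1 above its isoelectric pH,
    +1 below it, and 0 at it."""
    if pH > isoelectric_pH:
        return -1
    if pH < isoelectric_pH:
        return 1
    return 0


def suitable_exchanger(pH, isoelectric_pH):
    """Medium whose fixed charge is opposite to the net charge on the protein."""
    sign = net_charge_sign(pH, isoelectric_pH)
    if sign < 0:
        return "anion exchanger"    # positively charged medium
    if sign > 0:
        return "cation exchanger"   # negatively charged medium
    return None                     # no net charge, so no ion exchange


# Hypothetical proteins and buffers.
print(suitable_exchanger(pH=8.0, isoelectric_pH=5.5))
print(suitable_exchanger(pH=6.0, isoelectric_pH=8.5))
print(suitable_exchanger(pH=7.0, isoelectric_pH=7.0))
```

The sign alone does not predict how tightly a protein binds, since the magnitude and the distribution of the charge also matter.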

Glyceraldehyde-3-phosphate dehydrogenase (GDH),
phosphoglycerate mutase (PGM), and phosphoglycerate
kinase (PGK) in the ammonium sulfate precipitate from
a clarified homogenate could be separated on
chromatography by molecular exclusion (Figure 1-5A).
Each of the three enzymes migrates with a characteristic
elution volume, Vₑ, and the glyceraldehyde-3-phosphate
dehydrogenase is cleanly separated from the other two
enzymes by molecular exclusion chromatography on a
column of Sephadex G-150. The fractions containing the
activities of phosphoglycerate mutase and
phosphoglycerate kinase were combined and submitted
directly to ion-exchange chromatography on DEAE-cellulose
developed with a gradient of sodium chloride (Figure 1-5B).
In this step the phosphoglycerate mutase was cleanly
separated from the phosphoglycerate kinase. These examples
illustrate the use of column chromatography, monitored
by enzymatic assay, to separate proteins.

Figure 1-5: Chromatography by molecular exclusion (A) and chromatography by anion exchange (B) of proteins in a homogenate from the
bacterium E. coli. The clarified homogenate was submitted to precipitation with ammonium sulfate (30-45%). The precipitate (7.2 g of
protein) was redissolved in a minimum volume (120 mL) of aqueous buffer and submitted to zonal chromatography on a column (10 cm ×
120 cm) of cross-linked dextran (Sephadex G-150). (A) Fractions were assayed for protein (absorbance at 280 nm) and enzymatic activity
(micromoles minute⁻¹ milliliter⁻¹) of glyceraldehyde-3-phosphate dehydrogenase (GDH), phosphoglycerate mutase (PGM), and
phosphoglycerate kinase (PGK), respectively. The proteins contained in the fractions from 4.9 to 5.8 L in the chromatogram in panel A were combined
and submitted to chromatography by anion exchange. (B) The ionic strength of the buffer used for the chromatography by molecular
exclusion was low enough that the sample (900 mL) could be passed directly through the column (2.2 cm × 25 cm) of diethylaminoethyl- (DEAE-)
cellulose while the proteins gathered at the top of the medium for ion exchange. Chromatography was then initiated with a gradient of NaCl
(0-0.15 M in the same buffer at pH 8). Fractions were again assayed for protein and enzymatic activity. Reprinted with permission from ref
96. Copyright 1971 Journal of Biological Chemistry.
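
The dimensions given in the legend to Figure 1-5 show how differently the two media are loaded. The Python sketch below assumes that each pair of dimensions is diameter × length, the usual convention, which the legend does not state explicitly; the function name is illustrative.

```python
import math


def column_volume_ml(diameter_cm, length_cm):
    """Volume of a cylindrical column in milliliters (1 cm^3 = 1 mL)."""
    return math.pi * (diameter_cm / 2.0) ** 2 * length_cm


# Molecular exclusion: 120 mL of sample on a 10 cm x 120 cm column.
exclusion_volume = column_volume_ml(10.0, 120.0)
exclusion_load = 120.0 / exclusion_volume

# Anion exchange: 900 mL of sample on a 2.2 cm x 25 cm column.
exchange_volume = column_volume_ml(2.2, 25.0)
exchange_load = 900.0 / exchange_volume

print(f"molecular exclusion: column {exclusion_volume:.0f} mL, "
      f"sample is {exclusion_load:.1%} of the column volume")
print(f"anion exchange:      column {exchange_volume:.0f} mL, "
      f"sample is {exchange_load:.1f} column volumes")
```

Under that assumption the sample applied for molecular exclusion is little more than 1% of the volume of the column, whereas the sample passed through the ion-exchange medium is several times the volume of that column, which is possible only because the proteins concentrate at the top of the medium.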

An example of the use of a sequence of steps of 
column chromatography to purify a particular protein is 
found in the purification of α-ketoisocaproate oxygenase
from rat liver (Figure 1-6). Aside from an initial
ammonium sulfate precipitation, only three consecutive steps,
chromatography by ion exchange (Figure 1-6A), chro- 
matography by adsorption (Figure 1-6B), and chro- 
matography by molecular exclusion (Figure 1-6C), were 
necessary to purify the enzyme to homogeneity. 

Because the resolution of chromatography by ion 
exchange run with a gradient and the resolution of chro- 
matography by molecular exclusion are not great 
(Figures 1-5 and 1-6), the increase in specific activity 
seen in each of the chromatographic steps is usually 
around 5-fold. Extreme examples of purification, such as 
the 100-fold purification of
3-deoxy-7-phosphoheptulonate synthase on
phosphocellulose or the 100-fold
purification of methylcrotonyl-CoA carboxylase on
DEAE-cellulose, are rare. For reasons that are not
obvious, however, it has recently been discovered that the
magnitude of the purification on chromatography by
adsorption is often significantly greater than that
observed on chromatography by ion exchange or
molecular exclusion.
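
If each step multiplies the specific activity by a roughly constant factor, the number of steps needed for a given overall enrichment follows directly. The Python sketch below counts steps by repeated multiplication; the target of 1000-fold is a hypothetical figure, and the assumption of a constant factor per step is a simplification made for illustration.

```python
def steps_needed(target_fold, fold_per_step):
    """Smallest number of steps whose compounded enrichment reaches target_fold."""
    if fold_per_step <= 1:
        raise ValueError("fold_per_step must exceed 1")
    steps = 0
    achieved = 1.0
    while achieved < target_fold:
        achieved *= fold_per_step
        steps += 1
    return steps


target = 1000  # hypothetical overall enrichment required
for fold in (5, 100):
    print(f"{fold}-fold per step: {steps_needed(target, fold)} steps "
          f"to reach {target}-fold")
```

At about 5-fold per step, five steps are required to exceed 1000-fold, whereas two steps of 100-fold each suffice.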

Traditionally, hydroxylapatite, because of its physi- 
cal properties, has been used mainly for selective adsorp- 
tion of proteins, but recently much more effective, 
beaded forms of hydroxylapatite that can be used for 
chromatography by adsorption have become available. 
Nitric-oxide reductase could be purified 100-fold by 
chromatography on one of these media, and acetyl-CoA
hydrolase, 60-fold. The media most widely used,
however, for chromatography by adsorption (Table 1-2)
are produced by synthetically coupling defined organic
functional groups or molecules or chelated metal ions
to beaded hydrophilic matrices, usually cross-linked 
agarose or polymethacrylate. Although the intention in 
the syntheses in which organic molecules are covalently 
attached to the polymer has often been to produce a 
chromatographic medium with a specific affinity for one 
particular protein or class of proteins, most of these 
products have turned out to be simple adsorption media 
with useful and unexpected affinities for proteins in
general. Ironically, this makes them more valuable than
they were originally intended to be. 

Successful purification of a minor component from 
a complex mixture requires that the set of distribution 
coefficients, αᵢ, for the components present assume a
new and randomly permuted sequence of magnitudes as
each new chromatographic medium is used. If it were
possible to do so, a series of chromatographic steps
would be designed so that all of the components that had
similar values of αᵢ in the preceding step, and that were
not separated, have different values of αᵢ in the next step
and are separated. The availability of a collection of 
microscopically uniform adsorption media with peculiar 
and unexpected affinities for proteins in general assists in 
this strategy. The purification achieved on these media is 
often dramatic (Table 1-2). There are two methods,
however, that do not rely on chromatography and that often
produce even greater degrees of purification. They are
based on the selective elution from or selective
adsorption to a stationary phase and can be referred to as
affinity elution or affinity adsorption, respectively.

Figure 1-6: Column chromatography of α-ketoisocaproate
oxygenase from rat liver. An ammonium sulfate (45-75%) precipitate
(35 g of protein) of the clarified homogenate was redissolved,
dialyzed to remove salt, and applied to a column (5 cm × 80 cm) of
DEAE-cellulose (A). The chromatogram was developed with a
gradient of NaCl (×) from 0 to 0.1 M. Fractions containing enzymatic
activity (4 g of protein) were pooled, concentrated, brought to
2.5 M NaCl, and applied to a column (4 cm × 40 cm) of cross-linked
agarose to which phenyl groups had been covalently attached. The
proteins were eluted with a gradient between 2.5 M NaCl and
buffer without added NaCl (B). The fractions containing enzymatic
activity (500 mg of protein) were pooled, concentrated, and
applied to a column (5 cm × 80 cm) of allyldextran cross-linked
with N,N’-methylenebis(acrylamide) for chromatography by
molecular exclusion (C). In each panel, α-ketoisocaproate
oxygenase activity (nanomoles of CO₂ released minute⁻¹ milliliter⁻¹; ●) is
presented as a function of fraction number. The total protein in
each fraction (○) was also monitored by absorbance at 280 nm. The
final yield was 70 mg of protein in the peak of enzymatic activity.
Reprinted with permission from ref 97. Copyright 1982 Journal of
Biological Chemistry.

Table 1-2: Purification of Proteins on Chromatography by Adsorption

| protein purified | molecule attached covalently to agarose (a) | property of solution varied for elution | enrichment (e) (x-fold) |
|---|---|---|---|
| coproporphyrinogen oxidase, step 1 | Cibacron blue | increasing [sodium cholate] (c) | 80 |
| coproporphyrinogen oxidase, step 2 | phenyl group (b) | increasing [Tween 80] (c) | 2.5 |
| isocitrate dehydrogenase (NADP⁺), step 1 | reactive red | increasing [NaCl] | 20 |
| isocitrate dehydrogenase (NADP⁺), step 2 | reactive red | increasing [NADP⁺] (d) | 15 |
| isocitrate dehydrogenase (NADP⁺), step 3 | phenyl group (b) | decreasing [(NH₄)₂SO₄] | 2 |
| formate-tetrahydrofolate ligase | Matrex green | increasing [KCl] | 5 |
| glutamyl-tRNA reductase | phenyl group (b) | decreasing [KCl] | 9 |
| aminodeoxychorismate lyase | reactive yellow | increasing pH | 260 |

(a) In all cases cited, cross-linked agarose (Figure 1-7) was used as the polymeric support to which the organic molecules were covalently attached.
(b) In ether linkage to agarose.
(c) These solutes are detergents.
(d) Affinity elution.
(e) Enrichment during each step.
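
Because Table 1-2 reports the enrichment during each step, the enrichments of the steps listed for one protein can be multiplied to estimate their combined effect. The Python sketch below does so for the two proteins with more than one entry. It assumes that the listed steps act multiplicatively on the specific activity, and it ignores any other steps in the published procedures; the function name is illustrative.

```python
from math import prod

# Enrichment (x-fold) during each adsorption step, as listed in Table 1-2.
step_enrichments = {
    "coproporphyrinogen oxidase": [80, 2.5],
    "isocitrate dehydrogenase (NADP+)": [20, 15, 2],
}


def cumulative_enrichment(steps):
    """Product of the enrichments of the individual steps."""
    return prod(steps)


for protein, steps in step_enrichments.items():
    print(f"{protein}: {cumulative_enrichment(steps):g}-fold over "
          f"{len(steps)} steps")
```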

When a protein is purified by affinity elution, it is 
first adsorbed to a stationary phase, such as a
chromatographic medium; and after all unadsorbed proteins have
been washed away, a compound that binds with high 
specificity to the protein of interest and leads to its elu- 
tion is added (as for example in the second step of the 
purification of isocitrate dehydrogenase, Table 1-2). The 
presence of this compound can sometimes cause only 
that protein to which it binds to elute from the stationary 
phase. For example, when (carboxymethyl)cellulose is 
added to a crude, clarified homogenate from liver at pH 6, 
all of the fructose 1,6-bisphosphatase is adsorbed along 
with many other proteins. When the
(carboxymethyl)cellulose is collected, washed well with
5 mM sodium malonate, pH 6, and then rinsed with 0.06 mM
fructose 1,6-bisphosphate in 5 mM sodium malonate, pH 6,
only the fructose 1,6-bisphosphatase elutes in the rinse.
In one step the enzyme can be purified 400-fold, to
homogeneity. Transketolase, after initial purification by
DEAE-cellulose from homogenates of human leukocytes, will
adsorb tightly to the top of a small column (16 mL) of 
(carboxymethyl)cellulose when a dilute solution (90 mL) 
of the protein dissolved at low ionic strength is passed 
over the column. After the column has been washed 
extensively, the transketolase is eluted with buffer to 
which xylulose 5-phosphate (0.2 mM) and ribose
5-phosphate (0.3 mM) have been added. The transketolase is
purified 40-fold to homogeneity. Protein kinase N,
bound to a methylenesulfonate cation-exchange
medium, can be eluted specifically with ATP (0.1 mM) for
a purification of 2500-fold.

Although it requires much more effort, affinity 
adsorption is more widely used than affinity elution and 
has been successful in a number of instances. The basic 


idea in affinity adsorption is to synthesize a stationary 
phase to which has been covalently attached a chemical 
compound that binds specifically and with high affinity 
to the protein being purified. The compound syntheti- 
cally attached to the stationary phase is usually an analog 
or a derivative of a reactant or product in the reaction 
catalyzed by an enzyme, an inhibitor of the enzyme, an 
allosteric activator of the enzyme, or an agonist or antag- 
onist of a receptor. This compound, when covalently 
attached to the stationary phase, is referred to as an 
immobilized ligand for the protein. Cross-linked 
agarose is the stationary phase to which the
immobilized ligand is usually attached.

One of the original examples of this technique
can serve to illustrate the strategy. Micrococcal nuclease 
is an enzyme from Staphylococcus aureus that can 
hydrolyze the phosphodiesters of either single-stranded 
RNA or double-stranded DNA to produce as its final 
products 3’-phosphomononucleotides or dinucleotides. 
Thymidine 3’,5’-bisphosphate is a specific inhibitor of 
the nuclease that binds to it tightly. A p-aminophenyl 
derivative of this inhibitor was synthesized and attached 
covalently to agarose through its aniline nitrogen to pro- 
duce a stationary phase displaying the thymidine 
3’,5’-bisphosphate (Figure 1-7). When a crude
supernatant containing micrococcal nuclease was passed over
this affinity medium, none of the nuclease emerged but 
almost all of the protein did. The nuclease could then be 
eluted nonspecifically with dilute acetic acid in greater 
than 90% yield. It was completely purified in this one 
step. 

Since this early report, the technical aspects of 
affinity adsorption have been exhaustively explored. The 
main difficulty to which many of these investigations 
have been directed is positioning the ligand far enough 
from the polymeric matrix of the agarose to minimize 
steric hindrance and thus interact effectively with the 
protein. This problem may explain many of the
failed attempts to use the technique of affinity
adsorption. Several long, hydrophilic connecting links,
usually referred to as spacers, that serve the purpose of
the p-aminophenyl in the original example (Figure 1-7)
have been developed to solve this problem. Often a long
hydrophilic spacer is created during the set of reactions
used to attach the ligand to the solid phase (Figure 1-8).
Many different strategies for attaching ligands of various 
structures to the stationary phase have been developed. 

The cases in which affinity adsorption has been
successful in the purification of proteins provide a
provocative collection of examples (Table 1-3). Because
purifications of 100-fold in one step are not unusual, this
approach has obvious advantages over the traditional
strategy that combines chromatography by ion
exchange, chromatography by molecular exclusion
(Table 1-1), and chromatography by adsorption (Table
1-2), where several steps are required to achieve the
same degree of purification. Affinity adsorption,
however, often requires a greater investment than
assembling a sequence of simple chromatographic steps and
has a higher risk of failure. Often the affinity adsorbent
produces only a modest purification of 10-fold or less
under conditions that suggest that the process occurring
is either nonspecific ion exchange or simple
adsorption or affinity elution from a nonspecific stationary
phase. Often the desired protein adsorbs so tightly to
the affinity medium that it can be eluted only in low
yield.

Figure 1-7: Synthetic strategy used to couple deoxythymidine
3’,5’-bisphosphate covalently to agarose activated with cyanogen
bromide (BrCN). The cyanylation occurs randomly on the
agarose.

The central, defining feature of affinity adsorption
is the design of the stationary phase, but the conditions
used for elution of the bound protein are also
characteristic. Often they are merely the application of a mobile
phase of extreme pH or ionic strength such as in the
original example of micrococcal nuclease. The ideal
approach, however, is to combine affinity adsorption
with affinity elution to gain an advantage in each of the
two steps, and the protein is often eluted with a solution
of the soluble ligand from which the immobilized ligand
was derived (Table 1-3, Figure 1-9).

Figure 1-8: Use of a hydrophilic spacer to connect a specific ligand
to a polymeric support. N,N-Di-(3-aminopropyl)amine was
attached to agarose by activating the polysaccharide with
cyanogen bromide (Figure 1-7).
1-(4-Amino-6,7-dimethoxy-2-quinazolinyl)piperazine, which is a portion of prazosin, a specific
antagonist for α₁-adrenergic receptors, was succinylated and then
attached to the aliphatic amine by activation of the resulting
carboxylic acid with
1-[(N,N-dimethylamino)propyl]-3-ethylcarbodiimide. This produced a spacer of 14 atoms connecting an
oxygen of the polysaccharide with the nitrogen of the ligand. The
spacer is hydrophilic by virtue of the O-alkyl-N-alkyl urea, the
amine, and the two N-alkyl amides. This affinity medium was used
to purify α₁-adrenergic receptor.

Affinity adsorption has also been used to purify pro- 
teins that bind to particular nucleotide sequences in 
DNA.” The spacer holding the DNA recognized by the 
protein away from the surface of the agarose can be pro- 
duced by polymerizing short fragments of DNA contain- 
ing the target sequence to produce a long repeating 
double strand of DNA and then attaching this long 
repeating polymer to the agarose through one of its ends. 
The DNA closest to the surface of the agarose acts as a 
spacer for the more peripheral segments. Such an affin- 
ity adsorbent was used to purify the promoter-specific 
transcription factor Sp1.’ This protein binds to the 
nucleotide sequence GGGGCGGGGC in double- 
stranded DNA, and its concentration in a particular solu- 
tion can be assayed by observing its footprint on DNA 
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Figure 1-9: Affinity adsorption and affinity elution used in combi- 
nation to purify 5-formyltetrahydrofolate cyclo-ligase.'”’ (A) A 
crude extract (7.3g of protein in 2L) from the bacterium 
Lactobacillus casei was passed over a column (4 cm x 18 cm) of 
agarose to which 5-formyltetrahydropteroylglutamate had been 
attached. After the affinity adsorbent had been washed with 2 L of 
buffer until no more protein emerged, the bound enzyme was 
eluted with a solution of 5-formyltetrahydrofolate, a reactant in the 
enzymatic reaction. (B) A purified fraction (0.7 mg of protein in 
40 mL from a later step in the procedure) was passed over a column 
(2cm x 13cm) of agarose to which ATP had been covalently 
attached. After the affinity absorbent had been washed with 
100 mL of buffer, the bound enzyme was eluted with a solution of 
ATP, another reactant in the enzymatic reaction. Protein con- 
centration (milligrams milliliter';@) and enzymatic activity 
(nanomoles minute”! milliliter; ©) were measured for each frac- 
tion collected from each column. Reprinted with permission from 
ref 127. Copyright 1984 Journal of Biological Chemistry. 


Table 1-3: Examples of the Use of Affinity Adsorption in the Purification of Proteins

protein | ligand | point of connection to ligand | elution conditions | enrichment (x-fold)
3-deoxy-7-phosphoheptulonate synthase | tyrosine (allosteric inhibitor) | amino group | | 100
procollagen-proline dioxygenase | (Pro-Gly-Pro)n (n = 10) | amino terminus | solution of (Pro-Gly-Pro)n | 1500
UDP-glucose 4-epimerase | UDP | β-phosphate | UMP | 100
L-lactate dehydrogenase | NAD⁺ | adenosine N6 | phosphate | 4
L-lactate dehydrogenase | NAD⁺ | adenosine C8 | pyruvyl NAD⁺ | 40
L-lactate dehydrogenase | AMP | adenosine C8 | pyruvyl NAD⁺ | 40
isocitrate dehydrogenase (NAD⁺) | AMP | ribose | NAD⁺ | 40
choline O-acetyltransferase | coenzyme A | | gradient of NaCl | 100
cathepsin D | pepstatin | carboxy group | pH 8.5 | 100
N-acetylglucosamine kinase | glucosamine | N2 | glucose | 20
hexokinase | glucosamine | N2 | glucose | 40
dihydrofolate reductase | methotrexate | carboxy group | dihydrofolate | <200
N-acetylgalactosaminide-mucin β-1,3-galactosyltransferase | asialomucin | bound to DEAE-cellulose | EDTA | 20
ornithine decarboxylase | pyridoxamine phosphate | amino group | pyridoxal phosphate | 1000
5-formyltetrahydrofolate cyclo-ligase¹²⁷ | 5-formyltetrahydropteroylglutamate | carboxy group | 5-formyltetrahydrofolate | 4000
β-adrenergic receptor | alprenolol | olefin addition | isoproterenol | 100
α₁-adrenergic receptor | analogue of prazosin | carboxy group | prazosin | 200
plasminogen | L-lysine | amino group | ε-aminocaproic acid | 200
choline O-acetyltransferase | 3-[O-(2″-aminoethyl)-3′-hydroxyphenyl]-3-oxopropyltrimethylammonium | ethylamine | 3-(3′-hydroxyphenyl)-3-oxopropyltrimethylammonium | 70
protein geranylgeranyltransferase | undecapeptide with sequence YREKKFFCAIL | lysylamines | pH 5.0 | 130
adenylate cyclase | succinylated deacetylforskolin | succinyl carboxylate | forskolin | 2000
α subunit of GTP-binding regulatory protein | βγ subunits of the complete protein | thiols of cysteines | AlF₄⁻ |
myristoylated alanine-rich C-kinase substrate | calmodulin | lysylamines | NaCl, EGTA | 100
[heparan sulfate]-glucosamine N-sulfotransferase | adenosine 3′,5′-bisphosphate | adenosine N6 | adenosine 3′,5′-bisphosphate | 40
malate dehydrogenase (oxaloacetate-decarboxylating) (NADP⁺) | adenosine 2′,5′-bisphosphate | adenosine N6 | NADP⁺ | 50
binding protein for complement component C3 | complement component C3 | thiol of a cysteine | 20% ethanol |

containing this specific sequence. An extract of nuclei
from HeLa cells was purified by chromatography by
molecular exclusion, chromatography by adsorption on
heparin bound to agarose, chromatography by cation 
exchange on sulfated dextran, and affinity adsorption on 
agarose to which the specific DNA was attached. In the 
last step, the protein was eluted with a high concentra- 
tion (0.5 M) of KCl. The first three steps produced 100- 
fold purification with a 20% yield, and the last step alone 
produced a further 100-fold purification with a 50% yield. 
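
As a quick check of the arithmetic in this example, and assuming that the enrichments and the yields of successive steps simply multiply (an assumption made here for illustration, not a statement from the original report), the overall figures for the four steps would be roughly:

```latex
\begin{align*}
\text{overall enrichment} &\approx 100 \times 100 = 10^{4}\text{-fold} \\
\text{overall yield} &\approx 0.20 \times 0.50 = 0.10 = 10\%
\end{align*}
```

The last step therefore contributes as much enrichment as the first three combined while sacrificing only half of the remaining protein.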
The inhibition of the DNA polymerase from herpes 
simplex virus by the antiviral agent
9-[O-(2-hydroxyethyl)hydroxymethyl]guanosine (acyclovir) results
from the formation of a tight complex between the poly- 
merase, a duplex of DNA containing a template and a 
primer into which 9-[O-(2-hydroxyethyl)hydroxymethyl]guanosine
has been incorporated at the 3′ end of the
primer of DNA as it is being elongated, and the triphos- 
phate of the next nucleotide encoded by the template. 
When a duplex of template and primer into which 
9-[O-(2-hydroxyethyl)hydroxymethyl]guanosine had
been incorporated was covalently attached to cross- 
linked agarose, the resulting affinity adsorbent bound 
the polymerase strongly, but only when GTP, the next 
nucleotide encoded by the template strand, was present 
in the solution. A homogenate was passed over the
adsorbent in the absence of GTP to adsorb proteins bind- 
ing nonspecifically to DNA. The unbound proteins from 
this first step were then added to the adsorbent in the 
presence of GTP, and the DNA polymerase was bound 
strongly. The column was rinsed with high salt and 
brought to low ionic strength, the GTP was removed to 
eliminate the binding with high affinity, and the DNA 
polymerase was then eluted with a gradient of NaCl.'* 
It is sometimes the case that during a step of the 
purification the activity of the enzyme of interest disap- 
pears, which can be discouraging. A common reason for 
such disappearance is digestion of the protein of interest 
by endopeptidases in the homogenate.'**'“© A more 
interesting reason, however, for loss of enzymatic activ- 
ity is that the function being assayed requires more than 
one protein and that these proteins are separated from 
each other during a step in the purification. The removal 
of mismatched uracil bases from double-stranded DNA 
is conveniently assayed by following the replacement by 
[α-³²P]dCTP of a mismatched uracil in the center of a
short segment of double-stranded DNA.'” Although the 
homogenate from HeLa cells displayed significant activ- 
ity in this assay, that activity disappeared upon fraction- 
ation by sulfopropyl-agarose. It could be regained by 
combining two of the fractions produced by this step.
The active protein in one of these two fractions was then 
purified to homogeneity by use of the assay supple- 
mented with the other fraction. The ability of this other 
fraction to support the enzymatic activity in the presence 
of the purified protein was lost upon its fractionation by 
phenyl-agarose. When two of the fractions from this step 


were combined, activity returned. The single proteins in 
each of these two fractions were then purified separately, 
in each case by use of assays supplemented with the 
other two necessary components. In the end, the three 
distinct proteins that together perform the reaction were 
each purified to homogeneity." Only when all three are 
mixed together is enzymatic activity observed. 

The goal of purification is to obtain the protein of 
interest isolated from all of the other proteins that were 
originally in the homogenate derived from the biological 
specimen. That this has been achieved is often suggested 
by the coelution of the protein present and the biologi- 
cal or enzymatic activity in the last chromatographic step 
of the purification (Figures 1-6 and 1-10).'” This is only 
an indication of purity, and the absolute purity of the 
final preparation must always be demonstrated inde- 
pendently by electrophoresis. 
Figure 1-10: Chromatography by molecular exclusion of malate
synthase.¹⁴⁹ A solution (280 mg of protein in 7 mL) of malate
synthase, from the penultimate step in the purification procedure
from Saccharomyces cerevisiae, was loaded onto a column (1.8 L) of
cross-linked dextran. The fractions (7 mL) collected from the
bottom of the column were assayed for absorbance at 280 nm (0)
and enzymatic activity (nanomoles minute⁻¹ milliliter⁻¹; A), and
specific activity (0) was calculated by dividing enzymatic activity
(milliliter)⁻¹ by the absorbance at 280 nm. Reprinted with
permission from ref 149. Copyright 1981 Springer-Verlag.
Problem 1-8: Sugars such as lactose have negative
values of preferential solvation. Unlike the preferential
solvation of most salts, which is affected by activity
coefficients, the preferential solvation of lactose is
invariant with its concentration. The preferential solvation of
bovine serum albumin by lactose is −0.35 mL g⁻¹. What
molar concentration of lactose should have an effect on
the solubility of bovine serum albumin equal to a 1 M
concentration of Na₂SO₄? What is the percent saturation
of a 1 M solution of (NH₄)₂SO₄? Why isn’t lactose used to
precipitate protein?


Problem 1-9: Calculate the number of theoretical plates 
in the column used for the separation displayed in Figure 
1-5A from the width of the peak of phosphoglycerate 
mutase. Use the number of theoretical plates to calculate 
the width the peak of glyceraldehyde-3-phosphate dehy- 
drogenase should have. Why might its peak be wider than 
the width calculated? 


Problem 1-10: Calculate the number of theoretical 
plates in the column used in Figure 1-6C. 


Problem 1-11: The table below describes the purifica- 
tion of glutamyl-tRNA reductase. Calculate, in the proper 
units, the total enzymatic activity, the yield, the total pro- 
tein, the specific activity, and the cumulative enrichment 
at each step. 
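
The bookkeeping requested in a table of purification can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Step` class, the `summarize` function, and the three steps in `EXAMPLE_STEPS` are hypothetical and are not the data of the problem; yield and enrichment are referred to the first step listed, which therefore must have a measurable activity.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    volume_ml: float          # volume of the final pool (mL)
    activity_per_ml: float    # enzymatic activity (umol min^-1 mL^-1)
    protein_mg_per_ml: float  # protein concentration (mg mL^-1)


def summarize(steps):
    """Return total activity, total protein, specific activity, yield,
    and cumulative enrichment for each step, relative to the first step."""
    first = steps[0]
    ref_total_activity = first.volume_ml * first.activity_per_ml
    ref_specific = first.activity_per_ml / first.protein_mg_per_ml
    rows = []
    for s in steps:
        total_activity = s.volume_ml * s.activity_per_ml    # umol min^-1
        total_protein = s.volume_ml * s.protein_mg_per_ml   # mg
        specific = total_activity / total_protein           # umol min^-1 mg^-1
        rows.append({
            "step": s.name,
            "total_activity": total_activity,
            "total_protein": total_protein,
            "specific_activity": specific,
            "yield_percent": 100.0 * total_activity / ref_total_activity,
            "enrichment": specific / ref_specific,
        })
    return rows


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
EXAMPLE_STEPS = [
    Step("crude extract", 1000.0, 2.0, 10.0),
    Step("ion exchange", 100.0, 15.0, 5.0),
    Step("affinity adsorption", 10.0, 90.0, 0.3),
]

if __name__ == "__main__":
    for row in summarize(EXAMPLE_STEPS):
        print(row)
```

When the activity of the first step cannot be assayed, a later step would have to serve as the reference, and the yield and enrichment would then be relative to that step only.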


Problem 1-12: Alprenolol (Al) binds tightly and specifically
to β-adrenergic receptor (βAR), which is a protein in
the plasma membranes of certain animal cells. The
dissociation constant for this binding is the equilibrium
constant defined by the equation

$$K_d = \frac{[\mathrm{Al}]\,[\beta\mathrm{AR}]}{[\mathrm{Al}\cdot\beta\mathrm{AR}]}$$

where all concentrations are in moles liter⁻¹. They are
the concentration of free alprenolol, [Al], the concentration
of uncomplexed β-adrenergic receptor, [βAR], and
the concentration of the complex between the alprenolol
and β-adrenergic receptor, [Al·βAR]. The value for $K_d$ is
8 nM.


Table for Problem 1-11
Alprenolol was covalently attached to agarose to
produce an affinity adsorbent for the purification of
β-adrenergic receptor. The final concentration of the
alprenolol covalently bound to the solid phase,
$[\mathrm{Al_S}]'_{\mathrm{TOT}}$, was 2 mM in units of millimoles
(liter of bed)⁻¹. All molar concentrations designated with
primes are in moles (liter of bed)⁻¹. Assume that the
dissociation constant between covalently bound alprenolol and
β-adrenergic receptor is the same as that for unbound
alprenolol (8 nM).

Consider what happens when a solution containing
β-adrenergic receptor is added to a chromatographic
column containing the affinity adsorbent. If, as is
reasonable, $[\beta\mathrm{AR}]' \ll [\mathrm{Al_S}]'_{\mathrm{TOT}}$,
where $[\mathrm{Al_S}]'_{\mathrm{TOT}}$ is the molar
concentration of covalently bound alprenolol (2 mM),
then $[\mathrm{Al_S}]'_{\mathrm{TOT}} \cong [\mathrm{Al_S}]'$, the
molar concentration of covalently attached alprenolol to which
β-adrenergic receptor is not bound; and from the equation for $K_d$

$$\frac{[\mathrm{Al_S}]'_{\mathrm{TOT}}}{K_d} \cong \frac{[\mathrm{Al_S}\cdot\beta\mathrm{AR}]'}{[\beta\mathrm{AR}]'} = \alpha_{\beta\mathrm{AR}}$$

where $\alpha_{\beta\mathrm{AR}}$ is the partition coefficient for
β-adrenergic receptor between the mobile phase, βAR, and its
complex with alprenolol covalently bound to the stationary
phase, Al_S·βAR.


(A) If the chromatographic column has a volume of
mobile phase, $V_0$, of 2.0 mL, calculate the elution
volume, $V_{\beta\mathrm{AR}}$, of β-adrenergic receptor.


One way to decrease the elution volume of β-adrenergic
receptor would be to add free alprenolol to the mobile
phase at a particular molar concentration
[$[\mathrm{Al_M}]'_{\mathrm{TOT}}$ in moles (liter of bed)⁻¹]. Again,
if $[\beta\mathrm{AR}]' \ll [\mathrm{Al_S}]'_{\mathrm{TOT}}$, then

$$\alpha_{\beta\mathrm{AR}} \cong \frac{[\mathrm{Al_S}\cdot\beta\mathrm{AR}]'}{[\beta\mathrm{AR}]' + [\mathrm{Al_M}\cdot\beta\mathrm{AR}]'}$$


(B) Derive an equation for $\alpha_{\beta\mathrm{AR}}$ in terms of
$[\mathrm{Al_M}]'_{\mathrm{TOT}}$, $[\mathrm{Al_S}]'_{\mathrm{TOT}}$, and $K_d$, if

purification step | volume of final pool (mL) | enzymatic activity (µmol min⁻¹ mL⁻¹) | protein concentration (mg mL⁻¹)
clarified homogenate | 2,270 | NDᵉ | 4.6
DEAE-cellulose | 720 | 2.1 | 4.0
phosphocelluloseᵃ | 300 | 1.7 | 1.0
phenyl-agaroseᵇ | 110 | 2.9 | 0.20
blue-agaroseᶜ | 7 | 19 | 0.45
methylsulfonated polyetherᵈ | 1 | 25 | 0.10
Superose molecular exclusion | 2 | 5 | 0.005

ᵃ Figure 1-3. ᵇ Agarose (Figure 1-7) to which phenyl groups are attached through ether linkage. ᶜ Agarose to which Cibacron blue has been covalently attached. ᵈ Beaded hydrophilic polyether resin to which methylsulfonate groups are covalently attached. ᵉ Interfering enzymatic activities prohibited assay.

$$K_d = \frac{[\mathrm{Al_M}]'\,[\beta\mathrm{AR}]'}{[\mathrm{Al_M}\cdot\beta\mathrm{AR}]'}$$


(C) Calculate the elution volume of β-adrenergic
receptor from the same chromatographic column
($V_0$ = 2 mL) if the concentration of alprenolol in
the mobile phase, $[\mathrm{Al_M}]'_{\mathrm{TOT}}$, is 0.10 mM.


Molecular Charge 


Before electrophoresis can be understood, the property 
of a molecule of protein that permits its electrophoresis 
to occur, namely, its molecular charge, must be understood.
The mean net molecular charge number* of a
molecule of protein i, $Z_i$ [mean number of elementary
charges (molecule of protein)⁻¹], is the difference
between the number of its many positive elementary 
charges and the number of its many negative elementary 
charges averaged over time. These individual elementary 
charges are those of adsorbed ions from the solution; 
those of any tightly bound coenzymes, inorganic anions, 
and metallic cations; and those of the covalent post- 
translational modifications of the protein as well as the 
more obvious positive elementary charges of the guani- 
dinium, ammonium, and imidazolium cations and the 
negative elementary charges of the carboxylates, thio- 
lates, and phenolates that are the side chains of the 
amino acids incorporated into the covalent molecular 
structure of the protein itself. 

Each of the charged side chains of the amino acids 
is the conjugate acid or conjugate base of a weak neutral 
base or weak neutral acid, respectively (Table 2-2), and 
the degree to which each is ionized is a function of the pH 
of the solution. The mean net proton charge number of 
protein i, $Z_{\mathrm{H},i}$, is the difference, averaged over time at a
particular pH, between the number of all the positive ele- 
mentary charges and the number of all the negative ele- 
mentary charges on a molecule of that protein that arise 
from ions or functional groups that remain affixed to the 
protein, covalently or noncovalently, in pure water in the 


* The charge number of an ion, a molecule, or a functional group 
should be distinguished from its charge. The charge number is the 
number of elementary charges borne by the ion, the molecule, or 
the functional group. The charge is the number of coulombs borne 
by the ion, the molecule, or the functional group. The charge on an 
ion, a molecule, or a functional group is the elementary charge 
(1.602 × 10⁻¹⁹ C) multiplied by its charge number. Chemists are
accustomed to refer to the number of elementary charges borne by 
an ion, a molecule, or a functional group, its charge number, as its 
“charge”. This habit involves no misunderstanding so long as the 
actual charge on the ion, the molecule, or the functional group is 
never involved in the discussion. Unfortunately, the property that 
determines its electrophoretic mobility is the charge on the mole- 
cule, not its charge number, so in this instance the distinction 
between charge number and charge must be clear. 


absence of any other dissolved electrolytes in the solu- 
tion. Because the rates of protonic equilibria are 
extremely rapid, all molecules of a protein i, even though 
each has a different number of elementary charges at any 
instant, will have the same mean net proton charge 
number, $Z_{\mathrm{H},i}$, if the fluctuations of charge are averaged
over a time as long as a second. 

The change in the mean net proton charge 
number on a protein as a function of pH is measured by 
performing a simple acid-base titration (Figure 1-11).
The number of moles of protons or hydroxide
ions necessary to adjust an unbuffered solution
containing a known molar concentration of protein i to a
given final pH from a given initial pH is measured. The 
number of moles of protons or hydroxide ions neces- 
sary to adjust an identical unbuffered solution, lacking 
the protein, to the same final pH from the same initial 
pH is then measured. The solution containing the pro- 
tein will always consume more moles of protons or 
hydroxide ions than the control, and this additional 
amount can be converted into the equivalents of posi- 
tive charge gained by the protein upon association of 
the excess protons or the equivalents of positive charge 
lost upon dissociation of the protons and their combi- 
nation with the excess hydroxide ions, respectively, as 
the pH of the solution is changed from the initial value 
to the final value. In order to anchor this titration curve 
at some absolute number of net elementary charges on 
the protein rather than equivalents of elementary 
charge relative to those on the protein at some arbitrary 
initial pH, three distinct properties of a protein, which 
are often confused with each other, must be understood 
and clearly distinguished. These properties are the 
isoionic point, the point of zero net proton charge, and 
the isoelectric point. 
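
The subtraction of the control described above can be written out as a short sketch. The function name and the sample numbers are hypothetical; the only content is the difference between the moles of titrant consumed with and without protein, divided by the moles of protein, with a positive sign for added acid and a negative sign for added base.

```python
def change_in_proton_charge_number(titrant_sample_mol, titrant_blank_mol,
                                   protein_mol, titrant="acid"):
    """Change in mean net proton charge number caused by a titration step.

    titrant_sample_mol: moles of H+ or OH- consumed by the protein solution
    titrant_blank_mol:  moles consumed by the identical solution lacking protein
    protein_mol:        moles of protein in the titrated sample
    titrant:            "acid" (protons added) or "base" (hydroxide added)
    """
    if protein_mol <= 0:
        raise ValueError("protein_mol must be positive")
    # Equivalents taken up (or released) by the protein itself.
    per_molecule = (titrant_sample_mol - titrant_blank_mol) / protein_mol
    if titrant == "acid":
        return per_molecule      # protons associate: charge becomes more positive
    if titrant == "base":
        return -per_molecule     # protons dissociate: charge becomes more negative
    raise ValueError("titrant must be 'acid' or 'base'")


if __name__ == "__main__":
    # Hypothetical example: 12 umol of HCl for the sample, 2 umol for the
    # control, and 1 umol of protein in the sample.
    print(change_in_proton_charge_number(12e-6, 2e-6, 1e-6, "acid"))
```

Such differences give only changes in charge relative to the starting pH, which is why the curve must still be anchored at one pH where the absolute charge number is known.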

The isoionic point, $\mathrm{pH}_{\mathrm{isoionic},i}$, of protein i is the pH
of a solution containing only water and protein i including
all of its tightly bound ions, coenzymes, and
posttranslational modifications. A solution containing only
water and protein i, an isoionic solution, is usually
obtained by passing a solution of protein i over a
mixed-bed ion-exchange medium to remove all salts, and the
pH of the resulting solution is then measured. Since
the only cations and anions in such a solution, other than
the ions bound tightly or covalently to the protein, are
protons and hydroxide ions

$$Z_{\mathrm{H,isoionic},i}\,[\text{protein } i] + [\mathrm{H^+}]_{\mathrm{isoionic}} = [\mathrm{OH^-}]_{\mathrm{isoionic}} \qquad (1\text{-}58)$$

where [protein i] is the molar concentration of the
protein, $Z_{\mathrm{H,isoionic},i}$ is the mean net proton charge number on
the protein at its isoionic point, and $[\mathrm{H^+}]_{\mathrm{isoionic}}$ and
$[\mathrm{OH^-}]_{\mathrm{isoionic}}$ are the molar concentrations of protons and
hydroxide ions in this isoionic solution. This equation
can be combined with the expression for the ionization
of water ($K_w$ = [H⁺][OH⁻]) and the definition of pH
([H⁺] = 10^(−pH)) to give

$$Z_{\mathrm{H,isoionic},i} = \frac{K_w - 10^{-2\,\mathrm{pH}_{\mathrm{isoionic},i}}}{[\text{protein } i]\;10^{-\mathrm{pH}_{\mathrm{isoionic},i}}} \qquad (1\text{-}59)$$


Equation 1-59 can be used to calculate the mean net 
proton charge number on the protein i at its isoionic
point, and this provides a measurement of the absolute 
mean net proton charge number on the protein i at one 
pH in the absence of electrolytes. It is this direct meas- 
urement of the mean number of charges on the protein 
at a given pH in the absence of electrolyte that is usually 
used to anchor the titration curve of a protein (Figure 
1-11). 
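
Equation 1-59 is easy to evaluate numerically. The sketch below is illustrative only: the function name is hypothetical, the ion product of water is assumed to be 1.0 × 10⁻¹⁴ (about 25 °C), and the concentrations in the usage lines are invented for the example rather than taken from any experiment.

```python
KW_ASSUMED = 1.0e-14  # assumed ion product of water near 25 degrees C


def isoionic_charge_number(ph_isoionic, protein_molar, kw=KW_ASSUMED):
    """Mean net proton charge number at the isoionic point (Equation 1-59).

    ph_isoionic:   measured pH of the isoionic solution
    protein_molar: molar concentration of the protein in that solution
    """
    if protein_molar <= 0:
        raise ValueError("protein_molar must be positive")
    h = 10.0 ** (-ph_isoionic)              # [H+] from the definition of pH
    return (kw - h * h) / (protein_molar * h)


if __name__ == "__main__":
    # Hypothetical isoionic solutions, each 0.1 mM in protein.
    for ph in (5.0, 7.0, 9.6):
        print(ph, isoionic_charge_number(ph, 1.0e-4))
```

With these assumed inputs the charge number is negative for an acidic isoionic pH, essentially zero at pH 7, and positive for a basic isoionic pH, and its magnitude shrinks as the concentration of protein is raised.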

The point of zero net proton charge is the pH at 
which the mean net proton charge number on protein i 
is zero. The isoionic point, $\mathrm{pH}_{\mathrm{isoionic},i}$, is formally
distinguished from the point of zero net proton charge because
at the isoionic point the protein does bear a mean net
proton charge. It is clear from Equation 1-58, however,
that if [protein i] is significant and $\mathrm{pH}_{\mathrm{isoionic},i}$ is between
pH 5 and 9, there is little difference between the isoionic 
point and the point of zero net proton charge. This is not 
the case, however, for acidic or basic proteins. 
Figure 1-11: Net mean proton charge number on ribonuclease as
a function of pH. Solutions of ribonuclease at ionic strengths
0.01 M (●), 0.03 M (◐), and 0.15 M (○), produced with KCl, were
titrated with either KOH or HCl.¹⁵¹ The changes in pH as a function
of the equivalents of acid or base added (mole of protein)⁻¹ were
recorded. The isoionic point was determined by passing a solution
of the protein over a mixed-bed medium for ion exchange to
remove all electrolytes except the protein, H⁺, and OH⁻. The point
of zero net proton charge was then calculated with Equation 1-59.
The absolute mean net proton charge number, $Z_{\mathrm{H,RNase}}$, is
presented as a function of pH. Reprinted with permission from ref 151.
Copyright 1956 American Chemical Society.
It is its point of zero net proton charge that is
routinely estimated from the sequence of a protein. If it is
assumed that the protein bears no unknown tightly
bound ions or coenzymes and has no unknown
posttranslational modifications and if it is assumed that each
side chain of each type of amino acid has its ideal,
unperturbed value of $\mathrm{p}K_a$ (Table 2-2), then it is possible to
estimate the point of zero net proton charge of the protein
from its composition of amino acids and any known
tightly bound ions, coenzymes, and posttranslational
modifications (Problem 1-15). Such estimates of points
of zero net proton charge are commonly performed by
simple algorithms available at data banks on the internet.
Such calculations are usually rather inaccurate estimates
of the actual points of zero net proton charge because the
values for the $\mathrm{p}K_a$ of the amino acids are seldom the same
in the native protein as their ideal, unperturbed values,
which are accurate estimates only when the amino acid
is in an unfolded polypeptide and does not have an
immediate neighbor with an ionized side chain. For example,
Glutamate 89 of β-lactoglobulin is buried within the
protein at low pH and does not titrate with the rest of the
glutamates but becomes exposed during a change that
occurs in the structure of the protein above pH 7 and
titrates as the structural change progresses. In addition,
there often are unknown tightly bound ions or
posttranslational modifications. Finally, the point of zero net
proton charge is often between pH 6 and 8, where small
shifts in the titration curve lead to large changes in the
point of zero net proton charge (Figure 1-11). The result
of one of these algorithmic estimates of the point of zero
net proton charge is usually referred to, erroneously, as
the isoelectric point of the protein.
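
A minimal sketch of such an algorithmic estimate follows. It is not the algorithm of any particular data bank. The values of pKa in the two dictionaries are assumed, typical unperturbed values chosen for illustration (only the 3.3 and 8.0 for the termini appear in Problem 1-15) and should be replaced by those of Table 2-2. The composition in `HYPOTHETICAL_PROTEIN` is invented. The point of zero net proton charge is located by bisection, which relies on the charge decreasing monotonically as the pH increases.

```python
# Assumed, illustrative pKa values; substitute the values from Table 2-2.
PKA_ANIONIC = {"C-terminus": 3.3, "Asp": 3.9, "Glu": 4.3, "Cys": 8.3, "Tyr": 10.1}
PKA_CATIONIC = {"His": 6.0, "N-terminus": 8.0, "Lys": 10.5, "Arg": 12.5}


def mean_net_proton_charge_number(composition, ph):
    """Sum the fractional charges of all titratable groups at a given pH.

    composition maps a group name to the number of such groups per molecule.
    """
    charge = 0.0
    for group, count in composition.items():
        if group in PKA_ANIONIC:
            # fraction present as the anionic conjugate base
            charge -= count / (1.0 + 10.0 ** (PKA_ANIONIC[group] - ph))
        elif group in PKA_CATIONIC:
            # fraction present as the cationic conjugate acid
            charge += count / (1.0 + 10.0 ** (ph - PKA_CATIONIC[group]))
        else:
            raise KeyError(f"no pKa assumed for group {group!r}")
    return charge


def point_of_zero_net_proton_charge(composition, low=0.0, high=14.0, tol=1e-6):
    """Locate by bisection the pH at which the estimated charge is zero."""
    if (mean_net_proton_charge_number(composition, low) < 0
            or mean_net_proton_charge_number(composition, high) > 0):
        raise ValueError("charge does not change sign between low and high")
    while high - low > tol:
        mid = 0.5 * (low + high)
        if mean_net_proton_charge_number(composition, mid) > 0:
            low = mid    # still positive: the zero lies at higher pH
        else:
            high = mid
    return 0.5 * (low + high)


# Invented composition, for illustration only.
HYPOTHETICAL_PROTEIN = {"N-terminus": 1, "C-terminus": 1, "Asp": 5, "Glu": 7,
                        "His": 2, "Cys": 1, "Tyr": 3, "Lys": 8, "Arg": 4}

if __name__ == "__main__":
    print(point_of_zero_net_proton_charge(HYPOTHETICAL_PROTEIN))
```

Because every group is assigned one fixed pKa and no bound ions are considered, the output inherits all of the inaccuracies described in the paragraph above.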

The isoelectric point of protein i, $\mathrm{p}I_i$, is the pH at
which, under a given set of conditions, the mean net
molecular charge number of protein i, $Z_i$, is zero. The
mean net proton charge number on protein i, $Z_{\mathrm{H},i}$, differs
from the mean net molecular charge number on
protein i, $Z_i$, because proteins have a tendency to bind
weakly the ions of electrolytes in the solution, even ones
as simple as halides and alkali metal ions. This
binding occurs even at the point of zero net proton charge
and is reflected as a decrease or increase in
$\mathrm{pH}_{\mathrm{isoionic},i}$ as a neutral salt is added to an isoionic
solution. For example, if protein i in an isoionic solution
binds more of the anions than the cations of a neutral salt
that has been added, the increase in its negative charge will
indirectly cause it to take up more protons, increasing
$\mathrm{pH}_{\mathrm{isoionic},i}$.
The reverse effect on the isoionic point is observed when
the cations are preferentially bound.

This binding of small simple ions, such as halides 
and alkali metal cations, to proteins results from chela- 
tion. Two or more fixed charges or dipoles on the protein, 
of opposite sign to the bound ion, have to be properly ori- 
ented to perform such chelation. Consequently, the 
number of each type of ion bound at the isoionic point is 
a unique and unpredictable property of each protein. In 




deoxyhemoglobin, a site at which chloride binds to the 
protein has been identified, and it sits between two func- 
tional groups, an ammonium cation of the amino termi- 
nus and a guanidinium cation of an arginine, that both 
bear a positive charge and chelate the chloride.'” In tryp- 
tophanase, a site at which potassium ion binds to the pro- 
tein is formed from the oxygen of a carboxylate and three 
acyl oxygens from the backbone of the polypeptide that 
together chelate the ion.'” In plasminogen activator 
inhibitor 1, a site at which a chloride ion binds is sur- 
rounded by two ammonium cations of two lysines and 
two NH groups of two amides from the backbone of the 
polypeptide that all chelate the ion.'”” In exotoxin A from 
Pseudomonas aeruginosa, a site at which a chloride ion 
binds is formed from two guanidinium cations of two 
arginines, and a site at which a sodium ion binds is formed 
by two acyl oxygens from the polypeptide backbone.'”® 

Although there is no relation between the number of 
ions bound and the charge on the protein at a particular 
pH, proteins with high densities of negative charge seem 
to bind cations more readily than those with low densities 
of negative charge. This tendency presumably results
from the increase in the probability of proper
juxtaposition for chelation with the increase in the density of
negative charge. As the pH is lowered from the point of zero
net proton charge, the density of positive charge on a pro- 
tein increases only marginally; rather, the density of neg- 
ative charge decreases as carboxylates are neutralized. It 
has been observed that the number of bound anions 
increases as the pH is lowered,’ which results from the 
decrease in electrostatic repulsion, due to these carboxy- 
lates, that at neutral pH inhibits the chelation of dissolved 
anions by the fixed positive charges on the protein. For 
reasons that are not well understood but may include the 
differences in ionic radii, proteins seem to bind halides 
more readily than they do alkali metal ions. 

The mean net molecular charge number, $Z_i$, on
protein i in a solution containing simple neutral salts such as
(NH₄)₂SO₄, NaCl, or KCl is the sum of the mean net
proton charge number and the net charge number
contributed by these loosely bound ions:

$$Z_i = Z_{\mathrm{H},i} + \sum_j \nu_{j,i}\,z_j \qquad (1\text{-}60)$$

where $\nu_{j,i}$ is the mean number of ions of species j and
charge number $z_j$ bound by the protein. It is this net
charge on protein i that determines its behavior on chro- 
matography by ion exchange or electrophoresis. In turn, 
electrophoresis is the usual method for determining the 
isoelectric point of a protein. 
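
The sum in Equation 1-60 can be illustrated with a few lines of code. The function name and the numbers of bound ions are hypothetical and chosen only to show the bookkeeping.

```python
def mean_net_molecular_charge_number(z_proton, bound_ions):
    """Add the charge numbers of loosely bound ions to the proton charge number.

    z_proton:   mean net proton charge number of the protein
    bound_ions: iterable of (mean number bound, charge number of the ion)
    """
    return z_proton + sum(nu * z for nu, z in bound_ions)


if __name__ == "__main__":
    # Hypothetical protein with a proton charge number of +1.5 that binds,
    # on average, 3.0 chloride ions (z = -1) and 0.5 potassium ions (z = +1).
    print(mean_net_molecular_charge_number(1.5, [(3.0, -1), (0.5, +1)]))
```

In this invented case the protein migrates as an anion even though its proton charge number is positive, which is why the isoelectric point and the point of zero net proton charge need not coincide.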


Suggested Reading 


Tanford, C., & Wagner, M.L. (1954) Hydrogen ion equilibria of 
Problem 1-13: The commercially available
anion-exchange medium DEAE-Bio-Gel A is a beaded polymer
formed from the naturally occurring polysaccharide 
agarose (Figure 1-7) to which are attached 2-(diethyl- 
amino)ethyl groups (Figure 1-2). In a column poured 
with DEAE-Bio-Gel A, the concentration of covalently 
attached tertiary ammonium cations is 20 mmol (L of
bed)⁻¹. The bed of such an ion-exchange resin can be
divided theoretically into two compartments that can be 
referred to as the stationary compartment and the 
mobile compartment. The stationary compartment, 
which is the volume within the beads, surrounds the 
covalently attached tertiary ammonium cations and 
includes enough of the surrounding volume that the 
compartment is electroneutral. The mobile compart- 
ment, which is the volume surrounding the beads, is the 
remainder of the volume that is accessible to the protein 
being submitted to the chromatography. 

An isoionic solution of bovine serum albumin at
50 mg mL⁻¹ has a pH of 5.48. This solution is adjusted to
the desired pH with KOH to produce a potassium salt of
bovine serum albumin, KₙBSA. Samples of this polyanionic
form of bovine serum albumin are submitted to
chromatography by anion exchange on a column 4.5 cm in
diameter and 40 cm in length of DEAE-Bio-Gel A in the
chloride form. The solution within the DEAE-Bio-Gel A
itself has been adjusted with HCl to the same pH as that of
the solution of protein and equilibrated with an unbuffered
solution of KCl. No buffer has to be used because the bovine
serum albumin and the diethylaminoethyl groups on the
agarose provide adequate buffering.

The movement of the bovine serum albumin
through the chromatographic system will be determined
by its partition coefficient between the stationary
compartment and the mobile compartment

$$\alpha_{\mathrm{BSA}} = \frac{[\mathrm{BSA}^{n-}]'_{\mathrm{S}}}{[\mathrm{BSA}^{n-}]'_{\mathrm{M}}}$$

where the superscript n− refers to the mean net
molecular charge number on the bovine serum albumin at
the chosen pH, and as in Equation 1-1, the primes on
the concentrations indicate that they are in units of
moles (liter of bed)⁻¹. The free energies of transfer of
the ions between the stationary compartment and the 
mobile compartment, however, are governed by the 
actual molar concentrations of the bovine serum albu- 
min in the two compartments (indicated by the 
unprimed brackets as usual) according to the partition 
coefficient 


[BSA]; 
(A) Show that
where $f_S$ is the fraction of the total accessible
volume of the bed, $V_T$, that is the volume of the
stationary compartment, $V_S$. Note that by definition
the sum of $V_S$ and the volume of the mobile
compartment, $V_M$, is $V_T$.


r 
Opsı = ass | 


The ideal distribution of bovine serum albumin between the
mobile and stationary compartments in the DEAE-Bio-Gel A
is governed by equations equivalent to Equations
1-15 to 1-18 that describe the conservation of charge in 
the two compartments and the equivalence of the ideal 
activities of the various dissolved salts in the system. 


(B) Write four equations equivalent to Equations 1-15
to 1-18 for the special case of bovine serum albumin
on DEAE-Bio-Gel A. Use the explicit abbreviations
K⁺, Cl⁻, BSAⁿ⁻, and DEAE⁺. Remember that
for the potassium salt of a multivalent anion, KₙA,
where the charge number on anion A is n−, the
ideal activity of the salt in a solution is

$$a_{\mathrm{K}_n\mathrm{A}} = [\mathrm{K^+}]^{n}\,[\mathrm{A}^{n-}]$$

In this equation, n does not have to be an integer.


(C) Unlike the derivation in the book, assume that
only the concentration of bovine serum albumin,
not $[\mathrm{K^+}]_{\mathrm{S}}$, is negligible and show that


J [DEAE*],2 + 4[K*],2 - [DEAE*]s 
and that
where [DEAE⁺] is the concentration of covalently
attached tertiary ammonium cations: 20 mmol
(L of bed)⁻¹.


(D) The titration curve of bovine serum albumin is
such that the value of the partial derivative
$(\partial Z_{\mathrm{H,BSA}}/\partial\,\mathrm{pH})$ has a constant value over the
region from pH 5.5 to 7.0 of −5.9.


Assume that, under the conditions of the experiment, 
bovine serum albumin does not bind either K⁺ or Cl⁻.
What is the mean net molecular charge number on the 
bovine serum albumin at pH 6.00, and what is the mean 
net molecular charge number on the bovine serum albu- 
min at pH 7.00? 


(E) Before solutions containing the potassium salts of 
bovine serum albumin are run, a sample of the 
isoionic solution of bovine serum albumin at 
50 mg mL⁻¹ is adjusted to pH 5.0 with HCl and run
on the column of DEAE-Bio-Gel A equilibrated at
pH 5.0 and eluted with 0.04 M KCl. The elution
volume of the bovine serum albumin in this run is 
537 mL. What parameter of the ion-exchange 
column is measured by this experiment? 


(F) A sample of the isoionic solution of bovine serum
albumin at 50 mg mL⁻¹ is adjusted to pH 6.00 with
KOH and run on the column of DEAE-Bio-Gel A
equilibrated at pH 6.00 and eluted with 85 mM
KCl. The elution volume of the bovine serum
albumin on this run is 3.38 L. What is the value of
$\alpha_{\mathrm{BSA}}$ under these conditions?


(G) Show that the value of $f_S$ for the DEAE-Bio-Gel A
in the column is 0.060. 


(H) A sample of the isoionic solution of bovine serum
albumin at 50 mg mL⁻¹ is adjusted to pH 7.00 with
KOH and run on the column of DEAE-Bio-Gel A 
equilibrated at pH 7.00. What concentration of 
KCl must be used to have the elution volume of 
the bovine serum albumin be 3.00 L? 


(I) Explain which of the assumptions, either implicit
or explicit, relied upon in the preceding develop- 
ment are most certainly oversimplifications and 
explain why each of them is an oversimplification. 


Problem 1-14: At a protein concentration of 3 x 10° M, 
the isoionic pH of ribonuclease’! is 9.60. Calculate 
$Z_{\mathrm{H,isoionic,RNase}}$.
Problem 1-15: Assume that the side chains of the acidic
and basic amino acids in a native properly folded protein
all have the same values for their acid dissociation
constants that they do in the unfolded polypeptide (Table
2-2). Let f be the fraction of a particular acidic or basic
amino acid that is ionized at a given pH.


(A) Show that for a particular type of amino acid the
conjugate base of which is anionic, such as
aspartate, glutamate, cysteine, tyrosine, or a carboxy
terminus,

$$f_{\mathrm{anionic}} = \frac{1}{1 + 10^{(\mathrm{p}K_a - \mathrm{pH})}}$$

where the $\mathrm{p}K_a$ is the one found in Table 2-2. Show
that for a particular type of amino acid the
conjugate acid of which is cationic, such as histidine,
lysine, arginine, or an amino terminus,

$$f_{\mathrm{cationic}} = \frac{1}{1 + 10^{(\mathrm{pH} - \mathrm{p}K_a)}}$$

where the $\mathrm{p}K_a$ is the one found in Table 2-2 for
that amino acid.


In a molecule of fructose-bisphosphate aldolase 
from rabbit skeletal muscle, there are four identical 
polypeptides, each containing one amino terminus, 14 
aspartates, 24 glutamates, 11 histidines, eight cysteines, 
12 tyrosines, 26 lysines, 15 arginines, and one carboxy 
terminus. There are no bound coenzymes or
posttranslational modifications. The $\mathrm{p}K_a$ of a carboxy terminus is
3.3, and that of an amino terminus is 8.0. 


(B) Calculate the mean net proton charge number on 
a molecule of fructose-bisphosphate aldolase at 
pH 8 and at pH 9. (If you are adept at using a com- 
puter, go to part E first). 


(C) Estimate the point of zero net proton charge for 
fructose-bisphosphate aldolase. 


(D) What is the value of the point of zero net proton 
charge for fructose-bisphosphate aldolase 
according to the experiments in Figure 1-16? 


(E) Write a program or program a spreadsheet to cal- 
culate the mean net proton charge number on 
fructose-bisphosphate aldolase at any pH. 


(F) Use the program to draw a titration curve of fruc- 
tose-bisphosphate aldolase. 
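
Parts (E) and (F) lend themselves to a short script. What follows is only a sketch of one possible approach, not a reference solution: the side-chain p$K_{\mathrm{a}}$ values marked ASSUMED are typical textbook numbers standing in for Table 2-2, which is not reproduced here, and the function names are illustrative.

```python
# Sketch of a net-proton-charge calculator for fructose-bisphosphate aldolase.
# pKa values marked ASSUMED are typical numbers, NOT taken from Table 2-2;
# substitute the tabulated values before relying on any result.

# (pKa, number per polypeptide) for groups whose conjugate base is anionic
ANIONIC = {
    "carboxy terminus": (3.3, 1),   # pKa stated in the problem
    "aspartate": (3.9, 14),         # ASSUMED pKa
    "glutamate": (4.3, 24),         # ASSUMED pKa
    "cysteine": (8.3, 8),           # ASSUMED pKa
    "tyrosine": (10.1, 12),         # ASSUMED pKa
}
# (pKa, number per polypeptide) for groups whose conjugate acid is cationic
CATIONIC = {
    "amino terminus": (8.0, 1),     # pKa stated in the problem
    "histidine": (6.0, 11),         # ASSUMED pKa
    "lysine": (10.5, 26),           # ASSUMED pKa
    "arginine": (12.5, 15),         # ASSUMED pKa
}
POLYPEPTIDES = 4  # identical polypeptides in one molecule


def f_anionic(pH, pKa):
    """Ionized fraction of a group whose conjugate base is anionic."""
    return 1.0 / (1.0 + 10.0 ** (pKa - pH))


def f_cationic(pH, pKa):
    """Ionized fraction of a group whose conjugate acid is cationic."""
    return 1.0 / (1.0 + 10.0 ** (pH - pKa))


def net_proton_charge(pH):
    """Mean net proton charge number of the whole molecule at a given pH."""
    negative = sum(n * f_anionic(pH, pKa) for pKa, n in ANIONIC.values())
    positive = sum(n * f_cationic(pH, pKa) for pKa, n in CATIONIC.values())
    return POLYPEPTIDES * (positive - negative)


def point_of_zero_charge(low=0.0, high=14.0, tolerance=1e-6):
    """Bisection for the pH at which the net proton charge is zero."""
    while high - low > tolerance:
        middle = 0.5 * (low + high)
        if net_proton_charge(middle) > 0.0:
            low = middle    # still cationic: the zero lies at higher pH
        else:
            high = middle
    return 0.5 * (low + high)


if __name__ == "__main__":
    # Tabulate a titration curve from pH 2.0 to 12.0 in steps of 0.5.
    for tenth in range(20, 125, 5):
        pH = tenth / 10.0
        print(f"{pH:5.1f} {net_proton_charge(pH):8.2f}")
```

The bisection relies on the net charge decreasing monotonically as the pH rises, which holds for this model because every term loses positive charge or gains negative charge with increasing pH.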


Problem 1-16: A particular protein is modified within 
the cells where it is normally located by the covalent 
attachment of inorganic phosphate in the form of phos- 
phate esters. Anywhere between zero and seven phos- 
phates can be attached to the protein under normal 
circumstances. The isoelectric points of these eight different 
forms of the protein are 4.32, 4.29, 4.26, 4.23, 4.20, 
4.17, 4.14, and 4.11, respectively. At these values of pH, 
each phosphate ester would have a charge number of 
-1.00 so an additional equivalent of negative charge is 
added to the protein when an additional phosphate is 
added. 


(A) Explain why the isoelectric point of the protein 
decreases as each phosphate is added. 


(B) What amino acid side chains are titrating in this 
range of pH? (See Table 2-2). 


(C) Assume that the aspartates and glutamates of the 
protein have the same p$K_{\mathrm{a}}$ (4.2). The decrease in 
the mean net proton charge number on a protein 
as the pH is lowered, if only the glutamates and 
aspartates are titrating, should be 

$$ n_{\mathrm{Glu+Asp}}\left[\frac{1}{1 + 10^{(4.2 - \mathrm{pH_f})}} - \frac{1}{1 + 10^{(4.2 - \mathrm{pH_i})}}\right] $$

where $n_{\mathrm{Glu+Asp}}$ is the total number of glutamates plus 
aspartates in the protein, $\mathrm{pH_f}$ is the final pH, and 
$\mathrm{pH_i}$ is the initial pH. What is the total number of 
glutamates plus aspartates in the protein? 


Electrophoresis 


When a molecule of protein i at a given pH in an aqueous 
solution of electrolytes is placed in an electric field, it will 
experience a force, $F_{\mathrm{el}}$, in the direction x such that 

$$ F_{\mathrm{el}} = Q_i E_x = e Z_i E_x \tag{1-61} $$

where $Q_i$ is the mean charge on protein i (coulombs), $e$ is 
the elementary charge (1.602 × 10⁻¹⁹ C), $Z_i$ is the mean net 
molecular charge number of the molecule of protein i 
under these circumstances, and $E_x$ is the electrical field 
(volts centimeter⁻¹) or gradient of the electrical potential 
(∂V/∂x) in the x direction. The units of force (gram 
centimeter second⁻²) follow from the fact that one volt is 
one joule coulomb⁻¹ (10⁷ gram centimeter² second⁻² 
coulomb⁻¹). Electrophoresis is usually run in an apparatus 
designed so that (∂V/∂y) and (∂V/∂z) are zero, and the 
force $F_{\mathrm{el}}$ will cause the molecule of protein i to move only 
in the x direction. 

For the moment, it will be assumed that only the 
molecule of protein i and its physically bound ions move. 
As the molecule of protein i moves, a frictional force, $F_{\mathrm{fric}}$, 
exerted by the surrounding stationary liquid is experienced 
by the molecule. The frictional force is proportional 
to the velocity of movement of the molecule 

$$ F_{\mathrm{fric}} = -f_i\left(\frac{\partial x_i}{\partial t}\right) \tag{1-62} $$

where the constant of proportionality, $f_i$, is the frictional 
coefficient (grams second⁻¹) of the molecule of protein, 
one of its physical properties. 

At this point a digression is necessary to explain the 
frictional coefficient before continuing with a discussion 
of electrophoresis. The most direct way to determine the 
frictional coefficient of a molecule of protein is from its 
diffusion coefficient, D. The diffusion coefficient is a 
measure of the net tendency of any population of identi- 
cal molecules to spread from a region of high concentra- 
tion to a region of low concentration; the driving force 
behind this movement is not a function of any intrinsic 
feature of the individual molecules such as their charge 
number or their mass. The diffusion coefficient $D_i$ 
(centimeters² second⁻¹) of any substance i in solution is 
defined by Fick’s law 

$$ J_{x,i} = -D_i\left(\frac{\partial c_i}{\partial x}\right)_t \tag{1-63} $$

where $J_{x,i}$ is the flux (moles centimeter⁻² second⁻¹) of 
substance i through a planar surface of unit area, $c_i$ is the 
concentration (moles centimeter⁻³) of the substance i at 
any point, and x is the distance (centimeters) along an 
axis normal to the planar surface. The greater $(\partial c_i/\partial x)_t$, 
the greater the diffusive force, and the greater the net 
flux. The diffusion coefficient of substance i, $D_i$, is 
simply the constant of this proportionality. It can be 
shown that 

$$ f_i = \frac{k_{\mathrm{B}} T}{D_i} \tag{1-64} $$

where $k_{\mathrm{B}}$ is Boltzmann’s constant (1.38 × 10⁻¹⁶ g cm² s⁻² 
K⁻¹) and T is the temperature (kelvins). 

The diffusion coefficient of a protein is most unambiguously 
measured by creating a sharp boundary 
between two solutions, one of which contains the protein 
at a given initial concentration and the other of which is 
otherwise identical to the first but does not contain the 
protein (Figure 1-12). At any time after initiating the 
experiment, $(\partial c_i/\partial x)_t$, where x is normal to the original 
boundary, will be a Gaussian function. The width of this 
function will increase with time as diffusion spreads the 
boundary, and 

$$ D_i = \frac{1}{4\pi t}\left(\frac{A}{H}\right)^2 \tag{1-65} $$

where A is the area (concentration) of the curve of 
$(\partial c_i/\partial x)_t$ against x and H is its maximum height 
(concentration centimeter⁻¹). At the present time, however, the 
diffusion coefficients of proteins are usually measured by 
dynamic light scattering or by pulsed field gradient 
nuclear magnetic resonance. 

Figure 1-12: Measurement of a diffusion coefficient. (A) 
Spreading of a boundary of concentration at the interface formed 
between two solutions, one containing the solute and the other not 
containing the solute. A solution containing the solute is brought in 
contact with a solution otherwise identical, but lacking the solute, 
to form an interface at the origin of the horizontal axis. At the 
initial time the function of the concentration (c) is discontinuous at 
the interface at the origin of the horizontal axis, but as time 
progresses ($t_1$ and $t_2$) the solute diffuses in the direction x normal to the 
interface into the vacant solution and a gradient of concentration 
develops. (B) The first derivative of the function of concentration 
with respect to distance in the direction x [$(\partial c/\partial x)_t$] at any instant is 
a Gaussian function (curves labelled $t_1$ and $t_2$), the width of which 
increases and the height of which decreases with time, t. Reprinted 
with permission from ref 160. Copyright 1961 John Wiley. 


The frictional coefficients of spheres or ellipsoids of 
revolution can be calculated. For a sphere 

$$ f = 6\pi\eta r \tag{1-66} $$

where $\eta$ is the viscosity (pascal seconds, where a pascal is 
a kilogram second⁻² meter⁻¹) of the solution and r is the 
radius (centimeters) of the sphere. This equation has led 
to the formalism of the effective sphere or Stokes’ sphere 
representing protein i, the radius of which, $a_i$, is defined 
as 

$$ a_i = \frac{f_i}{6\pi\eta} = \frac{k_{\mathrm{B}} T}{6\pi\eta D_i} \tag{1-67} $$

This radius, $a_i$, is simply the radius of a sphere the 
diffusion coefficient of which is the same as the diffusion 
coefficient of protein i. It is usually referred to as the 
Stokes’ radius of the protein. 
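
As a numerical illustration of Equations 1-64 and 1-67, the short sketch below converts a diffusion coefficient into a frictional coefficient and a Stokes' radius. It works in centimeter-gram-second units, so the viscosity is entered in poise (1 pascal second = 10 poise), which is an assumption of this example rather than the unit quoted in the text. The function names and the input values are hypothetical.

```python
import math

K_B = 1.380649e-16  # Boltzmann's constant, g cm^2 s^-2 K^-1


def frictional_coefficient(D, T):
    """f = k_B T / D, with D in cm^2 s^-1 and T in kelvins; returns g s^-1."""
    return K_B * T / D


def stokes_radius(D, T, eta):
    """a = k_B T / (6 pi eta D), with eta in poise (g cm^-1 s^-1); returns cm."""
    return K_B * T / (6.0 * math.pi * eta * D)


if __name__ == "__main__":
    # Hypothetical inputs: D = 6.0e-7 cm^2 s^-1 in water at 20 degrees C,
    # with an assumed viscosity of 0.01002 poise.
    D, T, eta = 6.0e-7, 293.15, 0.01002
    print("f =", frictional_coefficient(D, T), "g s^-1")
    print("a =", stokes_radius(D, T, eta) * 1e7, "nm")
```

For these assumed inputs the Stokes' radius comes out near 3.6 nm, a plausible size for a globular protein of moderate mass.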

It is now possible to return from the digression 
defining the frictional coefficient to the molecule of protein 
in the electric field. When the electric field is turned 
on, a steady state is rapidly reached in which $F_{\mathrm{el}} = -F_{\mathrm{fric}}$ 
and which is characterized by a constant terminal velocity 
$(\partial x_i/\partial t)_E$ of the molecules of protein i in the direction 
of the electric field. At steady state, because $F_{\mathrm{el}} = -F_{\mathrm{fric}}$, 

$$ \left(\frac{\partial x_i}{\partial t}\right)_E = \frac{e Z_i E_x}{f_i} \tag{1-68} $$

Although electrophoresis is usually carried out in an 
apparatus in which the current passes through a complicated 
path, the region of the apparatus over which the 
proteins are actually separated, the electrophoretic 
field, is uniform in its dimensions and in its specific 
conductance so that $E_x$ is constant over its length. The free 
electrophoretic mobility, $u_i^{\circ}$ (centimeters² volt⁻¹ 
second⁻¹), of protein i is defined as 

$$ u_i^{\circ} = \frac{(\partial x_i/\partial t)_E}{(\partial V/\partial x)} = \frac{d_i\, l}{V t} \tag{1-69} $$

where $d_i$ is the distance (centimeters) moved by protein i 
in time t (seconds) when a particular voltage V is applied 
to an electrophoretic field of length l. This definition 
causes the electrophoretic mobility to be only a function 
of the molecule of protein and the medium through 
which it is moving. 

It follows that, if the assumptions that have been 
made were correct, the relationship governing 
electrophoresis would be 

$$ u_i^{\circ} = \frac{e Z_i}{f_i} \tag{1-70} $$

This relationship, however, is an incomplete description 
of electrophoresis and fails to explain actual behavior. 
Equation 1-70 states that electrophoretic mobility will be 
affected by ionic strength only insofar as $Z_i$ is affected by 
ionic strength. In general, $Z_i$ increases gradually but not 
impressively with ionic strength, as ionic shielding permits 
the molecule of protein i to bear a greater net charge 
(Figure 1-11), yet it is observed that electrophoretic 
mobility declines precipitously as ionic strength is 
increased (Figure 1-13). 

The inadequacy of Equation 1-70 is due to the erro- 
neous assumption that the only participant responding 
to the electric field is the molecule of protein and its 
directly bound ions. This would be true if the molecule of 
protein were dissolved in pure water with no added elec- 
trolyte. In fact, the value for the extrapolation of the 
experimental curve in Figure 1-13 to zero ionic strength 
does seem to agree with that calculated from Equation 
1-70 (the upper line in the graph). An actual solution of 
protein, however, must at the very least contain coun- 
terions to balance its charge, and in order to perform the 
electrophoresis, additional electrolyte must be added to 
the solution as well. 

When a charged particle such as a molecule of 
protein is dissolved in a solution of water containing 
electrolyte, the solution surrounding the molecule of 
protein has a net charge of opposite sign due to the 
existence of an ionic double layer. In the present 
case, one layer of the ionic double layer is the layer of 
fixed charges on or near the surface of a molecule of 
the protein that produces its mean net charge, and the 
other layer is the layer of solution surrounding that 
molecule of the protein. This outer layer of the double 
layer is enriched in counterions opposite in sign to the 
net charge on the protein and depleted in co-ions of 
like sign. A region of solution large enough to contain 
the entire ionic double layer must be electrically neutral, 
and consequently, the outer layer of the ionic 
double layer must have a net charge, which is the 
difference between the charge carried by the counterions 
and the charge carried by the co-ions, equal in magnitude 
but opposite in sign to that of the protein. The 
distribution of that charge within the outer layer of the 
ionic double layer is a function of the ionic strength of 
the solution. 

Figure 1-13: Free electrophoretic mobility ($u^{\circ}_{\mathrm{ovalbumin}}$) of the protein 
ovalbumin at pH 7.1 as a function of the square root of the ionic 
strength of the solution, $I^{1/2}$. The upper curve is the behavior of 
the ideal electrophoretic mobility calculated with Equation 1-70; 
the points (x) are the observed mobilities. The line through the 
points is the behavior of the electrophoretic mobility calculated 
with Equation 1-79. Reprinted with permission from ref 165. 
Copyright 1940 Royal Society of Chemistry. 

If a molecule of protein i were the sphere of its 
Stokes radius $a_i$ (centimeters), if that sphere had a uniform 
density of elementary charges on its surface producing 
a mean net molecular charge number, $Z_i$, and if 
that sphere were dissolved in a solution of monovalent 
electrolyte the positive and negative ions of which were 
both spherical and both had the same radius $a_j$, then the 
radial distribution of electrostatic potential, $\varphi(r)$, in volts, 
through the outer layer of the ionic double layer would be 
approximated by 

$$ \varphi(r) = \frac{e}{\varepsilon_{\mathrm{r}}\, r}\left[\frac{Z_i \exp[\kappa(a_i + a_j - r)]}{1 + \kappa(a_i + a_j)}\right] \qquad [r \ge (a_i + a_j)] \tag{1-71} $$

where r is the distance (centimeters) from the center of 
the sphere, $\varepsilon_{\mathrm{r}}$ is the relative permittivity* of the solvent 
(dimensionless), and $e$ is the elementary charge. The 
term $(a_i + a_j)$ takes account of the fact that an ion cannot 
approach the sphere of charge closer than its finite radius 
$a_j$ permits. 

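By Coulomb’s law 

$$ 1\ \mathrm{coulomb}^2 = 10^{-2}\, c^2\ \mathrm{gram\ centimeter} = 8.988 \times 10^{18}\ \mathrm{gram\ centimeter^3\ second^{-2}} \tag{1-72} $$

where c is the speed of light in a vacuum (centimeters 
second⁻¹). Consequently, 

$$ 1\ \frac{\mathrm{coulomb}}{\mathrm{centimeter}} = 8.988 \times 10^{11}\ \mathrm{V} \tag{1-73} $$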

The parameter $\kappa$ (centimeters⁻¹) in Equation 1-71 is 
defined by the relationship 

$$ \kappa^2 = \frac{4\pi e^2}{\varepsilon_{\mathrm{r}} k_{\mathrm{B}} T}\sum_{j=1}^{m} n_j z_j^2 \tag{1-74} $$

where $z_j$ is the charge number of the ion of species j 
composing the electrolyte and $n_j$ is the number of ions j for 
each cubic centimeter of the solution. The units on $e^2$ 
can be directly converted (Equation 1-72) from 
coulombs² to gram centimeter³ second⁻². 

The term $\sum n_j z_j^2$ is related to the ionic strength, $I$ 
(moles liter⁻¹), defined as 

$$ I = \frac{1}{2}\sum_{j=1}^{m} [\mathrm{J}]\, z_j^2 \tag{1-75} $$

where [J] is the molar concentration (moles liter⁻¹) of the 
ion of species j, by the relationship 

$$ \sum_{j=1}^{m} n_j z_j^2 = \frac{2 I N_{\mathrm{A}}}{1000} \tag{1-76} $$

where $N_{\mathrm{A}}$ is Avogadro’s number, and 

$$ \kappa = \left(\frac{8\pi N_{\mathrm{A}} e^2}{1000\, \varepsilon_{\mathrm{r}} k_{\mathrm{B}} T}\right)^{1/2} I^{1/2} \tag{1-77} $$


* The relative permittivity or dielectric constant of a substance is 
its permittivity relative to the permittivity of the vacuum. 

The term within the brackets on the right side of 
Equation 1-71 can be considered to be the effective 
charge number of the sphere, and this effective charge 
number is a function of the distance r from its center. If 
there were no electrolytes in the solution so that $\kappa = 0$, 
the potential would decrease radially only as the 
inverse of the distance, r, as expected for a sphere of 
charge in a medium of uniform relative permittivity, $\varepsilon_{\mathrm{r}}$, 
and the full charge number, $Z_i$, would contribute to the 
potential at all values of r. If $\kappa$ does not equal zero, 
however, the effective charge number decreases as r 
increases due to the presence of the outer layer of the 
ionic double layer. Because the term $\kappa$ has the 
dimensions of centimeters⁻¹, its inverse, $\kappa^{-1}$, is used as a 
measure of the thickness of the double layer. 
Equations 1-71 and 1-77 define the dimensions of the 
double layer and state that the thickness of the double 
layer will decrease as ionic strength is increased. As 
the thickness of the ionic double layer decreases, the 
layer of counterions tightens around the molecule of 
protein. 
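
To give a feeling for the magnitudes involved, the sketch below estimates the thickness of the double layer, $\kappa^{-1}$, at several ionic strengths. It is an illustration under stated assumptions, not the calculation used in the text: it uses the equivalent SI form of the Debye-Hückel definition of $\kappa$ rather than the centimeter-gram-second form, and the default temperature (25 °C) and relative permittivity of water (78.4) are assumed values. The function name is hypothetical.

```python
import math

# SI constants
E_CHARGE = 1.602176634e-19      # elementary charge, C
N_AVOGADRO = 6.02214076e23      # mol^-1
K_BOLTZMANN = 1.380649e-23      # J K^-1
EPSILON_0 = 8.8541878128e-12    # permittivity of the vacuum, F m^-1


def debye_length_nm(ionic_strength, temperature=298.15, rel_permittivity=78.4):
    """Thickness of the double layer, 1/kappa, in nanometers.

    ionic_strength is in moles liter^-1. The default temperature and
    relative permittivity are ASSUMED values for water at 25 degrees C.
    """
    if ionic_strength <= 0.0:
        raise ValueError("ionic strength must be positive")
    # Sum of n_j z_j^2 per cubic meter equals 2 I N_A * 1000 (liters per m^3).
    sum_n_z2 = 2.0 * ionic_strength * 1000.0 * N_AVOGADRO
    kappa_squared = (E_CHARGE ** 2 * sum_n_z2 /
                     (EPSILON_0 * rel_permittivity * K_BOLTZMANN * temperature))
    return 1e9 / math.sqrt(kappa_squared)


if __name__ == "__main__":
    for ionic_strength in (0.001, 0.01, 0.1, 1.0):
        print(ionic_strength, "M:", round(debye_length_nm(ionic_strength), 2), "nm")
```

With these assumed defaults the thickness is about 1 nm at an ionic strength of 0.1 M and shrinks in proportion to the inverse square root of the ionic strength, which is the tightening of the layer of counterions described above.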

Consequently, there are two distributions of 
charge that respond to the electric field, the one on the 
molecule of protein and its bound ions with a total 
charge number of Z; and the one within the outer layer 
of the ionic double layer the distribution of whose 
charge is defined by Equation 1-71 and the total charge 
number of which is —Z;. The protein is drawn in one 
direction by the electric field; the outer layer of double 
layer, in the other. The effect of this electrostatic force 
on the outer layer of solution surrounding the protein 
applied in a direction opposite to that on the molecule 
of protein (Equation 1-70) causes the outer layer to 
move in a direction opposite to that in which the pro- 
tein is caused to move. The consequence of this retro- 
grade movement is to increase the drag of the solution 
on each molecule of the protein, and this effect can be 
described in terms of an increase in the effective frictional 
coefficient of each molecule. As the outer layer 
moves one way and the molecule of protein, the reason 
for the existence of the double layer in the first place, 
moves in the other, the outer layer continuously dis- 
solves behind the molecule of protein and re-forms 
around it in front so that the movement of the protein is 
continuously impeded. Because the thickness of the 
outer layer of the double layer decreases as the ionic 
strength of the solution increases, its velocity in the 
direction opposite to that of the protein increases as the 
ionic strength increases. The result is an increase in its 
drag on the molecule of protein, and thereby a decrease 
in the electrophoretic mobility of the protein as the 
ionic strength increases.* 

On the basis of these assumptions, an equation has 
been derived to describe the electrophoretic 
mobility of protein i if its shape is approximately that of 
a sphere: 

$$ u_i^{\circ} = \frac{e Z_i}{f_i}\left(\frac{1 + \kappa a_j}{1 + \kappa a_i + \kappa a_j}\right) f(\kappa a_i) \tag{1-78} $$

where $f(\kappa a_i)$, Henry’s function, is a function in $\kappa a_i$ for 
which there is no exact expression but which can be 
expressed graphically (Figure 1-14). The value of this 
function varies between 1.0 and 1.5. It can be seen that 
when $\kappa a_j \ll 1$, as is usually the case for a solution of 
protein, 

Figure 1-14: Graphic presentation of Henry’s function. 
Reprinted with permission from ref 160. Copyright 1961 John 
Wiley. 


* There is another way to describe the effect of the double layer on 
electrophoresis. An assumption is made that there exists a discrete 
surface of shear that defines a boundary between the stationary 
solution and both the moving molecule of protein and the solution 
moving with it. The charge within the surface of shear, Q, which 
includes both the charge of protein i, $eZ_i$, and the sum of the 
charges of the enclosed mobile ions, $ez_j$, creates a potential at that 
surface. The potential at the surface of shear is the zeta potential, $\zeta$. 
If this assumption were a realistic one, then 


uc = DC 
' Aaen 


The problem is that the relationship between $Z_i$ and $\zeta$ is a complex 
one. It is possible to calculate $\zeta$ for a molecule of protein from its 
electrophoretic mobility, but $\zeta$ provides little information about 
that molecule of protein because it includes the potential resulting 
from both the mobile ions and the molecule of protein itself. 


$$ u_i^{\circ} = \frac{e Z_i}{f_i}\left(\frac{1}{1 + \kappa a_i}\right) f(\kappa a_i) \tag{1-79} $$

This equation predicts that the electrophoretic mobility 
will decrease as the ionic strength increases (Figure 1-13) 
because $\kappa$ increases as the ionic strength increases 
(Equation 1-77). 
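
The decline predicted by Equation 1-79 can be illustrated numerically as the ratio of the mobility from Equation 1-79 to the ideal mobility of Equation 1-70, a ratio that depends only on the dimensionless product $\kappa a_i$. The sketch below is an illustration under an explicit assumption: because Henry's function has no exact closed form, it substitutes Ohshima's published closed-form approximation, which is not part of the text and is used here only for convenience. The function names are hypothetical.

```python
import math


def henry_function(kappa_a):
    """Approximate Henry's function (Ohshima's closed form, an ASSUMPTION).

    Tends to 1.0 as kappa*a -> 0 and to 1.5 as kappa*a -> infinity.
    """
    if kappa_a <= 0.0:
        return 1.0
    inner = 1.0 + 2.5 / (kappa_a * (1.0 + 2.0 * math.exp(-kappa_a)))
    return 1.0 + 0.5 * inner ** -3


def relative_mobility(kappa_a):
    """Mobility from Equation 1-79 divided by the ideal value e Z / f."""
    return henry_function(kappa_a) / (1.0 + kappa_a)


if __name__ == "__main__":
    for kappa_a in (0.0, 0.3, 1.0, 3.0, 10.0):
        print(kappa_a, round(relative_mobility(kappa_a), 3))
```

Because $\kappa$ grows with the square root of the ionic strength, stepping through increasing values of $\kappa a_i$ traces the same kind of steady decline as the lower curve in Figure 1-13; the rise of Henry's function from 1.0 toward 1.5 is too small to offset the factor $1/(1 + \kappa a_i)$.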

The points in Figure 1-13 are the observed 
electrophoretic mobilities of the protein ovalbumin at various 
ionic strengths as measured by Tiselius and 
Svensson. The top line is their calculation of the mobilities 
with Equation 1-70 by use of independent measurements 
of $Z_{\mathrm{ovalbumin}}$ and $f_{\mathrm{ovalbumin}}$. The lower line is their 
calculation of the mobilities with Equation 1-79. The 
agreement between calculated values and observed 
values is surprisingly satisfactory. As the authors point 
out, the calculated value from Equation 1-70, in the 
absence of electrolyte, comes close to the extrapolated 
value of the actual mobilities. 

According to Equation 1-78, at a constant ionic 
strength, the electrophoretic mobility of protein i should 
be directly proportional to $Z_i$, and this proportionality is 
reflected in the direct proportionality that obtains between 
$Z_{\mathrm{H},i}$ and $u_i^{\circ}$ (Figure 1-15) as $Z_{\mathrm{H},i}$ is varied by varying the 
pH at a constant ionic strength. In fact, it is possible to 
display the titration curve of a protein visually by submitting 
a sample to electrophoresis on a two-dimensional 
electrophoretic field prepared so that there is a linear 
gradient of pH across the field perpendicular to the direction 
of electrophoresis. The absolute values of the 
electrophoretic mobilities of several proteins have been 
calculated from experimental values of their mean net proton 
charge numbers, $Z_{\mathrm{H},i}$, by use of Equation 1-78 with the 
assumption that $Z_{\mathrm{H},i} = Z_i$ or a more complicated equation 
derived from a cylindrical model rather than a spherical 
one. The agreement between calculated values of $u_i^{\circ}$ and 
experimental values of $u_i^{\circ}$ was within a factor of 2 or less. 

The lack of exact agreement between calculated 
and experimental values was assumed to be due to the 
difference between $Z_{\mathrm{H},i}$ and $Z_i$ caused by the binding of 
inorganic ions to the proteins. In this case, the 
proportionality between $Z_{\mathrm{H},i}$ and $u_i^{\circ}$ observed in Figure 1-15 
could still be explained, if the binding of counterions 
increases proportionately as $Z_{\mathrm{H},i}$ increases in magnitude. 
It has been demonstrated, however, by systematically 
varying the charge number on human carbonate 
dehydratase II that electrophoretic mobility is not 
directly proportional to charge number when charge 
number becomes large. This deviation from the behavior 
predicted by Equation 1-79 at high levels of charge 
number could be accounted for by using the nonlinear 
Poisson-Boltzmann equation rather than the linear form 
used in the derivation of Equation 1-79 and by using a 
correction for ion relaxation and polarization. The latter 
correction accounts for the local electric field arising 

Figure 1-15: Comparison of the electrophoretic mobilities of 
trypsin (centimeters² volt⁻¹ second⁻¹) at 0 °C ($u^{\circ}_{\mathrm{trypsin}}$, points) with 
the acid-base titration curve of trypsin determined at 20 °C 
($Z_{\mathrm{H,trypsin}}$, continuous curve). The respective scales on the two 
vertical axes, those for electrophoretic mobility and mean net 
proton charge number, respectively, both with respect to pH, were 
adjusted to produce maximum coincidence. The value for $Z_{\mathrm{H,trypsin}}$ 
= 0 was arbitrarily set to coincide with the isoelectric point. The 
coincidence displayed is in shape rather than absolute value or 
excursion. The different symbols denote the different buffers used 
to maintain the pH during electrophoresis: (x) Na*, H, CI; (A) Na’, 
H+, acetate”, CI’; (O) Ce", H*, barbiturate”, CT: (O) Mg”, H*, barbi- 
turate”, Cl’; (V) Ca", H*, glycinate”, CI’; (0) Ce", H*, NH3, CT. The 
ionic strength was maintained at 0.13 M. Reprinted with permis- 
sion from ref 170. Copyright 1952 Academic Press. 


from the distortion of the outer layer of the double layer 
caused by its movement in a direction opposite to that of 
the protein and its inability to dissolve behind it and re- 
form around it instantaneously. 

At its isoelectric point, p$I_i$, the electrophoretic 
mobility of protein i becomes zero (Equation 1-78), and 
this fact permits the isoelectric point of a protein to be 
measured by electrophoresis. Electrophoretic mobilities 
are measured at values of pH greater than and less 
than p$I_i$, and the pH of zero mobility is determined by 
interpolation (Figure 1-15). 
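
The interpolation can be sketched in a few lines. The example below is a minimal illustration that assumes linear interpolation between the two measurements bracketing the change in sign; the function name and the numbers in the usage lines are hypothetical, not data from the text.

```python
def isoelectric_point(pH_values, mobilities):
    """Estimate the pH of zero mobility by linear interpolation.

    Uses the first pair of adjacent measurements (sorted by pH) whose
    mobilities have opposite signs.
    """
    points = sorted(zip(pH_values, mobilities))
    for (pH1, u1), (pH2, u2) in zip(points, points[1:]):
        if u1 == 0.0:
            return pH1
        if u1 * u2 < 0.0:
            # Straight line through (pH1, u1) and (pH2, u2), solved for u = 0.
            return pH1 + (pH2 - pH1) * u1 / (u1 - u2)
    if points and points[-1][1] == 0.0:
        return points[-1][0]
    raise ValueError("mobilities do not change sign over the measured range")


if __name__ == "__main__":
    # Hypothetical mobilities (arbitrary units) on either side of the zero.
    print(isoelectric_point([9.0, 10.0, 11.0, 12.0], [1.5, 0.5, -0.5, -1.0]))
```

A straight line is only a local approximation to the titration curve, so the estimate improves as the bracketing measurements are taken closer to the isoelectric point.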

The effect of ionic strength on the isoelectric point 
of a protein in the absence of actual binding of the ions 
in the electrolyte to the protein has been calculated to be 
smaller than the experimental error in measurement. 
Nevertheless, significant variations in isoelectric point 
with ionic strength are generally observed (Figure 
1-16), and these depend on the particular neutral salt 
chosen to adjust the ionic strength. The explanation for 
this behavior can only be the preferential binding of 
particular ions (in Figure 1-16, always that of the anions) 
in the chosen electrolyte. The net binding of ions can be 
calculated from the observed changes in the isoelectric 
point because, from Equation 1-60, when $Z_i = 0$ 


(1-80) 

and $Z_{\mathrm{H},i}$ is available from titration data (Figure 1-11). 

Figure 1-16: Variations in the electrophoretic isoelectric points 
(pI) of a protein as a function of the square root of the ionic 
strength ($I^{1/2}$). Line A, ovalbumin in acetate; line B, 
fructose-bisphosphate aldolase in phosphate; line C, fructose-bisphosphate 
aldolase in acetate; line D, carboxyhemoglobin in phosphate. 
Reprinted with permission from ref 174. Copyright 1949 American 
Chemical Society. 

To this point, only the free electrophoretic mobility 
of a protein, $u_i^{\circ}$, has been discussed. The free 
electrophoretic mobility is the electrophoretic mobility 
displayed by a protein in free solution. This property of the 
protein is measured by moving boundary electrophoresis 
in an apparatus developed by Tiselius. This 
technique has been supplanted by electrophoresis in 
continuous gels of cross-linked polyacrylamide. A gel of 
cross-linked polyacrylamide is a hydrated plastic cast in 
a mold from a solution of acrylamide and the cross-linker 
N,N’-methylenebis(acrylamide) along with a buffer and 
other salts. The total concentration of acrylamide and 
N,N’-methylenebis(acrylamide) in the final gel can be 
varied from 3% to 20%. 

It has been demonstrated experimentally by 
Morris that the relative electrophoretic mobilities of 
proteins in polyacrylamide gels vary regularly with the 
concentration of acrylamide used to cast the gel (Figure 
1-17) and 

$$ u_i = u_i^{\circ} \exp(-K_{\mathrm{r},i} T_{\mathrm{a}}) \tag{1-81} $$

where $u_i$ is the electrophoretic mobility of protein i 
observed on a gel cast from a solution whose total 
concentration of acrylamide, in percent, was $T_{\mathrm{a}}$, and $K_{\mathrm{r},i}$ is a 
retardation coefficient unique to protein i. Such behavior 
was first noted by Ferguson on gels cast from starch, 
to which the same equation applies (Equation 1-81), but 
in which the concentration T is the concentration of the 
starch. 
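
Because Equation 1-81 is linear in the logarithm of the mobility, both the retardation coefficient and the extrapolated mobility at zero gel concentration can be estimated from a straight-line fit of ln $u_i$ against gel concentration. The sketch below is one minimal way to do this with ordinary least squares; it assumes the natural-logarithm form of the equation, positive mobilities (magnitudes), and a hypothetical function name.

```python
import math


def ferguson_fit(gel_concentrations, mobilities):
    """Least-squares fit of ln(u) = ln(u0) - K_r * T.

    Returns (u0, K_r): the mobility extrapolated to zero gel
    concentration and the retardation coefficient.
    Mobilities must be positive (use magnitudes).
    """
    n = len(gel_concentrations)
    if n < 2 or n != len(mobilities):
        raise ValueError("need at least two matched measurements")
    log_u = [math.log(u) for u in mobilities]
    mean_T = sum(gel_concentrations) / n
    mean_y = sum(log_u) / n
    s_TT = sum((T - mean_T) ** 2 for T in gel_concentrations)
    s_Ty = sum((T - mean_T) * (y - mean_y)
               for T, y in zip(gel_concentrations, log_u))
    slope = s_Ty / s_TT
    u0 = math.exp(mean_y - slope * mean_T)
    return u0, -slope


if __name__ == "__main__":
    # Hypothetical relative mobilities at 5%, 7.5%, 10%, and 15% acrylamide.
    u0, K_r = ferguson_fit([5.0, 7.5, 10.0, 15.0], [0.60, 0.47, 0.36, 0.22])
    print("extrapolated mobility:", u0, " retardation coefficient:", K_r)
```

The fit gives each measurement equal weight on the logarithmic scale, which matches the way the data are displayed in Figure 1-17.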

Figure 1-17: Electrophoretic mobility, presented on a logarithmic 
scale, of various proteins on gels of various concentrations of 
polyacrylamide ($T_{\mathrm{a}}$). The gels were cast from solutions of pH 8.88 with 
total concentrations of acrylamide plus N,N’-methylenebis(acrylamide) 
($T_{\mathrm{a}}$) varying between 5% and 15%. The concentration of 
N,N’-methylenebis(acrylamide) was always 20-fold less than the 
concentration of acrylamide. The proteins were β-lactoglobulin 
(Lac), ovalbumin (Ova), ovomucoid (Ovm), pepsin (Pep), bovine 
serum albumin monomer (BSA₁) and dimer (BSA₂), myoglobin 
(Myo), and immunoglobulin G (IgG). Reprinted with permission 
from ref 177. Copyright 1966 Elsevier. 

According to Equation 1-81, $u_i^{\circ}$ should be the free 
electrophoretic mobility of protein i, and this has been 
shown to be the case. It follows that 

$$ u_i = \frac{e Z_i}{f_i}\left(\frac{1 + \kappa a_j}{1 + \kappa a_i + \kappa a_j}\right) f(\kappa a_i) \exp(-K_{\mathrm{r},i} T_{\mathrm{a}}) \tag{1-82} $$


Examination of this relationship reveals that the elec- 
trophoretic mobilities of the proteins in a complex mix- 
ture upon a gel of polyacrylamide are directly 
proportional to their respective charges, which are deter- 
mined by complex functions of pH (Figure 1-11); are 
complex functions of their respective frictional coeffi- 
cients, which are determined by their sizes and shapes; 
and are exponentially proportional to the product of a 
constant, which is unique for each, and the concentra- 
tion of acrylamide. At a given pH, ionic strength, and 
concentration of polyacrylamide, each of the proteins in 
this mixture will have a characteristic electrophoretic 
mobility (Figure 1-17) and they can be separated one 
from the other. In this way, electrophoresis can provide a 
catalogue of the number of proteins present in the mix- 
ture and the relative amounts of each. 


Electrophoresis of native proteins* is the most reli- 
able method available for assessing the homogeneity of a 
sample of purified protein. A sample of pure protein in its 
native state should display only one component upon gel 
electrophoresis. Because the electrophoretic mobilities of 
two proteins change disproportionately as either the pH 
(Figure 1-15) or the concentration of acrylamide (Figure 
1-17) is changed, the possibility that the single component 
observed under one set of conditions results from the acci- 
dental coelectrophoresis of two or more proteins can be 
dismissed by running electrophoresis at several values of 
pH or several concentrations of acrylamide. 

If they are to be used in the roles of cataloguing 
mixtures and establishing purity, electrophoretic separations 
of native proteins on polyacrylamide gels must 
have as high a resolution as possible. This high resolution 
is achieved by using a discontinuous buffer system and 
performing what has been referred to as disc 
electrophoresis, the pun apparently intended. This 
technique relies upon the creation of three stable 
moving boundaries (Figure 1-18). Each of the three is 
formed between two solutions of different ionic 
composition. It is the applied electric field that causes the 
boundaries to move. 

Figure 1-18: Disc electrophoresis. At the start (left) the proteins 
in the original sample (black rectangle) are in a large volume and at 
a low pH (pl), They are compressed to a small volume, or disc, as 
they move through the stacking gel by being trapped in the stable 
boundary between the upper solution (pHy, buffer) and the solu- 
tion of the original sample and the spacer (pH; ,). (Middle) Upon 
fusion of this descending boundary and the stable ascending 
boundary between the solution originally in the running gel (pH) 
and the solution originally in the stacking gel (pH ,), the pH at the 
boundary increases and the new more rapidly moving boundary 
outstrips the proteins and deposits a newly created solution of 
higher pH (pHyew) behind it as it moves ahead of the separating 
proteins (right). The proteins also escape the first boundary 
because, at about the same time as the jump in pH at the fusion of 
the descending and ascending boundaries, they encounter the run- 
ning gel, which has a higher percentage of acrylamide and which 
decreases their mobility. Reprinted with permission from ref 182. 
Copyright 1964 New York Academy of Sciences. 


* The electrophoresis of proteins unfolded in solutions of dodecyl 
sulfate is quite different and will be discussed later. 


The first of these boundaries is used to trap the pro- 
teins and sweep them into an extremely narrow band 
prior to the electrophoretic separation. This process has 
been called stacking. It significantly improves the reso- 
lution of the subsequent separation by shrinking the 
original sample to a hairline so that all of the molecules 
of protein begin the electrophoretic separation at nearly 
the same point. The stacking occurs because the proteins 
are initially placed as a sample that is sandwiched 
between an upper solution and a lower solution. The 
upper solution is simply poured on top of the sample, but 
the lower solution is in a polyacrylamide gel, so that con- 
vective turbulence does not disrupt the stable moving 
boundaries, but it is a gel of a low concentration of poly- 
acrylamide, so that the mobilities of the proteins are as 
high as possible. This gel of high porosity is the stacking 
gel. The solution in which the protein is dissolved has the 
same composition as the lower solution. 

The upper and lower solutions are prepared so their 
respective ionic compositions will form a stable moving 
boundary of a particular type. Although systems for 
cationic proteins are also available, to describe this 
boundary, it will be assumed that the direction of elec- 
trophoretic movement of both the proteins and this first 
stable moving boundary is downward and a pH has been 
chosen such that the proteins are all anionic. In this case, 
both the upper solution and the lower solution above 
and below the boundary, respectively, are prepared from 
salts of the same cationic weak acid [for example, 
tris(hydroxymethyl)methylammonium ion]. An anion 
(for example, glycinate ion) the mobility of which is less 
than the mobilities of all the proteins is used to make the 
upper solution, and an anion (for example, chloride ion) 
the mobility of which is greater than the mobilities of all 
the proteins is used to make the lower solution. The 
stable descending boundary formed is one between 
these two anions. If a molecule of one of the proteins 
finds itself in the lower solution, it is surrounded by 
anions that are moving faster than it is, and it is over- 
taken by the boundary. If a molecule of one of the pro- 
teins finds itself in the upper solution, it is surrounded by 
anions that are moving more slowly than it is, and it out- 
strips them and returns to the boundary. The result of 
these events is that the proteins all gather within the 
descending boundary itself, which remains extremely 
sharp and stable if the upper and lower solutions have 
the proper ionic compositions.

The stacking process compresses the proteins into a
thin lamella, but for electrophoretic separation
to occur, they must be released from the
boundary after they have been stacked. This can be done 
if the upper solution of this initial descending boundary 
has been made with an anion, α, that is slower than the
protein only because it is the conjugate anionic base (for
example, glycinate ion) of a weak neutral acid (glycine,
pKa = 9.6) and the pH of the upper solution has been
chosen to be significantly lower than the pKa of that weak
acid. Under these conditions, the anion α is slow because
only a fraction of the weak acid is anionic at any instant.
The acid-base equilibrium has the effect of decreasing
the mobility of the anion α from its value in the absence
of its conjugate acid to a lower value, and

$$u_\alpha = u_\alpha^{0} f_\alpha \qquad (1\text{-}83)$$

where $u_\alpha$ is the mobility of the upper anion at the actual
ratio of conjugate base to acid in the solution, $u_\alpha^{0}$ is its
mobility in the absence of its conjugate acid, and $f_\alpha$ is the
fraction of the total weak acid that is ionized at the ratio
chosen. In such a situation, the proteins can be released
from the descending boundary by abruptly increasing
the pH, and hence the value of $f_\alpha$, so that $u_\alpha$ becomes
greater than the mobilities of the proteins, and the new 
stable, but now rapidly descending, boundary that 
results drops the proteins behind at the origin of the elec- 
trophoretic separation. 
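
The effect of the jump in pH on Equation 1-83 can be illustrated numerically. The following Python sketch assumes that the fraction ionized follows the Henderson-Hasselbalch relationship. The relative mobilities assigned to glycinate and to the protein, and the two values of pH, are hypothetical numbers chosen only for illustration, and the function names are not from the text.

```python
def fraction_ionized(pH: float, pKa: float) -> float:
    """Fraction of a weak neutral acid present as its anionic conjugate base."""
    return 1.0 / (1.0 + 10.0 ** (pKa - pH))


def effective_mobility(u_free: float, pH: float, pKa: float) -> float:
    """Equation 1-83: free mobility of the anion scaled by the fraction ionized."""
    return u_free * fraction_ionized(pH, pKa)


PKA_GLYCINE = 9.6   # value quoted in the text
U_GLYCINATE = 1.0   # assumed relative mobility of fully ionized glycinate
U_PROTEIN = 0.15    # assumed relative mobility of a protein

for pH in (8.3, 9.5):  # assumed pH before and after the jump
    f = fraction_ionized(pH, PKA_GLYCINE)
    u = effective_mobility(U_GLYCINATE, pH, PKA_GLYCINE)
    # The protein stays stacked only while the upper anion is slower than it is.
    state = "stacked" if u < U_PROTEIN else "released"
    print(f"pH {pH}: f = {f:.3f}, u = {u:.3f}, protein is {state}")
```

With these assumed numbers, about 5% of the glycine is anionic at the lower pH and about 44% at the higher pH, so the upper anion passes from being slower than the protein to being faster than it.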

The abrupt increase in the pH of the upper solution 
of the descending boundary can be accomplished by the 
arrival of a stable ascending boundary between two con- 
centrations of the same cationic buffer. Behind this 
ascending boundary is a solution of the same cationic 
weak acid as used for the upper and lower solutions of the 
initial descending boundary [for example,
tris(hydroxymethyl)methylammonium ion] but at a higher concentration
and a higher pH than that of the solution behind
the initial descending boundary. This ascending bound- 
ary between different concentrations of the same cationic 
buffer has been constructed so that the anion in both its 
upper and lower solutions (for example, chloride) is the 
same. Because its upper solution is by definition the lower 
solution of the initial descending boundary, this anion is 
already the fast anion of that boundary. The cationic weak 
acid of the upper solution of the ascending boundary is 
by definition the cationic weak acid in the two solutions 
used to make the initial descending boundary. The concentration
and pH of the cationic buffer in the lower solution
of the ascending boundary, however, are chosen to be
high enough to adjust the final pH behind the new 
descending boundary to a value high enough to release 
the proteins from the initial descending boundary. If the 
release is unsuccessful, or only partially successful, the 
proteins, or some of the proteins, remain trapped in the 
new descending boundary and are never separated. 
These trapped unseparated proteins form an extremely 
sharp but uninformative and deceptive band at the 
bottom of the final electrophoretogram.'® 

The release of the proteins from the initial descend- 
ing boundary in which they were trapped and stacked can 
be accomplished even more effectively by using a stable 
ascending boundary behind which is a solution of a 
cationic weak acid of a higher pKa than the cationic weak
acid used as the counterion in the initial descending
boundary. This ascending boundary has been constructed
so that the anion in both its upper and lower solutions
is the same. Because its upper solution is the lower
solution of the initial descending boundary, this anion is
the fast anion of the upper boundary (for example, chloride
ion). The cationic weak acid of the upper solution of
the ascending boundary must be the cationic weak acid
of the two solutions used to make the initial descending
boundary (for example, pyridinium ion; pKa = 5.14). The
cation of the lower solution of the ascending boundary is
chosen to be the cationic conjugate acid [for example,
tris(hydroxymethyl)methylammonium ion; pKa = 8.10] of
a neutral base strong enough to adjust the final pH behind
the new descending boundary to a value higher than the
pKa of the neutral conjugate acid of the anion α (for example,
4-morpholineethanesulfonate ion; pKa = 6.15) and
release the proteins from the initial descending bound- 
ary. The difficulty with this strategy is that the pH of the 
upper solution of the initial descending boundary is often 
so low that the proteins are no longer anionic and move 
upward instead of downward. But it is effective with com- 
plexes of protein and dodecyl sulfate because they are 
anionic at all reasonable values of pH. 

To ensure that as many proteins as possible are 
released from the initial descending boundary, shortly 
after the fusion of the ascending boundary and the initial 
descending boundary, the descending band of proteins 
in the stacking gel encounters a much higher concentra- 
tion of polyacrylamide, the running gel, which decreases 
the mobilities of all of the proteins by virtue of the rela- 
tionship in Equation 1-81. This frictional deceleration of 
the proteins increases the probability that all of their 
mobilities will be less than that of the now accelerated 
anion of the upper solution so that they can escape from 
the new descending boundary. 
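
The deceleration produced by the running gel can be sketched numerically. This Python example assumes that Equation 1-81 has the form of the usual Ferguson relationship, log10(u) = log10(u_free) - K_r T, where T is the percentage of acrylamide. The free mobility, the retardation coefficient, and the two gel concentrations are hypothetical values, and the function name is not from the text.

```python
def gel_mobility(u_free: float, k_r: float, percent_gel: float) -> float:
    """Mobility in a gel, assuming log10(u) = log10(u_free) - k_r * percent_gel."""
    return u_free * 10.0 ** (-k_r * percent_gel)


U_FREE = 1.0         # assumed relative free mobility of a protein
K_R = 0.10           # assumed retardation coefficient, (%)^-1
STACKING_GEL = 3.0   # assumed percentage of acrylamide in the stacking gel
RUNNING_GEL = 10.0   # assumed percentage of acrylamide in the running gel

u_stack = gel_mobility(U_FREE, K_R, STACKING_GEL)
u_run = gel_mobility(U_FREE, K_R, RUNNING_GEL)
print(f"stacking gel: {u_stack:.3f}")
print(f"running gel:  {u_run:.3f}")
print(f"ratio:        {u_run / u_stack:.3f}")
```

Under these assumptions the mobility of the protein falls about fivefold on entering the running gel, whereas the mobility of a small anion is affected much less, which is why the proteins fall behind the new descending boundary.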

The polyacrylamide gel is poured in two stages 
(Figure 1-18): the running gel, the polyacrylamide con- 
centration of which is high and upon which the separa- 
tion will occur, and the stacking gel, the polyacrylamide 
concentration of which is as low as possible to keep the 
mobilities of the proteins as high as possible and in 
which the stacking will occur. 

Three stable moving boundaries must be con- 
structed (Figure 1-18). At the start of the electrophoresis, 
the initial descending boundary between the slow anion 
and the fast anion that will compress the proteins is the 
boundary between the upper electrode solution (pHy) 
and the solution in the sample and the stacking gel 
Ip), At the start of the electrophoresis, the ascending 
boundary between the two concentrations of the 
cationic conjugate acid of the weak base or between the 
cationic conjugate acids of the weaker base and the 
stronger base that will deliver the pH jump is the bound- 
ary between the solutions in the running gel (pH;,) and 
the stacking gel (pH ,). The third stable moving bound- 
ary is the new descending boundary that deposits behind 
it the solution in which the proteins are actually separated
($\mathrm{pH}_{\mathrm{new}}$). It forms upon the fusion of the other two.

As the initial descending boundary moves through
the stacking gel, it must maintain a constant pH and
ionic strength behind it (pHy) to maintain the low and 
constant mobility of the slow anion in the upper solu- 
tion. As the new descending boundary moves, it must 
deposit behind itself a solution of constant pH and ionic 
composition ($\mathrm{pH}_{\mathrm{new}}$) to form a uniform electrophoretic
field upon which the proteins can be separated. The pH 
and ionic composition of the solution that is deposited 
behind the new descending boundary is different from 
the pH and ionic composition of any of the solutions ini- 
tially present, but the cation in this newly created solu- 
tion is the weak cationic acid of the original lower phase 
of the ascending boundary and the anion in this solution 
is the now accelerated anion of the original upper phase 
of the initial descending boundary. The constant pH 
deposited behind this new descending boundary is 
established by the weak cationic acid found on both 
sides of the boundary and its conjugate base and the now 
accelerated slow anion found in the upper solution of the 
boundary, which is a weak anionic base, and its conju- 
gate acid. All four of these species together buffer the 
deposited solution and determine both the ionic 
strength and the value of the deposited pH and hence the 
pH of the actual electrophoresis. 

The equations that govern the creation of a stable 
moving boundary and the ability of that boundary to 
deposit a solution of uniform pH and ionic composition 
were derived by Ornstein’ from the regulating func- 
tions described by Kohlrausch.'”' On the basis of these 
equations, Jovin!® has developed a more elaborate theo- 
retical description of discontinuous electrophoresis, and 
he and his colleagues have provided the necessary 
recipes for a large number of discontinuous systems. 


Problem 1-17: The uptake of protons by 1 mol of
horse carboxyhemoglobin in the range of pH 6-8 is 
about 9 mol of protons for each drop of 1 unit in pH.'® 
Use this value to estimate the moles of phosphate 
bound by a mole of horse carboxyhemoglobin at its iso- 
electric point at the phosphate concentration of the last 
point in curve D of Figure 1-16 ([phosphate] = 0.12 M). 
Assume that no cations other than protons are binding 
to the protein under these conditions. 


Problem 1-18: Use interpolated values for the free 
electrophoretic mobility of ovalbumin (Figure 1-13) at 
ionic strengths of 0.0025, 0.01, and 0.16 M to calculate 
the charge number on the protein during the electrophoresis.
The pH for the measurements was 7.1, and
the temperature was 294 K. The viscosity of water at
294 K is 1.0 mPa s. The diffusion coefficient of ovalbumin
at 294 K is 4.2 × 10⁻⁷ cm² s⁻¹.


Problem 1-19: The isoelectric point of normal hemo- 
globin, hemoglobin A, is 6.87, and that of sickle hemo- 
globin, hemoglobin S, is 7.09 when electrophoresis is 
carried out under the same conditions.'” In the vicinity 
of the isoelectric point, the charge number on either of 
these hemoglobins changes by about 13 equiv for every 
mole of protein for every change of 1 unit in pH. At the 
same pH, anywhere between their two respective iso- 
electric points, what is the difference in charge number 
between hemoglobin A and hemoglobin S? 


Problem 1-20: The frictional coefficient of trypsin at 
10 °C is 5.5 × 10⁻⁸ g s⁻¹. Assume the molecule to be a
sphere and calculate its free electrophoretic mobility at 
10 °C and at pH 6 and J, = 0.13 M by using the results of 
the acid-base titration in Figure 1-15, which are for 
20 °C, and Equation 1-79. Assume that $Z_{\mathrm{trypsin}} = Z_{\mathrm{H,trypsin}}$
and that $Z_{\mathrm{H,trypsin}}$ at pH 6 is the same at 10 °C as at 20 °C.


Problem 1-21: The frictional coefficient of ribonuclease 
at 25 °C is 2.6 × 10⁻⁸ g s⁻¹. Assume the molecule to be a
sphere and calculate its free electrophoretic mobility at
pH 6 and [KCl] = 0.15 M by using the results presented in
Figure 1-11 and Equation 1-79 with the assumption that
$Z_{\mathrm{RNase}} = Z_{\mathrm{H,RNase}}$. In a field of 20 V cm⁻¹, how far would
ribonuclease travel in 3 h if it had this mobility? 


Problem 1-22: The mean net proton charge number on 
bovine serum albumin (BSA) at pH 8.0 and ionic strength 
0.15 M is -17. The diffusion coefficient of bovine
serum albumin at 20 °C is 6.0 × 10⁻⁷ cm² s⁻¹. Its retardation
coefficient ($K_{\mathrm{r}}$) on polyacrylamide gels is
0.16 (%)⁻¹. The viscosity of water at 20 °C is 1.0002 mPa s.


(A) Show that the ionic strength of a solution con- 
taining 0.10 M sodium phosphate at pH 8.0 is 
0.27 M if the three values of the pK, for phosphate 
are 2.12, 7.21, and 12.32. 


(B) Estimate the mobility ($u_{\mathrm{BSA}}$) of bovine serum albumin
on a 7.5% gel of polyacrylamide run at 20 °C
in 0.10 M sodium phosphate at pH 8.0. Assume for
now that $Z_{\mathrm{H,BSA}}$ equals $Z_{\mathrm{BSA}}$ at pH 8.0.


(C) How far would the bovine serum albumin move in 
2 h if the field across the gel was 50 V cm⁻¹?


(D) Refer to Figure 1-16. On the basis of the behavior 
displayed in this figure, would you expect the 
bovine serum albumin to have a larger or a 
smaller mobility than you estimated? Why? 


Problem 1-23: The table below contains information 
about five imaginary proteins where a is the Stokes’ 
radius, Z is the mean net charge number on the protein
at pH 7, (0Z,;/dpH);, is the change in mean net proton 
charge number with pH, K, is the retardation coefficient 
for polyacrylamide, and u° is the free electrophoretic 
mobility for a temperature of 25 °C, an ionic strength of 
0.1 M, and a pH of 7.0. 


(A) Assume that, at constant ionic strength, 
(Zu ,/dpH),. is equivalent to (0Z;/dpH);, for each 
of the five proteins, and calculate the elec- 
trophoretic mobilities of these five proteins, at 
25 °C and an ionic strength of 0.1 M, under each 
of the following conditions: (1) pH 7.0 on 5% poly- 
acrylamide; (2) pH 7.0 on 10% polyacrylamide; 
(3) pH 5.0 on 5% polyacrylamide; (4) pH 5.0 on 
10% polyacrylamide. 


(B) What is the order of the migration of these five 
proteins under each of the four conditions? 


(C) What will happen to protein E at pH 7.0 that 
would not happen at pH 5.0 if a mixture of the 
proteins is run on vertical polyacrylamide gels 
with the cathode at the bottom and the anode at 
the top? 


(D) Assume that Z; does not change as ionic strength 
changes and calculate the mobilities of the five 
proteins at an ionic strength of 0.2 M at pH 5 and 
at 25°C on 5% polyacrylamide. How does the 
increase in ionic strength affect the mobilities? 


protein   a      Z      ∂Z_H/∂pH   K_r      u°
          (nm)          (pH⁻¹)     (%⁻¹)    (cm² V⁻¹ s⁻¹)
A         2.4    +1.4    0.2       0.045     1.7 × 10⁻⁵
B         5.3    +9.8   -1.8       0.152     3.2 × 10⁻⁵
C         4.9    +6.2    2.7       0.146     2.3 × 10⁻⁵
D         2.6    +0.9   -0.5       0.048     1.0 × 10⁻⁵
E         3.4    -3.4   -2.3       0.073    -2.4 × 10⁻⁵


Criteria of Purity


When the purification of a particular protein is moni- 
tored analytically by disc electrophoresis (Figure 
1-19),¹⁹⁰ the array of other proteins present at the early
stages of the purification is seen gradually to become less
complex in the later stages as one component emerges 
from the background and becomes more prominent 
until it alone remains.'*°’*' To be certain that the single 
component observed at the last step of the purification is 
the only one present in the purified preparation, elec- 
trophoresis should be run at a variety of protein concen- 
trations in addition to a few different acrylamide 
concentrations and values of pH. At high concentrations
of protein, minor impurities are most easily recognized,
while at low concentrations, two closely running
components can be resolved. Also, by running polyacrylamide
gels loaded with a series of protein concentrations,
the number and relative amounts of any minor
impurities can be quantified. The polyacrylamide gels
should also be stained with two distinct dyes, for example
Coomassie brilliant blue and silver oxide,
because some proteins do not stain as strongly as others
with a particular dye.


Figure 1-19: Disc electrophoresis on gels of polyacrylamide of
native proteins from successive steps in the purification of
[acyl-carrier-protein] S-malonyltransferase from E. coli.¹⁹⁰
Electrophoresis was performed on polyacrylamide gels cast from 15%
solutions of acrylamide in a discontinuous system of
tris(hydroxymethyl)methylamine and glycylglycine. The different gels
represent samples from successive steps in a complete purification of
the enzyme, seen in its final purified state on gel F. The gels were
stained for protein with Coomassie brilliant blue. Reprinted with
permission from ref 190. Copyright 1973 Journal of Biological
Chemistry.

The single component observed upon elec- 
trophoresis of a sample from the final step of the purifi- 
cation must be shown to be the protein actually 
responsible for the biological function being purified. 
Either the polyacrylamide gel is sliced and the assay is 
performed on each slice (Figure 1-20), or the
intact polyacrylamide gel is stained for enzymatic activity
(Figure 1-21).¹⁹ The latter is accomplished by placing
the gel in a solution that promotes the incorporation of
radioactivity or that gives a fluorescent product or a
colored product from the enzymatic reaction. For example,
by adding lead acetate, the SeH₂ produced in a polyacrylamide
gel from the action of selenocysteine lyase
can be made to form a yellow band where the enzyme is
located. The most widely used stain for enzymatic
activity is based on the ability of NADH to reduce
p-nitrotetrazolium blue to give a blue color. It is
obvious that through coupled assays this reaction can be
used to visualize a large array of different enzymatic
activities. At times, the protein being purified is itself colored,
by virtue of a bound chromophore, such as the
coenzyme B₁₂ associated with D-lysine 5,6-aminomutase,
and the coelectrophoresis of the purified protein
and that color can be observed directly.


Figure 1-20: Electrophoresis of purified porcine phosphomevalonate
kinase (20 μg) on a gel cast from a 10% solution of acrylamide.⁵¹
Following the electrophoresis, the cylindrical gel was
divided in half longitudinally. One half was cut into slices laterally,
and the slices were assayed individually for enzymatic activity (A).
The other half was stained for protein and then scanned for the
resulting absorbance (B). The inset in panel B is a photograph of
the stained gel. Reprinted with permission from ref 51. Copyright
1980 American Chemical Society.


Figure 1-21: Staining a polyacrylamide gel for enzymatic activity.¹⁹
Two samples of purified isocitrate dehydrogenase (NADP⁺)
from the final step on phenyl agarose (Table 1-2) were submitted
to electrophoresis on separate lanes of a thin slab of polyacrylamide.
After the electrophoresis, the lanes were cut from the slab.
One of the lanes (lower) was stained for protein with Coomassie
brilliant blue. The other lane (upper) was placed in a solution of
isocitrate and NADP⁺. The intrinsic fluorescence of the NADPH
produced by the enzyme was observed by illuminating the gel with
ultraviolet light. Reprinted with permission from ref 19. Copyright
1992 Blackwell Publishing.

Several artifacts can produce misleading results on 
electrophoresis. For example, aggregation of individual 
molecules of the same protein can occur during
either the purification or the stacking process, and this
produces an array of complexes, each with a different 
frictional coefficient and retardation coefficient. The 
neutral amides of glutamines and asparagines on the 
protein can hydrolyze randomly and in low yield during 
a harsh purification to produce anionic carboxylates, and
this modification leads to variations in Z; that produce 
multiple components from the same protein. Because 
these or other similar modifications are integral 
processes, the components that result from them are 
usually evenly spaced upon the electrophoretogram,'” 
and the nature of the artifact can be recognized by this 
pattern. Each component, however, should be
biologically active if the protein is pure.

Although the coelectrophoresis of the purified pro- 
tein and the biological activity is the most convincing cri- 
terion of purity, occasionally the electrophoresis itself 
destroys the activity.” For this reason, or simply for per- 
sonal satisfaction, other criteria of purity are often used. 
Immunoglobulins raised against the purified enzyme 
should behave on immunodiffusion and immunoelec- 
trophoresis as expected of immunoglobulins directed 
against a single antigen. It is also encouraging when these
immunoglobulins are able to precipitate all of the protein
and all of the biological activity, but this is not essential,
because some immunoglobulins are ineffective at
immunoprecipitation. Activity and protein should comigrate
on chromatography (Figures 1-6 and 1-10) or
cosediment upon gradients of sucrose. Even more convincing
is the observation that the single band of protein
observed upon electrophoresis of samples from fractions 
collected from the final chromatographic step increases 
in intensity and then decreases in intensity in concert 
with the increase and decrease of enzymatic activity, 
respectively, across the peak.

The mass of protein for every mole of binding site
is between 15,000 and 100,000 g mol⁻¹ for most proteins.
The concentration of protein (milligrams milliliter⁻¹) and
the concentration of binding sites (moles liter⁻¹) for a
ligand, such as an agonist or antagonist, known to be 
specific for a desired protein, such as the respective 
receptor, can be determined on samples from the same 
solution. If the ratio of these two quantities lies within the 
expected range and if only one protein can be discerned 
on electrophoresis, these observations are taken to be 
convincing criteria of purity, especially if the value of 
grams mole⁻¹ agrees with the measured molar mass of
the protomer of the protein that has been purified. For 
example, purified histidinol-phosphate transaminase 
binds 1 mol of pyridoxal phosphate for every 37,000 g of
protein, purified methylmalonyl-CoA mutase contains
1 mol of adenosylcobalamin for every 73,000 g of protein,
and purified α-adrenergic receptor binds 1 mol
of [³H]prazosin for every 69,000 g of protein.
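
The arithmetic behind this criterion is a simple ratio. The following Python sketch shows the conversion and the check against the range quoted above. The two measured concentrations are hypothetical values chosen for illustration, and the function names are not from the text.

```python
def grams_per_mole_of_sites(protein_mg_per_ml: float, sites_mol_per_l: float) -> float:
    """Grams of protein for every mole of binding site."""
    # 1 mg/mL is the same as 1 g/L, so the ratio is already in g/mol.
    return protein_mg_per_ml / sites_mol_per_l


def within_expected_range(value: float, low: float = 15_000.0, high: float = 100_000.0) -> bool:
    """True if the ratio lies in the range quoted in the text for most proteins."""
    return low <= value <= high


protein = 0.50     # hypothetical protein concentration, mg/mL
sites = 13.5e-6    # hypothetical concentration of binding sites, mol/L

ratio = grams_per_mole_of_sites(protein, sites)
print(f"{ratio:,.0f} g of protein per mole of sites")
print("within expected range:", within_expected_range(ratio))
```

A ratio far above the measured molar mass of the protomer would suggest that the preparation contains inactive or contaminating protein.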

Isoelectric focusing is a method for assessing 
purity that is based on electrophoresis. A gel of polyacryl- 
amide is cast from a solution containing a mixture of 
polyelectrolytes known as ampholytes. The isoelectric
points of the ampholytes in the mixture vary over a con- 
tinuous range of pH values. Upon application of an elec- 
tric field, this mixture forms a stable gradient of pH in the 
gel. Each protein migrates through this gradient until it 
reaches a pH equal to its isoelectric point where it can no 
longer move, and the proteins in a mixture are spread 
upon the field in order of their respective isoelectric 
points. It is a technique that is less flexible than disc electrophoresis
because it separates molecules on the basis
of only one property rather than three. Because, however,
isoelectric focusing seems to detect minor heterogeneities of charge
more successfully than electrophoresis does, it is an even more
stringent test of the homogeneity of a protein. The coisoelectrofocusing
of protein and biological activity is an additional
criterion of purity independent of the observation of
coelectrophoresis. Isoelectric focusing has been combined
with electrophoresis to resolve complex mixtures
of proteins in two dimensions.²¹⁴ When the clarified
homogenate produced from the cytoplasm of the bacterium
E. coli was submitted to such a procedure, more
than 1000 different proteins were represented upon the
field (Figure 1-22).²¹⁴ This display indicates the complexity
of the mixture of proteins in a cell. From such a mixture,
a single protein with a single biological activity is
purified.
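
The point at which a protein stops moving during isoelectric focusing can be illustrated with a small calculation. The following Python sketch treats a protein as a set of independent acid-base groups and finds by bisection the pH at which its net charge is zero. The composition of the protein and the values of pKa are hypothetical, independent titration of each group is an assumption, and the function names are not from the text.

```python
from typing import Sequence, Tuple

# (pKa, charge of the protonated form, number of groups)
Group = Tuple[float, int, int]


def net_charge(pH: float, groups: Sequence[Group]) -> float:
    """Mean net charge from independently titrating groups."""
    total = 0.0
    for pKa, z_protonated, count in groups:
        frac_protonated = 1.0 / (1.0 + 10.0 ** (pH - pKa))
        # The deprotonated form carries one unit less charge than the protonated form.
        total += count * (z_protonated - (1.0 - frac_protonated))
    return total


def isoelectric_point(groups: Sequence[Group], lo: float = 0.0,
                      hi: float = 14.0, tol: float = 1e-6) -> float:
    """pH of zero net charge, found by bisection (net charge falls as pH rises)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if net_charge(mid, groups) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


HYPOTHETICAL_GROUPS = [
    (4.0, 0, 10),   # carboxylic acids (assumed pKa and number)
    (6.5, 1, 4),    # imidazolium ions (assumed pKa and number)
    (10.0, 1, 8),   # ammonium ions (assumed pKa and number)
]

pI = isoelectric_point(HYPOTHETICAL_GROUPS)
print(f"isoelectric point: {pI:.2f}")
print(f"net charge 0.5 unit of pH below: {net_charge(pI - 0.5, HYPOTHETICAL_GROUPS):+.2f}")
print(f"net charge 0.5 unit of pH above: {net_charge(pI + 0.5, HYPOTHETICAL_GROUPS):+.2f}")
```

On the acidic side of its isoelectric point the hypothetical protein is cationic and on the basic side it is anionic, so in either case the field drives it back toward the pH at which it has no net charge.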


Heterogeneity


Often heterogeneity in a preparation of a purified pro- 
tein, observed as several different proteins capable of 
being separated, is detected by electrophoresis or iso- 
electric focusing even though all of the various compo- 
nents are biologically active; often heterogeneity is 
discovered in later experiments. This heterogeneity may 
have a biological origin, for example, because of varying 
levels of glycosylation or phosphorylation, and the vari- 
ous forms of the protein producing this heterogeneity 
may coexist in the tissue prior to homogenization, but 
usually the heterogeneity arises during the purification 
itself. Such heterogeneity is produced by processes that 
are minimized by avoiding extremes of pH through the 
use of well-buffered solutions, by working at low tem- 
peratures (0-5 °C), and by performing the purification in 
as short a period of time as possible. 

That it is the purification itself producing the het- 
erogeneity often becomes apparent when a new, more 
rapid, less debilitating method of purification is devised 
for a certain protein, and the heterogeneity noted previously,
the subject of many publications, simply disappears.
When fructose-bisphosphatase was purified by a
shorter method, the previously studied requirement of
the enzyme for alkaline conditions was no longer manifest.
When aconitate hydratase was purified by a more
rapid procedure, it was isolated with its iron still
attached. When glyceraldehyde-3-phosphate dehydrogenase
from yeast was purified rapidly by affinity chromatography,
the heterogeneous behavior in its
binding of ligands was no longer observed.


Figure 1-22: Separation of proteins from the cytoplasm of the bacterium E. coli by electrophoresis in two dimensions.²¹⁴ A sample (10 μg of
protein) from a homogenate of the bacteria, grown in the presence of [¹⁴C]amino acids, was submitted to isoelectric focusing (pH 3-10), under
conditions where the proteins were unfolded (9 M urea), on a cylindrical (0.25 cm x 13 cm) gel of polyacrylamide. After the unfolded proteins
had reached their respective isoelectric points, the gel was removed from its tube, soaked in a solution of sodium dodecyl sulfate (SDS) to
coat the unfolded polypeptides with this anionic detergent, and the cylinder was laid across the top of a flat slab (14 cm x 16 cm x 0.3 mm).
The unfolded polypeptides separated by isoelectric focusing (IF) in the first dimension were then separated by electrophoresis (SDS) in the
second dimension. [¹⁴C]Polypeptides were located by placing the slab on photographic film and exposing the film for a long enough time that
the radioactive disintegrations in each spot of protein produced the dark spots seen in the figure. Reprinted with permission from ref 214.

One of the most publicized causes of heterogeneity 
or artifactual alteration of a protein during its purification
is digestion by peptidases. Proteins the biological
role of which is to degrade other proteins are
known as peptidases. With the exception of a few peptidases
that are located in the cytoplasm such as the calpains,
which can be inactivated by chelating any free
calcium, most of the peptidases capable of degrading 
the normal, native proteins in a cell are present in inac- 
tive forms or are segregated from the cytoplasm of the 
cell in which they are located or in which they were pro- 
duced. This segregation is accomplished by enclosing 
the peptidases in tight, membrane-sealed packages, the 
lysosomes, or excreting them into the extracellular sur- 
roundings. Upon homogenization, the natural bound- 
aries between the cytoplasm and the cellular 
compartments containing these peptidases are 
destroyed, and artifactual digestion of the proteins 
being purified can commence. 


Peptidases are not always a problem. Most native 
proteins are remarkably resistant to digestion by pepti- 
dases, and in most instances, proteins can be purified 
without being digested. Harsh treatments, however, 
such as heat, the use of detergents, and extremes of pH 
encourage digestion by peptidases, and proteins purified 
by procedures employing these conditions often display 
evidence of deterioration. Because a protein can be 
nicked by a peptidase and remain almost unaltered in its 
functional and physical properties, the cumulative 
effects of digestion by peptidases become most obvious 
when proteins are unfolded and assessed by elec- 
trophoresis in solutions of sodium dodecyl sulfate,'”° a 
technique that is used to catalog the number and lengths 
of the polypeptides present in a given preparation. 

There are four major classes of peptidases, 
and their properties determine the precautions that can 
be taken to inhibit them during a purification procedure. 
Acid peptidases are active only at acidic ranges of pH, 
and if the purification is carried out at neutral or slightly 
alkaline pH, their action can be avoided. Sulfhydryl 
peptidases contain a thiol necessary for activity. If there 
is a suspicion that sulfhydryl peptidases are responsible 
for the heterogeneity or loss of activity that is observed, 
they can be permanently inactivated by treating the solu- 
tions of protein with iodoacetate, iodoacetamide, or 
N-ethylmaleimide before the purification is initiated and 
at one or two intermediate stages during the purification. 
Metallopeptidases require transition metal cations or 
alkaline earth cations and can be inactivated by adding 
chelating agents such as N,N,N’, N’-tetracarboxymethyl- 
1,2-diaminoethane or o-phenanthroline. Serine pepti- 
dases are invariably inactivated by diisopropyl 
fluorophosphate, but this compound is extremely toxic 
and dangerous to use. They are sometimes inactivated 
by phenylmethanesulfonyl fluoride or by chloromethyl
ketones of various specificities. As with the inhibitors 
of sulfhydryl peptidases, these reagents inactivate 
serine peptidases permanently so that solutions of pro- 
tein need only be treated prior to initiating the sequence 
of steps in the purification and at one or two intermedi- 
ate stages. Even when such precautions are taken, it is 
always wise to perform the purification in as short a time 
as possible. 

There is a vast array of natural and synthetic 
inhibitors of peptidases”? that are more or less spe- 
cific for one or several members of a particular class, and 
many of them are appropriate for preventing the action 
of unwanted peptidases during the purification of a pro- 
tein. Some have been used successfully as additives 
during the purification of proteins. For example, acetyl- 
CoA carboxylase has been purified from chicken liver in 
the presence of parotid trypsin inhibitor,” and phos- 
phoglycerate dehydrogenase, from the same source in 
the presence of leupeptin.””’ Often, however, inhibitors 
of the activity of peptidases are used prophylactically in 
the absence of any evidence that they are effective. 


Crystallization


As in the isolation of a natural product in organic chemistry,
the production of a crystalline preparation (Figure
1-23)²²⁸ was once considered to be the final step in any
isolation of a protein. Although the time-consuming
search for the proper conditions necessary to crystallize
a given protein has gone out of fashion, the exhilarating
gallery of photographs of crystalline enzymes compiled
by Dixon and Webb testifies to the pleasure that such
a conclusion to a long purification must inspire.

Crystallization as a method of purification is usually 
less effective than chromatography. Examples of the modest 
gains obtained when crystallization is the last step in a 
purification are the 1.5-fold purification seen upon 
recrystallization of phosphoenolpyruvate carboxykinase 
(ATP), the 1.4-fold purification with a 40% yield seen upon 
recrystallization of acylphosphatase, and the 2-fold 
purification with a 90% yield seen upon recrystallization 
of nicotinate-nucleotide diphosphorylase (carboxylating). 
Recrystallization has been observed to eliminate some of 
the heterogeneous behavior displayed by a purified protein, 
presumably due to an increase in its homogeneity. 
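
The figures quoted above (fold purification and yield) are ratios of specific activity and of total activity before and after a step. The following is a minimal sketch of that arithmetic; the function names and the numbers are hypothetical, chosen only so that they reproduce a 1.4-fold purification with a 40% yield.

```python
def specific_activity(total_units, total_protein_mg):
    """Units of enzymatic activity per milligram of protein."""
    return total_units / total_protein_mg


def purification_step(before, after):
    """Return (fold purification, fractional yield) for one step.

    before, after: tuples of (total_units, total_protein_mg).
    Fold purification is the ratio of specific activities;
    yield is the fraction of total activity recovered.
    """
    fold = specific_activity(*after) / specific_activity(*before)
    yield_fraction = after[0] / before[0]
    return fold, yield_fraction


# Hypothetical numbers for illustration only (not taken from the text).
before = (1400.0, 14.0)  # 1400 units in 14 mg of protein
after = (560.0, 4.0)     # 560 units in 4 mg of protein
fold, yield_fraction = purification_step(before, after)
print(f"{fold:.1f}-fold purification, {yield_fraction:.0%} yield")
```

A step can therefore show a respectable yield and still purify very little if most of the protein that crystallizes was already the desired one.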

Crystals of a protein, aside from their intrinsic 
beauty, are the specimens required to determine the 
molecular structure of the purified protein by X-ray 
crystallography, and there is considerable interest in 
crystallizing proteins. For crystallographic studies, single, 
untwinned crystals of 0.1-1 mm in size are required, and 
to produce suitable crystals is a process involving a good 
deal of trial and error. Because of the number of attempts 
required by this trial and error and because suitable crys- 
tals only form in concentrated solutions of the protein, 
about 10 mg of protein is required. Furthermore, crystals 


Figure 1-23: Crystals of α-galactosidase isolated from Mortierella 
vinacea. Homogenates of cells of M. vinacea were submitted 
sequentially to ammonium sulfate fractionation, chromatography 
by anion exchange on (diethylaminoethyl)dextran, and chro- 
matography by molecular exclusion. The final protein was crystal- 
lized from a solution of ammonium sulfate. The largest crystal in 
the field is about 10 μm across. Reprinted with permission from ref 
228. Copyright 1970 Journal of Biological Chemistry. 

usually will grow most readily from homogeneous solutions 
of monodisperse protein, so the preparations used 
must be as pure as possible. Therefore, it is essential to 
examine the final protein from the purification by elec- 
trophoresis in its native state (not after it has been 
unfolded with dodecyl sulfate), by isoelectric focusing, 
and by dynamic light scattering before time is wasted 
trying to crystallize an inhomogeneous sample. The pro- 
tein also must have suffered as little damage as possible 
during the procedures used to purify it. For all of these 
reasons, proteins purified rapidly in one or two steps 
from overproducing microorganisms, in which more 
than 10% of the cellular protein can be the protein of 
interest, are usually the ideal starting material for crys- 
tallizations. The construction of such a microorganism is 
now usually the first step in an attempt to crystallize a 
protein for crystallographic studies. 

Crystals of proteins suitable for crystallography are 
produced by slowly and continuously increasing the 
concentration of both the protein and a solute that pro- 
motes crystallization. Any of the solutes, such as ammo- 
nium sulfate, poly(ethylene glycol), or trimethylamine 
oxide, that have negative preferential solvations and 
cause proteins to precipitate from solution can be used 
to promote crystallization. Choosing conditions of pH 
and ionic strength within ranges in which the second 
virial coefficient of the osmotic pressure of a solution of 
the protein is negative increases the odds of producing 
crystals. A solution is prepared of the protein and the 
solute promoting the crystallization, both at concentra- 
tions slightly below those at which precipitation would 
begin. A drop of this solution (1 μL) is placed upon a glass 
cover slip. The cover slip is inverted over a well contain- 
ing a concentrated solution of some salt or other solute 
in which the activity of water is less than that in the solu- 
tion of the hanging drop. The system is sealed and left in 
the cold for several weeks. Slowly, water evaporates from 
the hanging drop and condenses in the well, and if one is 
lucky, crystals of protein form in the drop. As this is a rare 
event, hundreds of hanging drops are made, each with a 
different pH, ionic strength, or concentration of protein 
over wells with different solutions in them. Small mole- 
cules, for example, substrates or ligands, that are known 
to bind to the protein are also added to some of the drops 
in the hope that they might encourage crystallization. 
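
The screen described above amounts to enumerating every combination of the variables being tested. The following is a minimal sketch of such an enumeration; the function name, the dictionary keys, and all of the values are hypothetical and serve only to show how quickly the number of drops grows.

```python
from itertools import product


def hanging_drop_screen(ph_values, salt_molar, protein_mg_per_ml):
    """Enumerate every combination of conditions, one dict per drop."""
    return [
        {"pH": ph, "salt_M": salt, "protein_mg_per_mL": protein}
        for ph, salt, protein in product(ph_values, salt_molar, protein_mg_per_ml)
    ]


# Hypothetical ranges for illustration only.
drops = hanging_drop_screen(
    ph_values=[5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5],
    salt_molar=[1.0, 1.2, 1.4, 1.6, 1.8, 2.0],
    protein_mg_per_ml=[5, 10, 20, 40],
)
print(len(drops), "drops in the screen")
```

Eight values of pH, six concentrations of salt, and four concentrations of protein already require 192 drops, before any ligand is added.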

Once the crystals are obtained, a sample of them 
should be dissolved and the protein that they contain 
submitted to electrophoresis to be certain that it is the 
one desired rather than a contaminant in that preparation 
that happened to crystallize while the desired protein 
did not. 

Chapter 2 


Electronic Structure 


When proteins are submitted to chemical analysis, they 
are found to be composed of 20 amino acids: aspartic 
acid, asparagine, threonine, serine, glutamine, glutamic 
acid, proline, glycine, alanine, cysteine, valine, methion- 
ine, isoleucine, leucine, tyrosine, phenylalanine, lysine, 
histidine, tryptophan, and arginine. Each protein has dif- 
ferent relative amounts of each of these amino acids. The 
amino acids a protein contains are coupled together in a 
particular order to create polymers 50-5000 amino acids 
in length, referred to as polypeptides. To understand the 
structure of molecules of protein, one must understand 
the amino acids, the order in which they are connected, 
and the way that these long polymers are folded up to 
produce the native conformation of the molecule. The 
first level of understanding is grounded in a firm knowl- 
edge of the bonding and molecular structure of small 
molecules. The second level of understanding requires a 
description of the complete covalent structure of the 
polymers composing proteins. The third level of under- 
standing proceeds from crystallographic molecular 
models of proteins that are the products of X-ray crystal- 
lography. 

It is remarkable that each molecule of a particular 
protein, if it has not been heterogeneously posttransla- 
tionally modified, has the same covalent structure and 
that when it is in its natural environment, the polypep- 
tides from which it is composed assume the same few 
conformations even though the complete molecule of 
the protein is large. These two properties are foreign to a 
synthetic chemist. Molecules produced synthetically are 
either precise but small or large but heterogeneous. 
Large heterogeneous polymers produced synthetically 
seldom have defined structures. Yet a molecule of pro- 
tein is made from atoms held together by the same cova- 
lent chemical bonds holding together the smaller 
molecules to which one is already accustomed. All of the 
rules of bonding exerted with such inescapability in 
small molecules are as inescapable in a molecule of pro- 
tein. 

The covalent bonds holding the atoms together in 
any molecule are pairs of electrons confined to molecu- 
lar orbitals. The molecular orbitals are either localized 
σ molecular orbitals or delocalized π molecular orbitals. 
A distinction between these two types of molecular 
orbitals is crucial to an understanding of bond lengths, 
bond angles, and rotational motions about bonds. 

In addition to the covalent bonds, molecules of protein 
are filled with lone pairs of electrons. Because σ lone 
pairs of electrons are the only valence electrons that do 
not participate in covalent bonds and because there are 
also lone pairs of electrons participating in π molecular 
orbitals, to understand the details of molecular structure 
one must be able to distinguish localized σ lone pairs of 
electrons from delocalized π lone pairs of electrons. The 
distinction between these two types of electrons is 
reflected in their basicity, their ability to house a proton. 

Each lone pair of electrons in a molecule is a poten- 
tial base, and each hydrogen in a molecule is a potential 
acid. Which lone pair will act as a base is determined by 
the acid dissociation constant for its conjugate acid, and 
which hydrogen will act as an acid is determined by its 
own acid dissociation constant. Every lone pair is basic 
and every hydrogen is acidic, but most lone pairs are 
such weak bases and most hydrogens are such weak 
acids that their basicity or acidity can be ignored. To 
understand the atomic structure of a molecule of pro- 
tein, the significant acids and bases within it must be 
identified and categorized. It is also necessary to distin- 
guish an acid dissociation, in which a proton leaves the 
molecule, from a tautomerization, in which protons 
redistribute among lone pairs of electrons within the 
molecule. 

The chemical capacities available to a protein are a 
reflection of the amino acids from which it is con- 
structed. Each of the 20 amino acids has its own peculiar 
set of chemical capacities. These are mixed in a unique 
way by the amino acid sequence and the resulting native 
structure to produce those of the particular protein, but 
to understand the mixture, the properties of the ingredi- 
ents must be understood. These properties include the 
bonding and acid-base behavior of each of the 20 side 
chains of the amino acids. With the exception of the reg- 
ular polyamide backbone of the polymer, the covalent 
bonds, acidic hydrogens, and basic lone pairs of elec- 
trons that fill a molecule of protein are contributed by 
these side chains. 

Molecules, including proteins, are arrays of atomic 
nuclei required to maintain particular distances and 
angular dispositions relative to each other by electrons 
confined to particular regions of space known as 
orbitals. Every electron in a molecule is confined to a 
specific orbital, and almost every orbital is occupied by 
two electrons. Each orbital is either confined exclusively 
to one nucleus or distributed between or among particular 
nuclei. The electrons, in their occupation of these 
orbitals, create the covalent structure of the molecule. 
The electrons present in a molecule can be divided into 
three categories, core electrons, π electrons, and σ 
electrons, that reflect the degree to which they are confined 
and that define their chemical reactivity. 

Core electrons are the electrons that are immedi- 
ately adjacent to a nucleus. Aside from hydrogen, almost 
all of the atoms present in molecules of protein are either 
carbon, oxygen, or nitrogen. Each of these atoms has two 
core electrons spherically confined about the nucleus. 
Occasionally, sulfur or phosphorus occurs in a protein, 
and these atoms each have 10 core electrons. Because 
they are confined close to the nucleus, the core electrons 
provide the greatest electron density and are the promi- 
nent features in a map of electron density. They are, how- 
ever, chemically inert. 

Valence electrons are the outermost electrons sur- 
rounding each atom. All of the chemistry of a molecule, 
which is the consequence of its chemical bonds and its 
sites of reactivity, results from these valence electrons. 
Unless one electron is missing, as in the case of a radical, 
or two electrons are momentarily missing, as in a carbo- 
cation, every carbon, nitrogen, oxygen, sulfur, or phos- 
phorus in a molecule of protein can be formally 
associated with eight valence electrons. By convention, 
these octets are assigned by Lewis structures. This for- 
malism divides valence electrons into bonding electrons 
and lone pairs of electrons and assigns formal charge to 
certain atoms. An example would be the Lewis structure 
of the model compound for glutamic acid in a polypeptide, 
N-acetylglutamate α-amide (Figure 2-1A). The 
intent of a Lewis structure is to count valence electrons. 

A pair of bonding electrons occupies a bonding 
molecular orbital that is formed from the overlap of two 
or more atomic orbitals, each contributed by a different 
atom in the molecule. These bonding electrons must be 
clearly distinguished as occupants of either π molecular 
orbitals, forming π bonds, or σ molecular orbitals, forming 
σ bonds. 

The overlap of two or more adjacent and parallel 
p atomic orbitals on two or more adjacent atoms in a 
molecule creates a system of π molecular orbitals. Two 
adjacent p atomic orbitals can overlap only above and 
below the line of centers between the two atoms from 
which they are contributed (Figure 2-2). This geometry 
has two consequences: it prevents rotation about axes 
connecting the nuclei of adjacent atoms and it permits a 
series of overlaps to occur simultaneously. Because rotation 
is prevented, structures containing a system of 
π molecular orbitals are rigid. The fact that a series of 
overlaps can occur permits the electrons occupying a 
system of π molecular orbitals to be delocalized. 

Figure 2-1: Two ways of representing the electronic structure of 
N-acetylglutamate α-amide. (A) In the Lewis dot formula, each 
main atom is surrounded by an octet of electrons and the total 
number of electrons represented equals the sum of the number of 
valence electrons contributed by each neutral atom plus the 
elementary molecular charge. The negative sign surrounded by a 
circle locates formal charge. (B) In a σ-π stereochemical 
representation distinguishing types of electrons, a σ bond is 
designated by a line, a localized σ lone pair of electrons is 
designated by two dots surrounded by a circle, a π bond is 
indicated by a second or third line between two atoms, and a 
π lone pair of electrons is shown by two uncircled dots. The 
atoms are arranged in space to represent the tetrahedral or 
trigonal geometry dictated by their respective hybridizations. 


Delocalization of a pair of electrons occupying one 
π molecular orbital in such a system results from the fact 
that each π molecular orbital is a linear combination of 
the p orbitals that overlap. Each π molecular orbital is 
spread over and shared by every atom that contributed a 
p orbital to the system unless a node is located at that 
atom. When a pair of electrons occupies a π molecular 
orbital, it cannot be assigned to a particular atom, 
notwithstanding the formal requirement of the Lewis dot 
structure that it be so localized for the purposes of 
bookkeeping. Confusion between actuality and accounting 
sometimes leaves the impression that π electrons are 
localized. 

An example of a combination of p atomic orbitals is 
the system of π molecular orbitals that forms when four 
parallel p orbitals mix (Figure 2-2). The number of 
π molecular orbitals that result from any combination of 
this type is always equal to the number of p orbitals that 
have mixed; in this case there are four π molecular 
orbitals in the system. Each p orbital can be mixed in one 
of two phases, and adjacent p orbitals can be either in 
phase, in which case they overlap—a favorable interac- 
tion—or out of phase, in which case a node—an unfa- 
vorable interaction—occurs between them. A node is a 
position at which the phase inverts. In a linear system 
such as the one shown in Figure 2-2, the number of 
nodes increases by one for each molecular orbital in the 
series. 

Each of these four π molecular orbitals in Figure 2-2 
has an energy level associated with it that is equal to the 
energy one electron would experience were it confined 

Figure 2-2: A π molecular orbital system formed by the combination 
of four parallel p atomic orbitals on four adjacent atoms held 
together by three σ bonds. The four p orbitals overlap, and four 
linear combinations of these four atomic orbitals, the four 
π molecular orbitals, are permitted. In the combination of lowest 
electronic energy, all four p orbitals overlap in phase (indicated 
arbitrarily with + and −). In each of the higher π molecular orbitals, 
nodes are present, on either side of which the constituent p orbitals 
have opposite phase so overlap is antibonding. The four individual 
π molecular orbitals are arranged in order of increasing electronic 
energy from bottom to top. In all linear π molecular orbital systems, 
such as the one represented, the number of nodes increases 
by one upon moving to the next higher energy level. The nodes are 
evenly distributed from one end of the structure to the other. In this 
π molecular orbital system formed from four p atomic orbitals 
there are usually four electrons occupying the two π molecular 
orbitals of lowest energy. The sizes of the atomic orbitals 
approximately represent the magnitude of their contribution to each 
π molecular orbital. 


within that orbital. In a π molecular orbital system 
formed entirely from p orbitals on atoms of the same 
element, such as that of the four carbon atoms of 
butadiene or that of the six carbon atoms in benzene, 
these energy levels are distributed symmetrically above 
and below the potential energy an electron would have in 
one of the isolated p orbitals from which the system has 
been created. The more nodes the molecular orbital has, 
the higher its energy and the less likely that an electron 
will occupy it. A π molecular orbital is designated as 
bonding, nonbonding, or antibonding depending on 
whether its energy level is less than, equal to, or greater 
than the energy level of an isolated p orbital, respectively. 
If the π molecular orbital is bonding, the equivalent of a 
covalent bond is formed because a pair of electrons has 
a lower energy in the molecular orbital than it would 
have were it split between two isolated atoms. In 
π molecular orbital systems formed from three or more 
parallel p orbitals, that covalent bond is spread over the 
atoms contributing the p orbitals, notwithstanding the 
impression often left by the Lewis structure that it is the 
second localized bond in a double bond. 
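
The pattern of nodes and energy levels described for a linear system can be checked numerically. The following is a minimal sketch that assumes the simple Hückel approximation for a linear chain of identical atoms, a model that the text does not name; the function names are hypothetical. Energies are expressed as the coefficient x in E = α + xβ, where α is the energy of an isolated p orbital and β is negative, so that positive x is bonding.

```python
import math


def huckel_linear(n):
    """Closed-form Hückel orbitals for n parallel p orbitals in a linear chain.

    Returns a list of (x, coefficients), ordered from lowest to highest
    energy, where E = alpha + x * beta and beta < 0.
    """
    norm = math.sqrt(2.0 / (n + 1))
    orbitals = []
    for k in range(1, n + 1):
        x = 2.0 * math.cos(k * math.pi / (n + 1))
        coeffs = [norm * math.sin(j * k * math.pi / (n + 1))
                  for j in range(1, n + 1)]
        orbitals.append((x, coeffs))
    return orbitals


def count_nodes(coeffs, tol=1e-9):
    """Count phase inversions along the chain, skipping atoms where the orbital vanishes."""
    signs = [1 if c > 0 else -1 for c in coeffs if abs(c) > tol]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


def classify(x, tol=1e-9):
    """Label an orbital by comparing its energy with that of an isolated p orbital."""
    if x > tol:
        return "bonding"
    if x < -tol:
        return "antibonding"
    return "nonbonding"


for x, coeffs in huckel_linear(4):
    print(f"x = {x:+.3f}  nodes = {count_nodes(coeffs)}  {classify(x)}")
```

For four p orbitals the sketch gives two bonding and two antibonding levels placed symmetrically about the energy of an isolated p orbital, with zero, one, two, and three nodes in order of increasing energy.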

The general structure of each π molecular orbital is 
determined only by the number of p orbitals that have 
mixed together and their connectivity; however, the 
nature of the atom (carbon, nitrogen, or oxygen) that 
has contributed each of the p orbitals does affect the 
shape and energy of the molecular orbital through 
coulomb effects. These effects are most easily understood 
as perturbations, caused by the fact that some of the 
atoms are of other elements, of the symmetric π molecular 
orbital system that would be formed from the same 
number and arrangement of carbon atoms. A coulomb 
effect is the distortion of the system of symmetric 
π molecular orbitals that would exist if all of the atoms were 
carbons. It is caused by the electronegativity and electron 
deficiency of the atom other than carbon that the 
π molecular orbital system actually contains. A coulomb 
effect causes the region of a bonding or nonbonding 
π molecular orbital over a more electronegative atom, 
such as oxygen, or a more electron-deficient atom, 
usually nitrogen, to swell at the expense of the region or 
regions over the less electronegative atoms, usually 
carbons. 

The number of π molecular orbitals in a given 
system is determined solely by the number of p orbitals 
that have been mixed together, but the number of those 
molecular orbitals that are occupied by pairs of electrons 
is determined by other properties of the molecule as well. 
The two decisions, how many p orbitals have combined 
and how many π electrons have occupied the system of 
π molecular orbitals, are made by examining all valid 
resonance structures for the molecule. Drawing resonance 
structures is nothing more than making these decisions. 
Any electrons that are active participants in resonance 
have been explicitly designated as π electrons by the 
person who drew those resonance structures, and any 
atom the bonding of which changes among the resonance 
structures has been explicitly designated as an 
atom that has contributed a p orbital to the system of 
π molecular orbitals. All double and triple bonds in a 
molecule are necessarily participants in systems of 
π molecular orbitals. 

The amide, which is of wide biochemical relevance, 
is a simple example of this process of designation (Figure 
2-3). The chemical properties of an amide are usually 
explained by drawing two resonance structures. These 
two resonance structures state that each of the three 
atoms, the oxygen, the carbon, and the nitrogen, contributes 
a p orbital to the system of π molecular orbitals 
because their bonding changes between the two Lewis 
structures of the resonance pair. The resonance structures 
state that the system of π molecular orbitals contains 
four π electrons because two of the pairs of 
electrons shift between the two structures. When three 
adjacent p orbitals are mixed, three π molecular orbitals 
are created (Figure 2-3). That four electrons occupy 
these three molecular orbitals places one pair in each of 
the two molecular orbitals of lowest energy. If coulomb 
effects are disregarded for the moment, the two electrons 

Figure 2-3: Electronic structure of an amide. In the upper left 
quadrant of the figure the two resonance structures for the amide 
are presented. The molecular orbital system in the center of the 
figure is formed from the linear combination of three p atomic 
orbitals: one from nitrogen, one from carbon, and one from 
oxygen. They are combined to produce three π molecular orbitals 
presented in order of increasing energy. To emphasize the symmetry 
of these π molecular orbitals, they have been drawn as if there 
were no coulomb effect. In the combination of lowest energy, 
all three of the constituent p orbitals overlap in phase. In linear 
π molecular orbital systems with an odd number, n, of atoms, the 
number of nodes in the central nonbonding molecular orbital is 
equal to ½(n - 1). To achieve a symmetric distribution of nodes 
in the central nonbonding molecular orbital, there are lobes on 
the two end atoms and nodes on every other atom in between, 
alternating lobe, node, lobe, node, and so forth. In the idealized 
three-atom system displayed here, there is a lobe at nitrogen, a 
node at carbon, and a lobe at oxygen in the central nonbonding 
π molecular orbital. 


in the lowest bonding level would have half of their den- 
sity distributed over carbon, one-quarter over oxygen, 
and one-quarter over nitrogen. The two electrons in the 
middle nonbonding level would have half of their density 
distributed over nitrogen and half over oxygen. 
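
These fractions are the squares of the coefficients of the p atomic orbitals in each idealized π molecular orbital. As a sketch, assuming the simple Hückel treatment of three equivalent p orbitals (an approximation the text does not name) and writing φ_N, φ_C, and φ_O for the three p atomic orbitals (symbols introduced here only for illustration), the two occupied orbitals are:

```latex
\begin{align*}
\psi_{\text{bonding}} &= \tfrac{1}{2}\,\phi_N + \tfrac{1}{\sqrt{2}}\,\phi_C + \tfrac{1}{2}\,\phi_O ,
  & c_N^2 &= \tfrac{1}{4}, & c_C^2 &= \tfrac{1}{2}, & c_O^2 &= \tfrac{1}{4} \\
\psi_{\text{nonbonding}} &= \tfrac{1}{\sqrt{2}}\,\phi_N - \tfrac{1}{\sqrt{2}}\,\phi_O ,
  & c_N^2 &= \tfrac{1}{2}, & c_C^2 &= 0, & c_O^2 &= \tfrac{1}{2}
\end{align*}
```

In each orbital the squared coefficients sum to 1, so each electron is fully accounted for among the three atoms.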

If the four π electrons were removed from the three 
atoms of the amide, the carbon and the oxygen would 
each have formal charges of +1 but the nitrogen would 
have a formal charge of +2, making it electron-deficient 
relative to the other two. If two pairs of π electrons 
occupy the two undistorted π molecular orbitals of 
lowest energy, oxygen would end up with a formal charge 
of -½; carbon, 0; and nitrogen, +½. This is the distribution 
of charge designated by the two resonance structures. 
Usually the resonance structures provide 
information about the distribution of electrons in the 
highest occupied molecular orbital or the distribution 
of electron deficiency in the lowest unoccupied molecu- 
lar orbital. In the case of the amide, the resonance struc- 
tures indicate that the pair of electrons in the highest 
occupied molecular orbital can occupy locations only 
over the nitrogen and the oxygen. This example illus- 
trates the fact that resonance structures and molecular 
orbitals should agree in their assessment of electron dis- 
tribution. 
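
The bookkeeping of charge in this paragraph can be reproduced in a few lines. The following is a minimal sketch that uses only the idealized densities and formal charges stated in the text (no coulomb effect); the variable names are hypothetical.

```python
# Fraction of one electron's density over each atom in each occupied
# idealized pi molecular orbital (values as stated in the text).
density = {
    "bonding":    {"N": 0.25, "C": 0.50, "O": 0.25},
    "nonbonding": {"N": 0.50, "C": 0.00, "O": 0.50},
}

# Formal charge on each atom after all four pi electrons are removed.
charge_without_pi = {"N": 2, "C": 1, "O": 1}

# Two electrons occupy each of the two orbitals of lowest energy.
occupancy = {"bonding": 2, "nonbonding": 2}

charge = {
    atom: charge_without_pi[atom]
    - sum(occupancy[mo] * density[mo][atom] for mo in occupancy)
    for atom in charge_without_pi
}

for atom in ("O", "C", "N"):
    print(f"{atom}: {charge[atom]:+.1f}")
```

The result, a charge of -½ on oxygen, 0 on carbon, and +½ on nitrogen, is the average of the formal charges in the two resonance structures of the amide.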

In Figure 2-3 and the description just presented, it 
was assumed that there was no coulomb effect; in other 
words, that all three atoms in the σ structure had the 
same electronegativity and formal charge. In a real 
amide, oxygen is the most electronegative atom and 
nitrogen is electron-deficient. The resulting coulomb 
effects cause the bonding molecular orbital of lowest 
energy to be skewed so that oxygen ends up with more 
electron density than carbon, rather than less, and the 
nonbonding molecular orbital of intermediate energy to 
be skewed so that nitrogen ends up with more electron 
density than oxygen. The node at carbon in the ideal 
nonbonding π molecular orbital shifts toward oxygen or 
toward nitrogen depending on whether or not the 
oxygen is hydrogen-bonded. 

Resonance theory has always incorporated the fact 
that the several structures drawn do not have independent 
existence, but occasionally, by mistake, it is implied 
that they do. In the extreme, the double-headed arrow 
of resonance becomes replaced with the two arrows of a 
chemical equilibrium, a mistake that engenders serious 
confusion. To avoid such confusion, a double-headed 
arrow should be used only to indicate resonance, never 
to indicate an equilibrium, and the two arrows of a chemical 
equilibrium should never be used to indicate resonance. 
That only one, undivided system of π molecular 
orbitals represents the resonance hybrid is a reaffirmation 
of the absence of independent existence. 
Unfortunately, while π molecular orbitals present a more 
accurate picture of the molecular structure and avoid the 
confusion with equilibrium, they do not have the 
accounting capability of formal resonance structures, 
and each view, whether molecular orbitals or resonance 
structures, has its appropriate use. 

The first decision that must be made about the electronic 
structure of any molecule is the location of all systems 
of π molecular orbitals. Any carbon, nitrogen, or 
oxygen that has contributed a 2p atomic orbital to a 
system of π molecular orbitals has only two other 
2p atomic orbitals remaining to hybridize with its lone 
2s atomic orbital, but any carbon, nitrogen, or oxygen 
that is not involved in a system of π molecular orbitals has 
three 2p atomic orbitals to hybridize with its 2s atomic 
orbital. It is these hybrids between s atomic orbitals and 
p atomic orbitals that overlap to form σ bonds. These 
σ bonds lie along the line of centers between the two 
respective atoms that they connect, and they are localized. 
Because they are localized, they are usually stronger 
covalent bonds than π bonds, and as a result every pair of 
atoms joined by one or more than one covalent bond 
must be joined by one σ bond. These σ bonds form the 
molecular skeleton defining the structure of the molecule, 
in particular its bond angles. This skeleton is the 
σ structure of the molecule. Each σ bond is also an occupied 
molecular orbital, but this realization is not informative 
in issues of molecular structure. In the particular 
instance of molecules in biological situations, when an 
atom has contributed one p orbital to a system of 
π molecular orbitals, it will almost always be hybridized 
[p, sp², sp², sp²]. At that atom in the σ structure, the 
molecule will be planar, and the σ covalent bonds and 
σ lone pairs will radiate within that plane in three 
directions from the atom at approximately 120° angles. 
When an atom has not contributed a p orbital to a system 
of π molecular orbitals, it will almost always be hybridized 
[sp³, sp³, sp³, sp³]. At that atom in the σ structure, 
σ covalent bonds or σ lone pairs will radiate in four 
directions tetrahedrally, at angles of approximately 109.5°. 
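
The two angles quoted above are those of ideal trigonal and tetrahedral geometry. The following is a minimal sketch that recovers them from direction vectors; the choice of vectors (alternate corners of a cube for the tetrahedron, three directions in a plane for the trigonal case) is an illustrative construction, not one taken from the text.

```python
import math


def angle_deg(u, v):
    """Angle in degrees between two direction vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return math.degrees(math.acos(dot / (norm_u * norm_v)))


# Four tetrahedral directions: alternate corners of a cube centered on the atom.
tetrahedral = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]

# Three trigonal directions lying in one plane.
trigonal = [
    (math.cos(math.radians(a)), math.sin(math.radians(a)), 0.0)
    for a in (0, 120, 240)
]

print(f"tetrahedral: {angle_deg(tetrahedral[0], tetrahedral[1]):.1f} degrees")
print(f"trigonal:    {angle_deg(trigonal[0], trigonal[1]):.1f} degrees")
```

Real bond angles deviate somewhat from these ideal values, which is why the text describes them as approximate.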

Because the σ structure incorporates these bond 
orders and bond angles, it dictates the details of molecular 
structure. These details cannot be appreciated until 
decisions on hybridization can be made correctly. To 
pursue an earlier example, the oxygen, carbon, and 
nitrogen of an amide are each contributing a p orbital 
to the system of π molecular orbitals, and each is 
hybridized [p, sp², sp², sp²]. In the σ structure each of 
these three atoms and all of the σ bonds and σ lone pairs 
of electrons radiating from them are in a plane, and each 
bond angle is approximately 120° (Figure 2-3). 

Lone pairs of electrons are identified by writing a 
Lewis structure of the molecule. Thereafter, it is conven- 
tional to ignore them, on the assumption that everyone 
knows that they are there. This assumption is somewhat 
vain; it seems to say that if you do not always realize that 
they are there, you are not someone. Because lone pairs 
of electrons are of paramount importance in biochem- 
istry and because an understanding of a biologically 
important molecule is incomplete if ever they are forgot- 
ten, it is safer to include them explicitly in any structure 
drawn, especially if they contribute to what is being 
discussed. 

Because of electron repulsion, a lone pair of electrons 
on any oxygen or nitrogen unconjugated to a 
system of π molecular orbitals will occupy one of the 
sp³ orbitals of that atom. This σ lone pair of electrons 
resides at one of the tetrahedral vertices of the atom. A 
σ lone pair of electrons is a lone pair confined to a single 
atom because it resides within a hybridized or unhybridized 
atomic orbital that does not overlap with any 
other atomic orbital from another atom. A σ lone pair of 
electrons is designated in a σ-π stereochemical representation 
by enclosing it within a circle (Figures 2-1B and 
2-3) to symbolize its confinement. A σ-π stereochemical 
representation (Figure 2-1B) is a drawing of the molecule 
that indicates the bond angles and angles of σ lone 
pairs and distinguishes σ lone pairs of electrons from 
π lone pairs of electrons. 

If an oxygen or nitrogen containing a lone pair of 
electrons is sterically able to rotate until that lone pair is 
parallel to an immediately adjacent system of π molecular 
orbitals and sterically able to rehybridize to sp² at its 
three remaining bonded positions, the lone pair of electrons 
is capable of entering the system of π molecular 
orbitals. For the lone pair of electrons to do this, the atom 
carrying it must rehybridize. This rehybridization 
requires sufficient energy to overcome the electron 
repulsion that originally placed the lone pair in an 
sp³ orbital. The favorable energy resulting from the 
delocalization of the lone pair into the system of π molecular 
orbitals must exceed this deficit. If it does, the lone 
pair of electrons becomes a delocalized π lone pair of 
electrons, occupying a π molecular orbital spread over 
two or more atoms. It is so designated in a drawing by not 
enclosing it within a circle (Figures 2-1B and 2-3) to indicate 
its unconfinement. 

When either oxygen or nitrogen has contributed 
only one of its p orbitals to a system of π molecular 
orbitals and is left with three valence orbitals, one 
2s orbital and two 2p orbitals, it is usually assumed that 
they mix to form three sp² orbitals that lie together within 
a plane normal to the system of π molecular orbitals and 
are arrayed at 120° angles. If there are two or three covalent 
σ bonds to the heteroatom, the hybridization is usually 
[p, sp², sp², sp²] because sp² orbitals provide 
maximum overlap in a σ bond. Thus a single lone pair 
left on a nitrogen that has contributed only one p orbital 
and one of its valence electrons to a system of π molecular 
orbitals and also participates in two σ bonds is always 
a σ lone pair in an sp² orbital, and it is designated as such 
by surrounding it with a circle. An example of such a lone 
pair is the lone pair on a nitrogen in an imine. 

The situation becomes ambiguous, however, in the 
case of an oxygen that has contributed a p orbital and 
one valence electron to a system of π molecular orbitals, 
participates in one σ bond, and remains with two lone 
pairs of electrons. An example of such an oxygen would 
be an acyl oxygen or the oxygen of a carbonyl (Figure 
2-4). The possibility arises that such an oxygen is 
hybridized [p, p, sp, sp]. In this case, one lone pair would 
occupy an sp orbital in line but opposite to the σ bond 
between the carbon and the oxygen, and the other lone 
pair would occupy a p orbital normal to both the π bond 
and the axis of the two sp orbitals (Figure 2-4A). Indeed, 
there is evidence from ultraviolet spectra and mass spectra 
of isolated carbonyl compounds that this occurs. The 
alternative possibility is that oxygen is hybridized [p, sp², 
sp², sp²] and that both lone pairs are in sp² orbitals 
(Figure 2-4B). The decision between these two alternatives 
is not an insignificant one, for oxygens that have 
contributed one p orbital and one valence electron to a 
system of π molecular orbitals and participate in only 
one σ bond are by far the majority of the oxygen atoms in 
a molecule of protein. In a hydrogen-bonding environment, 
such as the water in which all biochemistry occurs, 
it appears that these oxygens place their two lone pairs in 
two sp² orbitals. This follows from the fact that, in 
crystallographic molecular models of small molecules in 
which an N-H forms a hydrogen bond with such a carbonyl 
or acyl oxygen, the nitrogen-hydrogen σ bond of 
the N-H usually points to the location where an sp² lone 
pair of electrons would be located. On the basis of this 
observation, it will be assumed that acyl oxygens are 
hybridized [p, sp², sp², sp²], and their two lone pairs will 
both be designated as sp² by enclosing them in circles at 
120° angles to the carbon-oxygen bond (Figure 2-1B). 
These are σ lone pairs of electrons; they lie within a plane 
shared with the carbon-oxygen σ bond and normal to 
the plane of the carbon-oxygen π bond (Figure 2-4B). 

The σ structure of a molecule is the basic skeleton
producing the σ bonds, the bond angles, and the fixed
positions of the localized σ lone pairs. The π electrons
are spread over this skeleton above and below the atoms
contributing the p orbitals. Therefore, neither the bond
angles of the molecule, which are defined by
hybridization, nor the positions of σ lone pairs, which are
localized, can be affected by resonance. It necessarily follows
that when one draws two or more resonance structures,
one must make certain that the same σ structure is
present in each resonance structure and that only the
disposition of π electrons differs among them. The best way to
ensure this is to draw a σ structure for the second
resonance structure identical to the σ structure of the first
resonance structure before putting in the π electrons,
and to draw an identical σ structure for each of the
successive resonance structures before putting in the
π electrons. Always include all σ lone pairs oriented as the
hybridization of each atom requires. After the set of valid
resonance structures has been exhausted, look closely at
any lone pair that did not participate and decide if it
might not be a σ lone pair. If it is not completing an
aromatic complement or being withdrawn by an adjacent
π bond, it is probably a σ lone pair in a σ orbital confined
to only the one atom.

Figure 2-4: Two alternative hybridizations for an oxygen in a
carbonyl. (A) One lone pair is in an sp orbital collinear with the
carbon-oxygen bond, and the other is in the p orbital orthogonal to
the double bond. (B) Both lone pairs are in sp² orbitals in the
σ plane.

When the atoms contributing the p orbitals to a
system of π molecular orbitals form an unbroken ring
rather than being branched or linearly arrayed, the
possibility of aromaticity arises. In a continuous ring of
p orbitals of any size, the energy levels of the individual
π molecular orbitals are arrayed in a peculiar pattern.
The π molecular orbital with the lowest energy is always
the completely overlapping ring of p atomic orbitals in
phase with no nodes other than the one at the nuclear
plane. This π molecular orbital is occupied by two
electrons. If coulomb effects were disregarded, the other
bonding π molecular orbitals in the ring would always
come in pairs that have identical energies. Because of
Hund’s rule, no such pair of orbitals can be filled with
electrons to form a stable closed shell until four electrons
have been provided simultaneously. These two features,
the one continuous ring occupied by a pair of π electrons
and the pairs of orbitals of higher energy occupied by
quartets of π electrons, define an aromatic system of
π molecular orbitals. An aromatic π molecular orbital
system is an unbroken ring of parallel p orbitals
occupied by 2, 6, 10, 14, or 18 π electrons.
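The pattern of energy levels described above can be illustrated numerically. The following is a minimal sketch that assumes the simple Hückel model, in which coulomb effects are disregarded and every atom in the ring is treated alike; the model, the function names, and the use of units of beta are assumptions introduced here for illustration and are not taken from the text.

```python
import math


def ring_pi_levels(n_atoms):
    """Simple Hückel levels for a ring of n_atoms parallel p orbitals.

    Each level is returned as x in E = alpha + x * beta. Because beta is
    negative, the largest x corresponds to the lowest energy.
    """
    if n_atoms < 3:
        raise ValueError("a ring needs at least three atoms")
    levels = [2.0 * math.cos(2.0 * math.pi * k / n_atoms) for k in range(n_atoms)]
    # Round away floating-point noise so degenerate pairs compare equal.
    return sorted((round(x, 6) for x in levels), reverse=True)


def fills_closed_shell(n_pi_electrons):
    """True for one pair plus whole quartets: 2, 6, 10, 14, 18, ..."""
    return n_pi_electrons >= 2 and (n_pi_electrons - 2) % 4 == 0


if __name__ == "__main__":
    # Six-membered ring: one lowest level, then degenerate pairs.
    print(ring_pi_levels(6))
    print([n for n in range(1, 19) if fills_closed_shell(n)])
```

For a six-membered ring the sketch gives a single lowest level followed by pairs of equal energy, which is the arrangement that makes one pair plus successive quartets of π electrons the closed-shell counts.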

From these rules it is clear that a phenyl ring is
aromatic, but it is the aromatic nitrogen heterocycles such
as pyridine and pyrrole

that are more interesting examples. Pyridine is a neutral,
six-membered ring with one nitrogen. Each carbon
contributes one valence electron to the π system, so nitrogen
can contribute only one to complete the sextet of the
aromatic system. This leaves a neutral nitrogen with two
remaining valence electrons that end up as a lone pair in
the σ structure confined to an sp² orbital. Pyrrole,
however, is a five-membered ring. Each carbon again
contributes one valence electron to the system of π
molecular orbitals, and nitrogen provides the two required to
complete the sextet required for the aromatic system.
Nitrogen is left with one valence electron and forms a
covalent N-H bond to finish the neutral molecule.
Pyridinyl and pyrrolyl nitrogens appear throughout
aromatic heterocycles. A nitrogen can be identified as
one or the other by whether one or two of its valence
electrons are used to complete the aromatic system of 6,
10, 14, or 18 π electrons.

An interesting heterocycle that serves as an exam- 
ple of the application of these considerations is 
porphine: 


This is the simplest porphyrin; more elaborate
porphyrins are components of the heme-containing
coenzymes. Hidden within this molecule is an unbroken ring
of 16 carbon atoms and two nitrogen atoms, each
contributing a p orbital and creating a system of π molecular
orbitals containing 18 π electrons. Therefore, porphine is
aromatic. For this to be possible, two of the nitrogens
must be pyridine nitrogens, and each of them contributes
a p orbital and one π electron. The other two nitrogens
do not participate in the aromatic ring but nevertheless
reside immediately adjacent to it and inside of it. They
each retain a lone pair of electrons. Each of those lone
pairs resembles the π lone pair on the nitrogen of aniline,
but each is located endocyclically rather than
exocyclically. This requires that each of them, as with the
nitrogen in aniline, have three covalent σ bonds, hence the
two central hydrogens.
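The electron counting used for pyridine, pyrrole, and porphine can be written out as a small tally. This is only a sketch of the bookkeeping described in the text: the labels `N_pyridine` and `N_pyrrole` and the function names are hypothetical, and the ring compositions are the ones stated above (five carbons and one nitrogen, four carbons and one nitrogen, and 16 carbons with two pyridine nitrogens).

```python
# pi electrons contributed by each kind of ring atom, as described in the text
PI_CONTRIBUTION = {
    "C": 1,           # each ring carbon contributes one valence electron
    "N_pyridine": 1,  # nitrogen keeps a sigma lone pair in an sp2 orbital
    "N_pyrrole": 2,   # nitrogen contributes its lone pair to the pi system
}


def count_pi_electrons(ring_atoms):
    """Sum the pi electrons for a ring given as {atom_kind: number_of_atoms}."""
    return sum(PI_CONTRIBUTION[kind] * n for kind, n in ring_atoms.items())


def is_aromatic_count(n_pi_electrons):
    """True for 2, 6, 10, 14, 18, ... pi electrons."""
    return n_pi_electrons >= 2 and (n_pi_electrons - 2) % 4 == 0


RINGS = {
    "pyridine": {"C": 5, "N_pyridine": 1},
    "pyrrole": {"C": 4, "N_pyrrole": 1},
    "porphine (aromatic ring only)": {"C": 16, "N_pyridine": 2},
}

if __name__ == "__main__":
    for name, atoms in RINGS.items():
        n = count_pi_electrons(atoms)
        print(name, n, is_aromatic_count(n))
```

The tally reproduces the sextets of pyridine and pyrrole and the 18 π electrons of the unbroken ring in porphine.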

Behind the distinctions among systems of π
molecular orbitals, σ bonds, and σ lone pairs is the concept of
orthogonality. Each system of π molecular orbitals, each
σ bond, and each σ lone pair of electrons is orthogonal to
every other system of π molecular orbitals, σ bond, and
σ lone pair in the molecule. As such, to a first
approximation, each is an independent moiety that does not
share electrons with the others. It is this compartmental
quality of bonding that permits each of these positions to
be chemically distinct and have its own properties. It is
this fact, rather than a desire to categorize, that renders
these distinctions important. They must be clearly made
in any drawing of the molecule.

The chemical properties of σ and π lone pairs of
electrons are remarkably different. This difference is most
clearly expressed in their behavior as bases, and it is the
basicity of a lone pair of electrons that, in questionable
cases, indicates whether it is a σ or π lone pair of
electrons. When the basicity of the lone pair is relied upon as
a criterion, a proton is being used to probe its
availability. Lone pairs of electrons in π systems are far less basic
than those in σ orbitals because σ lone pairs of electrons
are localized and directionally oriented by the atomic
orbital in which they are confined, whereas π lone pairs
of electrons are delocalized and immersed within the
system of π molecular orbitals.


Problem 2-1: Draw σ-π stereochemical structures as in
Figure 2-1B for the N-acetyl α-amides of aspartate,
asparagine, glutamine, proline, methionine, tyrosine,
tryptophan, phenylalanine, histidine, and arginine.


Problem 2-2: The following skeleton structures are var- 
ious heterocycles. None is intended to be a radical, all 
have pairwise filled molecular and atomic orbitals, and 
none has a total of more than one positive or negative 
elementary charge. No π electrons or lone pairs of
electrons are shown, and all atoms are shown. All of the
compounds are aromatic.

(A) Decide how many π electrons each heterocycle
contains to make it aromatic.

(B) Draw resonance forms that indicate the
distribution of these π electrons. There may be only
one.

(C) Complete the octet for every nitrogen and oxygen
by adding lone pairs.

(D) Assign formal charges in each resonance form.
(E) Draw the σ structure of each heterocycle, including
all lone pairs, and assign hybridization to each
atom.


Acids and Bases

The quantitative measure of the basicity of a lone pair of
electrons is the microscopic acid dissociation constant of
its conjugate acid. In this way all lone pairs of electrons
are related to the lone pair on a molecule of water. The
reaction that defines a microscopic acid dissociation for
a particular proton in a molecule is

\[ \mathrm{X{-}H} + {:}\mathrm{OH_2} \;\rightleftharpoons\; \mathrm{X}{:} + \mathrm{H_3O^{+}} \qquad (2\text{-}1) \]


The central atom in a microscopic acid dissociation is 
the atom directly bonded to the proton that dissociates.* 
The lone pair on the resulting conjugate base is usually 
localized on the central atom (as represented in 
Reaction 2-1) when it is oxygen, nitrogen, or sulfur but 
usually delocalized when it is carbon. In a microscopic 
acid dissociation, the acid is a position within the mole- 
cule from which a proton can dissociate to produce a 
lone pair of electrons, and the base is a lone pair of elec- 
trons with which a proton can associate. Because the 
reaction occurs in aqueous solution, a bare proton is 
transferred between the lone pair of the base and a lone 
pair on a molecule of water and back again. Every acid is 
always present in solution with a finite concentration of 
its conjugate base, and every base is always present in 
solution with a finite concentration of its conjugate acid. 
The equilibrium constant for Reaction 2-1 is 


\[ K_{\mathrm{eq}} = \frac{[\mathrm{H_3O^{+}}]\,[{:}\mathrm{X}]}{[\mathrm{H_2O}]\,[\mathrm{HX}]} \qquad (2\text{-}2) \]


Because [H₂O] = 55 M at all times, this term is passed to
the left, and for convenience [H₃O⁺] is written as [H⁺].**
These substitutions produce the definition of the
microscopic acid dissociation constant:


\[ K_a = \frac{[\mathrm{H^{+}}]\,[{:}\mathrm{X}]}{[\mathrm{HX}]} \qquad (2\text{-}3) \]


* The central atom is not in the center of the molecule; it is just the
atom that is central to the acid dissociation.

** In fact, the designation H₃O⁺ is as misleading as H⁺ because the
dissociated proton in water is shared by from four molecules of
water, as the cation H₉O₄⁺, to 21 molecules of water, as the cation
H₄₃O₂₁⁺.


A microscopic acid dissociation constant is the acid
dissociation constant of a particular proton in a polyprotic
acid. An acid dissociation constant is usually presented
as a pKa, where pKa = -log Ka, solely for convenience. A
theoretical justification of this practice is that the pKa is
directly proportional to the change in standard free
energy for Reaction 2-1. The larger the pKa, the less likely is
Reaction 2-1 in the direction written and the more likely
it is in the opposite direction. Because water is the same
in all acid dissociations, the difference in pKa between
two acids is proportional to the free energy for
transferring a proton from the one acid to the conjugate base of
the other. The smaller the pKa, the more acidic is the acid
and the less basic, or less available, is the lone pair of
electrons on its conjugate base, and vice versa. There are
several properties of the position from which the proton
dissociates that affect the value of its microscopic pKa.
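The proportionality between pKa and free energy can be made explicit. The following is a sketch that assumes ideal dilute solutions and a standard state of 1 M for the solutes; the symbols R (gas constant), T (absolute temperature), and the subscripts 1 and 2 for the two acids are introduced here and do not appear in the text.

```latex
% Standard free energy change for Reaction 2-1 (with [H2O] passed to the left)
\Delta G^{\circ} = -RT \ln K_a = 2.303\,RT\,\mathrm{p}K_a

% Transfer of a proton from acid 1 to the conjugate base of acid 2:
%   HX_1 + :X_2  <=>  :X_1 + HX_2 ,  with  K = K_{a,1} / K_{a,2}
\Delta G^{\circ}_{\mathrm{transfer}}
  = -RT \ln \frac{K_{a,1}}{K_{a,2}}
  = 2.303\,RT\,\bigl(\mathrm{p}K_{a,1} - \mathrm{p}K_{a,2}\bigr)
```

At 25 °C, 2.303 RT is about 5.7 kJ/mol, so under these assumptions each unit of difference in pKa corresponds to roughly 5.7 kJ/mol in the free energy of proton transfer.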

The atomic number of the central atom from which
the proton dissociates and on which the lone pair
remains has a profound effect (Table 2-1). Within the
same period of the periodic table, as electronegativity
increases to the right (for example, carbon, nitrogen,
oxygen), the atom is more capable of supporting the lone
pair, and the acidity increases. Atoms in lower periods
hold a lone pair of electrons in a larger atomic orbital,
making it easier to support. For example, a proton on
sulfur is more acidic than one on oxygen. Because a
localized σ lone pair of electrons on carbon is such a
strong base, the only time that there is a lone pair
associated with carbon in biochemical situations is when it is a
delocalized π lone pair of electrons. Because nitrogen
and oxygen are more electronegative elements than
carbon, delocalized π lone pairs of electrons associated
with these elements are rarely bases in biochemical
situations, and bases on these atoms are almost always
localized σ lone pairs of electrons.

The successive creation of negative elementary
charge on the same polyprotic acid causes each
dissociation of a proton to be more difficult than the previous
one, and the successive creation of positive charge on the
same polybasic molecule causes each association of a
proton to be more difficult than the previous one (Table
2-1). The farther apart the charges that are created,
however, the narrower are the increments in pKa. For
example, the difference in the two values of pKa for
1,3-diaminopropane (1.98) is smaller than the difference
in the two values of pKa for 1,2-diaminopropane (2.87).


Table 2-1: Electronic Properties Affecting Values of the
Acid Dissociation Constant

effect of identity of central atom on acidity
  CH₄         <  NH₃         <  OH₂         <  SH₂
  pKa = 48       pKa = 38       pKa = 15.7     pKa = 7.0

effect of creation of charge on acidity
  PO₄H₃       >  PO₄H₂⁻      >  PO₄H²⁻
  pKa = 2.1      pKa = 7.2      pKa = 12.7

  ⁺H₃NCH₂CH₂NH₂  <  ⁺H₃NCH₂CH₂NH₃⁺
  pKa = 9.98        pKa = 7.52

effect of hybridization of the central atom on acidity
  HC≡CH       >  H₂C=CH₂     >  H₃CCH₃
  pKa = 25       pKa = 44       pKa = 50

  HC≡NH⁺      >  pyridinium  >  H₃CNH₃⁺
  pKa = -10      pKa = 5.2      pKa = 10.6

  CH₃HC=OH⁺   >  CH₃CH₂OH₂⁺
  pKa = -6       pKa = -2

effect of induction on acidity
  CF₃CH₂OH    >  CHF₂CH₂OH   >  CH₂FCH₂OH   >  CH₃CH₂OH
  pKa = 12.4     pKa = 13.1     pKa = 14.2     pKa = 16.0

  H₅C₂OOCCH₂NH₃⁺  >  H₅C₂OOCC₂H₄NH₃⁺  >  H₅C₂OOCC₃H₆NH₃⁺
  pKa = 7.7          pKa = 9.1           pKa = 9.7

The hybridization of the central atom affects its pKa
considerably (Table 2-1). All localized σ lone pairs of
electrons are in hybrid orbitals formed by mixing one
s orbital with one, two, or three p orbitals, as indicated by
the designations sp, sp², and sp³, respectively. The fewer
the number of p orbitals in the mixture, the greater the
fraction of the s orbital distributed into each hybrid and
the more s character the hybrid will have. The more
s character there is to the orbital, the closer the lone pair
of electrons is held next to the nucleus, the less extension
of electron density along any particular axis will be
displayed, and the less basic will be the orbital.
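The relation between the number of p orbitals mixed in and the s character of the hybrid can be written as a one-line calculation. The sketch below assumes equivalent hybrids, so that the single s orbital is shared equally among them; the function name is hypothetical, and the pKa values are the nitrogen entries from Table 2-1, included only to show the trend.

```python
from fractions import Fraction


def s_character(n_p_orbitals):
    """Fraction of s character in each hybrid formed from one s orbital
    and n_p_orbitals p orbitals (1 -> sp, 2 -> sp2, 3 -> sp3)."""
    if n_p_orbitals not in (1, 2, 3):
        raise ValueError("expected 1, 2, or 3 p orbitals")
    return Fraction(1, 1 + n_p_orbitals)


# (hybridization, p orbitals mixed, conjugate acid, pKa from Table 2-1)
NITROGEN_ACIDS = [
    ("sp", 1, "HC≡NH+", -10.0),
    ("sp2", 2, "pyridinium", 5.2),
    ("sp3", 3, "H3CNH3+", 10.6),
]

if __name__ == "__main__":
    for label, n_p, acid, pka in NITROGEN_ACIDS:
        print(f"{label:4s} s character = {float(s_character(n_p)):.2f}  "
              f"{acid:12s} pKa = {pka}")
```

As the s character falls from one-half to one-quarter, the pKa of the conjugate acid rises, which is the trend stated in the paragraph above.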

Electronegative or electropositive atoms adjacent
to the central atom also have a significant but less
remarkable effect on acidity (Table 2-1). These withdraw
or donate electrons by induction through σ bonds and
decrease or increase the basicity of the lone pair of
electrons accordingly.

Because the conjugate acid has a single, σ covalent
bond between the central atom and the hydrogen, the
pair of electrons that has been protonated in its creation
must already be or must become σ electrons. If they were
a σ lone pair of electrons before the proton was added,
rehybridization of the central atom is not usually
involved in the protonation. If, however, they were a
π lone pair of electrons prior to the protonation, the
favorable delocalization energy that they gained within
the π system is eliminated as the σ bond is formed. The
greater this delocalization energy, the more free energy
will be required to protonate the lone pair of electrons
and, because this free energy is supplied by the
concentration of protons, the smaller will be the pKa of
the conjugate acid. It is this fact that causes the pKa of the
conjugate acid to register the degree of delocalization of
a pair of π electrons or, in other words, the strength of
their π character. This ability is most clearly exemplified in
the protonation of aromatic compounds where the lone pair
of electrons is delocalized over the aromatic ring. For
example, cyclopentadiene is a strong carbon acid

\[ \mathrm{C_5H_6} + \mathrm{H_2O} \;\rightleftharpoons\; \mathrm{C_5H_5^{-}} + \mathrm{H_3O^{+}} \qquad \mathrm{p}K_a = 16 \qquad (2\text{-}4) \]

compared to propane (pKa = 50) because protonation of
the lone pair of electrons on the conjugate base of the
former destroys the aromaticity of the anion. Likewise,
N-methylpyrrole is a weak base

\[ \mathrm{H_2O} + (\text{N-methylpyrrole})\mathrm{H^{+}} \;\rightleftharpoons\; \text{N-methylpyrrole} + \mathrm{H_3O^{+}} \qquad \mathrm{p}K_a = -2.9 \qquad (2\text{-}5) \]

compared to N-methylazacyclopentane

\[ (\text{N-methylazacyclopentane})\mathrm{H^{+}} + \mathrm{H_2O} \;\rightleftharpoons\; \text{N-methylazacyclopentane} + \mathrm{H_3O^{+}} \qquad \mathrm{p}K_a = 10.5 \qquad (2\text{-}6) \]

because protonation of the former destroys its
aromaticity. In this case, the comparison is a minimum estimate
of the difference in pKa resulting from aromatic
delocalization because pyrrole protonates on carbon rather
than on nitrogen. The pKa for the conjugate acid
protonated on nitrogen must be much lower than -2.9.

There are lone pairs of electrons in nonaromatic
configurations, the acid-base behavior of which reveals
the degree of their π character. The pKa associated with
the lone pair of electrons on aniline (pKa = 4.6) can be
compared with the pKa associated with the lone pair of
electrons on cyclohexylamine (pKa = 10.6). The
significant difference in basicity demonstrates that the lone
pair in aniline is a π lone pair of electrons conjugated to
the neighboring π system of the phenyl ring. The pKa
(pKa = -6) associated with the lone pair of electrons on
the nitrogen in an amide (Figure 2-3) is even lower
than that of the lone pair on aniline and indicates that
it is even more delocalized. This is not surprising
since the phenyl ring of aniline is otherwise involved in
its own aromaticity and the oxygen of the amide has a
strong coulomb effect in the lowest occupied molecular
orbital.

Other examples of the use of a proton to evaluate
the π character of a lone pair of electrons occur in carbon
acids. The pKa of a methyl group in propene (pKa = 43)
is much lower than that in propane (pKa = 50) because
the lone pair produced upon the dissociation of a proton
from propene conjugates with the neighboring π system
of the alkene. The analogous lone pair on carbon in the
conjugate base of acetaldehyde (pKa = 17.6) is even less
basic because, as occurs with an amide, an oxygen is
located two atoms away and exerts a strong coulomb
effect. When the π system of the conjugate base is
extended from three to five atoms in length, as in the
conjugate base of 2,4-dioxopentane (pKa = 9), the lone
pair of electrons in the conjugate base becomes even
more delocalized and less accessible to protonation.
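The comparisons in the last two paragraphs can be tabulated as differences in pKa and converted into free energies. This is a rough sketch only: the temperature of 25 °C is an assumption (the text gives none), the function names are hypothetical, and the whole of each difference is not necessarily due to delocalization, because hybridization and induction can also contribute.

```python
import math

R = 8.314    # J/(mol K), gas constant
T = 298.15   # K; 25 °C is assumed here, the text states no temperature

# (localized reference, its pKa, delocalized comparison, its pKa); values from the text
PAIRS = [
    ("cyclohexylamine", 10.6, "aniline", 4.6),
    ("propane", 50.0, "propene", 43.0),
    ("propane", 50.0, "acetaldehyde", 17.6),
    ("propane", 50.0, "2,4-dioxopentane", 9.0),
]


def delta_g_kj(delta_pka, temperature=T):
    """Free energy (kJ/mol) corresponding to a difference in pKa."""
    return math.log(10.0) * R * temperature * delta_pka / 1000.0


def compare(pairs=PAIRS):
    """Return (compound, reference, delta pKa, delta G in kJ/mol) for each pair."""
    rows = []
    for reference, pka_ref, compound, pka in pairs:
        delta = pka_ref - pka
        rows.append((compound, reference, delta, delta_g_kj(delta)))
    return rows


if __name__ == "__main__":
    for compound, reference, delta, dg in compare():
        print(f"{compound:18s} vs {reference:16s} "
              f"delta pKa = {delta:5.1f}  ~{dg:6.1f} kJ/mol")
```

Read this way, the six units of pKa between cyclohexylamine and aniline correspond to about 34 kJ/mol at the assumed temperature.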

In making such comparisons, care must be taken to
avoid confounding the reasons for the changes in pKa. A
common confusion is that between the effects of
hybridization and conjugation. One reason that the
acetate anion, the conjugate base of acetic acid
(pKa = 4.75), is a weak base compared to the ethoxide
anion, the conjugate base of ethanol (pKa = 16), is that
the basic lone pairs of electrons in the acetate anion are
hybridized sp² (Figure 2-5) rather than sp³. The system of
π molecular orbitals of the acetate anion, composed of
four π electrons in a three-atom system (Figure 2-3),
does not provide a pair of electrons to be protonated,
notwithstanding any drawing suggesting this to be the
case. It is a σ lone pair of electrons orthogonal to the
system of π molecular orbitals that is protonated, and
acetate anion cannot be used as an example of the
decrease in basicity that results when the lone pair of
electrons created upon the departure of the proton is
conjugated to a π system.

There is an indirect effect of conjugation on the
acidity of a carboxylic acid such as acetic acid. When one
of the σ lone pairs on the acetate anion is protonated or
alkylated, the functional group is no longer symmetric,
and the oxygen that has been so modified becomes more
electronegative. This change withdraws more electron
density onto the protonated or alkylated oxygen, as
indicated by the resonance structures:

Figure 2-5: Resonance structures (top) and atomic orbital overlap
(bottom) in a carboxylate anion. The resonance structures indicate
that one pair of electrons from each oxygen is delocalized and that
the π molecular orbital system results from the overlap of three
p atomic orbitals, one from the central carbon and one from each
of the two oxygens. Each of these three atoms is hybridized
[p, sp², sp², sp²]. As indicated schematically, the two σ lone pairs
of electrons in the molecular plane are the bases that can associate
with a proton, not the delocalized lone pairs of the π molecular
orbital system. The syn and anti lone pairs are labeled.


The separation of charge in the structure on the right is
the reason that the lone pair of electrons on the alkylated
oxygen or protonated oxygen is less delocalized than the
π lone pair of electrons on an unalkylated or
unprotonated oxygen in the carboxylate anion. Nevertheless, the
bond between the protonated or alkylated oxygen and
the acyl carbon retains some of the double-bond
character indicated by the less advantageous form on the right.
This is manifested in the almost 120° angle (116.5°)
between an alkyl carbon and an acyl carbon at the
oxygen of an ester and a shortening of the bond between
the oxygen of an ester and the acyl carbon by 0.09 nm
relative to carbon-oxygen bonds between sp² carbons and
oxygens in aryl and vinyl compounds. Therefore, an
ester or the conjugate acid of a carboxylic acid retains the
overlap of the system of π molecular orbitals, but the
overlap is considerably weakened relative to the
unalkylated or unprotonated anion. During protonation of a
carboxylate anion, the delocalization in the orthogonal
π system is considerably diminished, and this effect
destabilizes the conjugate acid and lowers the pKa. A
similar but less pronounced effect of a decrease in
delocalization upon protonation occurs with phenol. In the case
of phenol, the conjugation in the anionic conjugate base
is weaker than that in the anionic conjugate base of a
carboxylic acid because the elementary negative charge is
distributed over the oxygen and three carbons.
Consequently, the effect of diminishing this conjugation
upon protonation is less, and phenol is a weaker acid
than acetic acid.

The acetate anion illustrates another property of a
system of π molecular orbitals: its ability to redistribute
charge. The elementary negative charge in the acetate
anion is shared between the two oxygens because the
system of π molecular orbitals is spread over all three
atoms. The two electrons in the highest occupied
molecular orbital, which account for the negative charge of the
functional group, can reside only over the two oxygens,
as there is a node over carbon (Figure 2-3). When one of
the oxygens becomes protonated, the π system
redistributes and more π electron density is shifted over the
oxygen that has become protonated because its coulomb
effect has increased. This shift in distribution of charge is
reflected in the resonance structure chosen for
portraying the conjugate acid:


A similar ability of a π system to redistribute charge
is reflected in the lower acidity of p-methoxypyridinium
(pKa = 6.67) compared to pyridinium (pKa = 5.17). This
difference results from the ability of the electron density
of the methoxy substituent to push into the π system
through the conjugation represented by the resonance
structure

so that the elementary positive charge on the nitrogen is
delocalized. In the opposite sense, an example of a shift
of electron density away from the central atom occurs in
the p-nitrophenolate anion, whose associated pKa is 7.2,
compared to the phenolate anion, whose associated pKa
is 10.0. This can be explained by the resonance structure

In each of these examples, the redistribution of charge
among electronegative atoms is accomplished by the
highest occupied molecular orbital of the π system,
which is spread over the whole molecule.

The microscopic pKa of an acid-base is determined
by a combination of all of these properties: the
electronegativity and hybridization of the central atom, any
creation of charge, the inductive effect, any
delocalization of the lone pair of electrons, and any redistribution
of charge.

The biological molecules that illustrate most exten- 
sively the various aspects of bonding and acid-bases dis- 
cussed so far are the bases of the nucleosides. 


[Structures 2-9 to 2-12. Pyrimidines: uridine (U, 2-9) and
cytidine (C, 2-10). Purines: guanosine (G, 2-11) and
adenosine (A, 2-12).]

Each of these nucleosides, uridine, cytidine, guanosine, 
and adenosine, is composed of the base itself, uracil, 
cytosine, guanine, and adenine, respectively, and a ribo- 
syl group attached to N1 or N9 of that base. The nucleo- 
side bases are hybrid structures of aromatic heterocycles 
and amides. The most aromatic base is adenine. It is sus- 
ceptible to electrophilic aromatic substitution at carbons 
2 and 8 but is also susceptible to nucleophilic substitu- 
tion at carbon 6 in reactions that resemble acyl exchange. 
The most amidic base is uracil. It is unambiguously an 
N-acyl-N’-alkenyl-N’-ribosylurea. The carbon-carbon 
double bond in uracil has almost olefinic character. It is
susceptible to addition reactions, unlike the system of
π molecular orbitals in an aromatic compound, which
would be susceptible only to substitution.

The nucleoside bases in adenosine, guanosine, and
cytidine have exocyclic nitrogens resembling the
nitrogen in aniline. The lone pairs of electrons on these
nitrogens are even more delocalized than the one on the
nitrogen in aniline (pKa = 4.6) because the pKa for the
conjugate acid of each of these nitrogens in the
nucleoside bases is less than or equal to -2, similar to that for
N-protonated urea (pKa < -4). Therefore, each of these
exocyclic nitrogens is planar and trigonal, as is always
depicted in drawings of the base pairs.

The nucleoside bases in uridine, cytidine, and
guanosine have exocyclic oxygens resembling the oxygen
in an amide. The values of pKa for the conjugate acids of
these exocyclic oxygens are 0.5, <4.2, and <1.6, the upper
limits being the values of pKa for the N-protonated
tautomers. These values can be compared to -0.7, the pKa
for the oxygen of acetamide. The values of pKa for these
oxygens in the iminolic tautomers of these three bases,
estimated from the measured or theoretical values
for the equilibrium constants between the amidic and
iminolic tautomers, are 5, 4, and -3 for uridine, cytidine,
and guanosine, well below the value of 10 for the pKa of
phenol. These values of pKa, as well as the fact that the
iminol tautomers are far less stable than the amidic
tautomers, are the justification for depicting these oxygens
as acyl oxygens.

There are two types of calculations performed with
acid-bases. The pH of a solution to which a weak acid or
weak base has been added can in theory be calculated,
and the ratio of the molar concentrations of conjugate
acid and conjugate base in a solution of a given pH can
in practice be calculated.

The calculation of the pH of a solution upon the 
addition of an acid-base is an exercise in simultaneous 
equations. The problem takes the form “Calculate the pH 
of a solution to which 0.1 mol of sodium acetate has been 
added for every liter.” The equations always used are the 
conservation of mass 


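\[ [\mathrm{HOAc}] + [\mathrm{OAc^{-}}] = 0.1\ \mathrm{M} \qquad (2\text{-}7) \]

where HOAc is acetic acid and OAc⁻ is acetate anion; the
conservation of charge

\[ [\mathrm{OAc^{-}}] + [\mathrm{OH^{-}}] = [\mathrm{Na^{+}}] + [\mathrm{H^{+}}] \qquad (2\text{-}8) \]

where [Na⁺] = 0.1 M; the acid dissociation constant or
constants

\[ K_a = \frac{[\mathrm{OAc^{-}}]\,[\mathrm{H^{+}}]}{[\mathrm{HOAc}]} \qquad (2\text{-}9) \]

where pKa = 4.75 and Ka = 1.78 × 10⁻⁵ M; and the water
constant

\[ [\mathrm{H^{+}}]\,[\mathrm{OH^{-}}] = 10^{-14}\ \mathrm{M^{2}} \qquad (2\text{-}10) \]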
These comprise four (or, if necessary, more, depending
on the number of dissociation constants the acid has)
independent simultaneous equations with four, or if
necessary more, unknowns. In the case of acetate, the four
unknowns are [H⁺], [OH⁻], [OAc⁻], and [HOAc]. These
four equations with four unknowns can be readily solved
for [H⁺] (1.33 × 10⁻⁹ M) if the assumption is made that [H⁺]
in Equation 2-8 is negligible relative to the other terms.
The value of this exercise is that the creation of the
simultaneous equations and the cancellation of certain
terms to avoid a cubic or quadratic equation require an
understanding of the acid-base chemistry that is
occurring in the solution. For example, one is required to know
that the only ions that can be present are H⁺, Na⁺, OAc⁻,
and OH⁻ and that sodium acetate is a base so the
concentration of protons in the final solution will be small.

The calculation of the concentrations of a conjugate
acid and its conjugate base at a given pH fulfills one of
two purposes. First, if the solution contains an acid-base
of experimental interest, such as an acid-base in a
molecule of protein, this calculation will provide the molar
concentrations of the conjugate acid and the conjugate base
of that acid-base. Second, if a particular buffer is used to
stabilize the pH at a particular value, this calculation can
be used to determine the concentrations of conjugate acid
and conjugate base required for the buffer. A buffer is a
solution of an acid and its conjugate base, both present
at high enough concentrations so that the acid can
neutralize bases added to the solution and the base
can neutralize acids added to the solution, and between
them they can keep the pH of the solution constant.
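The sodium acetate problem stated earlier (0.1 mol of sodium acetate per liter, pKa = 4.75) can also be solved numerically without cancelling any terms, which gives a check on the approximate answer. The sketch below combines the conservation of mass, the conservation of charge, the acid dissociation constant, and the water constant into one function of [H⁺] and finds its root by bisection; the function names and the choice of bisection are assumptions made for illustration, and activity coefficients are ignored.

```python
import math

KW = 1.0e-14  # M^2, the water constant used in the text


def solve_h(total, pka, kw=KW):
    """[H+] (M) for a solution of the sodium salt of a weak monoprotic acid.

    total is the analytical concentration of the salt, so [Na+] = total.
    """
    ka = 10.0 ** (-pka)

    def charge_imbalance(h):
        # [Na+] + [H+] - [A-] - [OH-], using [Na+] - [A-] = [HA] = total*h/(ka + h)
        return total * h / (ka + h) + h - kw / h

    lo, hi = 1.0e-14, 1.0  # the imbalance is negative at lo and positive at hi
    for _ in range(200):
        mid = math.sqrt(lo * hi)  # bisect on a logarithmic scale
        if charge_imbalance(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return math.sqrt(lo * hi)


def approximate_h(total, pka, kw=KW):
    """[H+] when [H+] is neglected in the charge balance and [A-] ~ total."""
    return math.sqrt(kw * 10.0 ** (-pka) / total)


if __name__ == "__main__":
    h = solve_h(0.1, 4.75)
    print(f"numerical   [H+] = {h:.3e} M, pH = {-math.log10(h):.3f}")
    print(f"approximate [H+] = {approximate_h(0.1, 4.75):.3e} M")
```

Under these assumptions the numerical root and the approximate expression agree closely with each other and with the value of 1.33 × 10⁻⁹ M quoted above, which shows that the neglected terms are indeed small for this solution.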

A problem concerning a buffer can be stated
“Calculate the number of moles of acetic acid and
sodium acetate that are present in 2.00 L of a 0.1 M
solution of acetate plus acetic acid at pH 5.5.” This
problem requires only two simultaneous equations,
Equations 2-7 and 2-9. Because [H⁺] is given as 3.16 × 10⁻⁶ M,
there are only two unknowns, [HOAc] and [OAc⁻], which
are 0.015 and 0.085 M, respectively. The answer is
0.03 mol of acetic acid and 0.17 mol of sodium acetate.
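The buffer problem reduces to two equations in two unknowns and can be checked in a few lines. This is a minimal sketch: the function name and argument names are hypothetical, and the only relations used are the conservation of mass and the acid dissociation constant from the text.

```python
def buffer_composition(ph, pka, total_molar, volume_l):
    """Moles of conjugate acid and conjugate base in a buffer.

    Uses Ka = [base][H+]/[acid], so [base]/[acid] = 10**(pH - pKa),
    together with [acid] + [base] = total_molar.
    """
    ratio = 10.0 ** (ph - pka)           # [base] / [acid]
    acid = total_molar / (1.0 + ratio)   # M
    base = total_molar - acid            # M
    return acid * volume_l, base * volume_l


if __name__ == "__main__":
    acid_mol, base_mol = buffer_composition(ph=5.5, pka=4.75,
                                            total_molar=0.1, volume_l=2.00)
    print(f"acetic acid:    {acid_mol:.3f} mol")
    print(f"sodium acetate: {base_mol:.3f} mol")
```

Rounded to two decimal places, the result reproduces the 0.03 mol of acetic acid and 0.17 mol of sodium acetate given in the text.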

The quantitative behavior of the concentrations of 
the conjugate acid and conjugate base of each acid-base 
is described by a titration curve (Figure 2-6) that relates 
the fraction of the acid-base in the form of the conjugate 
acid or in the form of the conjugate base to the pH of the 
solution. This can be presented as the fraction itself 
(Figure 2-6A), as is usually done, but this presentation 
leaves the erroneous impression that the fraction of acid
goes to zero about 2 pH units above the pKa and the fraction
of base goes to zero about 2 pH units below the pKa.
This misimpression is corrected by examining the logarithms
of the fractions as a function of pH (Figure 2-6B).
It can be seen that finite fractions of both acid and base
are still present at high and low pH, respectively. At a
distance of 2 pH units above the pKa, 1% of the acid-base is
in the form of the acid, and this percentage drops by
a factor of 10 for every rise of 1 unit of pH but never reaches
zero. The importance of this point is that often only one
species of the acid-base participates in a chemical reac- 
tion, yet the reaction will occur quite well at a pH where 
the reactive species is present at only 1% or 0.1% or 0.01% 
or less of the total acid-base. Protonation and deproto- 
nation are extremely rapid, and as the minor but reactive 
species is consumed in the reaction, it is continuously 
replaced. 
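
As a rough numerical illustration of this point, the fraction of conjugate acid can be tabulated at several values of pH above the pKa. The sketch below assumes a simple monoprotic acid-base and uses the pKa of acetic acid from Figure 2-6; the function name is an illustrative choice.

```python
import math


def fractions(pH, pKa):
    """Return (fraction as conjugate acid, fraction as conjugate base)."""
    ratio = 10.0 ** (pH - pKa)          # [base]/[acid]
    return 1.0 / (1.0 + ratio), ratio / (1.0 + ratio)


pKa = 4.75
for pH in (4.75, 6.75, 7.75, 8.75):
    f_acid, f_base = fractions(pH, pKa)
    # The logarithm keeps falling by about 1 per unit of pH; it never reaches -infinity.
    print(f"pH {pH:5.2f}  f_acid = {f_acid:.2e}  log10(f_acid) = {math.log10(f_acid):6.2f}")
```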


Problem 2-3: Complete the following acid-base equilibria.
Draw the structures of the conjugate base and the
acid in σ-π stereochemical representation (Figure 2-1B).

Figure 2-6: Titration curves for acetic acid. From the acid
dissociation constant for acetic acid (pKa = 4.75), the fraction of the
acid-base present in solution as the conjugate acid (f_CH₃COOH) or as
the conjugate base (f_CH₃COO⁻) as a function of pH can be determined.
These values can be plotted directly (A) as a function of pH, or the
logarithms of these values (B) can be plotted as a function of pH.
When the pH is greater than the pKa by 2 units, the concentration
of acetic acid decreases by a factor of 10 for each increase in pH of
1 unit. When the pH is less than the pKa by 2 units, the concentration
of acetate ion decreases by a factor of 10 for each decrease in
pH of 1 unit.


Problem 2-4: Write the acid-base equilibrium to which
each of the following values of pKa refers. Write each as a
chemical reaction, and draw out the structures of the
conjugate acid and the base in σ-π stereochemical
representation. In tables of values of pKa, the name of the
compound and the value of the pKa are all that one is
given, so it will be necessary for you to compare each of
these values of the pKa with those for acids and bases
about which you are certain to judge whether the pKa is
for the molecule as named or for one of its conjugate
bases or one of its conjugate acids.

compound pKa
1-amino-2-bromoethane 8.49 
1-aminohexane 10.56 
2,2,2-trifluoroethanol 12.43 
ethanethiol 10.50 
3-hydroxypropyne 13.55 
diethylamine 10.98 
ethanol -2.16
2-hydroxyethanethiol 9.5 
2-aminoethanethiol 8.6, 10.75 
2-chloroethanol 14.31 
morpholine 8.36 
2,2-dichloroethanol 12.89 
diallylmethylamine 8.79 
diethyl ether -3.5 
1-aminobutane 10.59 
2-hydroxyethanamine 9.50 
allylmethylamine 10.11 
pyrrolidine 11.27 
piperidine 11.22 
piperazine 5.68, 9.82 
pyridine 5.14 
imidazole 7.05, 14.52 
pyrimidine 1.10 
isoquinoline 5.14 
pyrazole 2.48 
aniline 4.62 
o-chloroaniline 2.62 
m-chloroaniline 3.32 
p-chloroaniline 3.81 
p-methylaniline 5.07 
p-methoxyaniline 5.29 
p-nitroaniline 1.02 
phenol 9.95 
p-(trimethylammonio)phenol 8.0 
o-chlorophenol 8.48 
m-chlorophenol 9.02 
p-chlorophenol 9.38 
p-methylphenol 10.19 
p-methoxyphenol 10.20 
p-nitrophenol 7.14 
2-aminobutanoic acid 2.27, 9.68 
N-ethylmorpholine 7.70 
1-aminonaphthalene 3.40 
2-thioethanesulfonate 7.5 

ethyl acetate 25 
1-chloro-2-propanone 16.5 
CH3COCH(C2H5)CO2C2H5 12.7
nicotine 3.13, 8.02 
p-hydroxyaniline 5.50, 10.30 
1-amino-2,2,2-trifluoroethane 5.7 
CH3C(NH)NH2 12.52
trichloroacetic acid 0.65 
fumaric acid 3.03, 4.52 
thiazole 2.44 
methoxyacetic acid 3.53 
thiourea -0.96 


Problem 2-5: The following zwitterionic acid-bases are
used widely for buffering solutions of protein. Write the
full structures of both the acid and the conjugate base for
the acid-base equilibrium to which the pKa refers. In
what range of pH would each of these acid-bases buffer?

buffer                                                                        pKa
N-(2-sulfoethyl)morpholine (MES)                                              6.1
1,4-bis(2-sulfoethyl)piperazine (PIPES)                                       6.8
N,N-bis(2-hydroxyethyl)-2-aminoethanesulfonic acid (BES)                      7.1
N-(3-sulfopropyl)morpholine (MOPS)                                            7.2
N-(2-sulfoethyl)-2-amino-1,3-dihydroxy-2-hydroxymethylpropane (TES)           7.4
1-(2-hydroxyethyl)-4-(2-sulfoethyl)piperazine (HEPES)                         7.5
1-(2-hydroxyethyl)-4-(3-sulfopropyl)piperazine (EPPS)                         8.0
N-[2-hydroxy-1,1-bis(hydroxymethyl)ethyl]glycine (Tricine)                    8.1
N,N-bis(2-hydroxyethyl)glycine (Bicine)                                       8.3
N-(3-sulfopropyl)-2-amino-1,3-dihydroxy-2-hydroxymethylpropane (TAPS)         8.4
N-(2-sulfoethyl)cyclohexylamine (CHES)                                        9.3


Why is HEPES more acidic than EPPS? 


Problem 2-6: From the following list, select the reason
for the difference in pKa between the two molecules in
each pair presented below.

(A) hybridization

(B) electronegativity

(C) π donation

(D) σ donation

(E) π withdrawal

(F) σ withdrawal-induction

(G) aromaticity


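[Pairs of molecules for Problem 2-6; drawn structures are not reproduced, and only the recoverable labels and values of pKa are listed.]

[structure]  pKa = 8.05          [structure]  pKa = 9.19
[structure]  pKa = 11.2          [structure]  pKa = 5.2
(CF3)3COH  pKa = 5.4             (CF3)2HCOH  pKa = 9.3
C2H5OH  pKa = 16.0               (CH3)3N⁺C2H4OH  pKa = 13.9
[structure]  pKa = 9.51          [structure]  pKa = 14.5
m-nitroaniline                   p-nitroaniline
[structure]  pKa = 4.88          [structure]  pKa = 6.16
CH3NO2  pKa = 10.3               CH4  pKa = 40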


Problem 2-7: What two effects in combination cause
the pKa of the methyl ester of 2-methoxypropenoic acid
(-3.37) to be 0.9 unit less than that of dimethyl ether?


Problem 2-8: What are the exact pHs of the following 
solutions? 

10° M acetic acid 

10° M imidazolium acetate 

5 x 10° M sodium dihydrogen phosphate 

5 x 10° M aniline 

10° M pyridinium chloride 

10° M p-nitroanilinium chloride 

10° M morpholine 

5x 10° M sodium 2,2-difluoroethoxide 


Problem 2-9: Calculate the concentration of imidazo- 
late anion in a 0.1 M solution of imidazole at pH 9.52. 


Problem 2-10: Determine the molar concentrations of 
each species of the weak acids and weak bases in the fol- 
lowing solutions. 


solute and concentration pH 
0.4 M 1-aminobutane 6.5 
0.2 M 1-aminobutane 11.0 
0.05 M p-chlorophenol 12.0 
0.01 M p-chlorophenol 7.3 
0.01 M p-methylaniline 5.0 
0.001 M p-methylaniline 2.0 
0.03 M 2-aminoethanethiol 9.2 
0.08 M 2-aminoethanethiol 5.0 
0.05 M morpholine 3.5 
0.002 M piperazine 7.5 
0.03 M ethanol 6.4 
0.03 M diethyl ether 8.0 
0.03 M 3-hydroxypropyne 4.0 


Problem 2-11: From the following information calcu- 
late the pH of the final solutions. 


buffer species   concentration of buffer (M)   initial pH   amount of NaOH added (mol L⁻¹)
imidazole 0.1 6.70 0.02 
imidazole 0.03 6.50 0.02 
phosphate 0.01 6.80 0.005 
phosphate 1.0 6.35 0.1 
borate 0.2 9.50 0.002 
borate 0.15 8.40 0.05 
imidazole 0.1 6.50 0.01 
imidazole 0.05 7.00 0.02 
phosphate 0.2 7.20 0.05 
phosphate 0.3 6.20 0.15 
borate 0.05 9.40 0.001 
borate 0.02 8.60 0.01 


Tautomers 


One isomer is a tautomer of another isomer if the only 
difference between them is the position of a proton. 
There are several tautomers of uridine (2-9): 


[structures of three tautomers of uridine; R = ribosyl]  (2-11)


Each of these three tautomers of uridine is a distinct mol- 
ecule with distinct chemical properties. It can be con- 
verted to another tautomer in the set by the removal of 
its acidic hydrogen by a base in solution, almost always a 
molecule of water (Reaction 2-1), and the readdition of 
another proton in the solution to another lone pair of 
electrons in the conjugate base. None of the interconver- 
sions between two of these tautomers can result from the 
intramolecular transfer of a proton because none of the 
lone pairs of electrons in the molecules is disposed prop- 
erly for such an intramolecular transfer. Because they are 
acid-base reactions and because the conjugate base
common to all of them is a stable, anionic molecule,
these interconversions are rapid. In water, the tautomer
of uridine that is normally written, the one in which the
proton occupies the nitrogen, is the dominant one,
exceeding in concentration the other two combined by a
factor of more than 4000.

By formal definition, two otherwise identical iso- 
mers are tautomers of each other only when the tau- 
tomeric proton sits on two different atoms in the two 
isomers, as in the case of the three tautomers of uridine 
in Equation 2-11. If the proton sits on two different lone 
pairs on the same atom, the two isomers are, by formal 
definition, conformational isomers of each other. An 
example of two such conformational isomers would be
the syn and anti conformations of acetic acid:

[structures of the syn and anti conformations of acetic acid]  (2-12)

Because the barrier to rotation around the
carbon-oxygen bond is large due to the conjugation in
the acid (2-5) and because protons shuttle on and off the 
oxygens rapidly due to the aqueous solution, neither of 
these two conformational isomers probably exists long 
enough to convert to the other by rotation about the 
carbon-oxygen bond. If this is the case, each of these iso- 
mers over its lifetime is a distinct molecule, each has dis- 
tinct chemical properties, each is almost always 
converted to the other isomer only by the removal of its 
acidic hydrogen by a base in solution and the readdition 
of a proton to one of the two lone pairs of electrons that 
have the opposite orientation in the acetate anion 
(Figure 2-5), and the two isomers are in fact, if not by def- 
inition, tautomers of each other. 

In the gas phase, the syn isomer of acetic acid is
about 20 kJ mol⁻¹ more stable than the anti isomer.
This difference in stability results from steric repulsion
between the methyl group and the proton, from the
repulsion between the dipole of the carbon-oxygen
double bond and the dipole of the oxygen-hydrogen
bond in the anti conformation, and from the fact that the
unfavorable electron repulsion between the two syn lone
pairs of electrons (Figure 2-5) is relieved when one of
them is protonated but not when an anti lone pair is
protonated. Although the high relative permittivity of
water should damp both the dipolar repulsion and the
electron repulsion, it has been proposed that the difference
in stability between these two tautomers is the same
in water as in the gas phase. If this were the case, the
microscopic acid dissociation constant for the anti isomer
should be about 3000 times larger than that for the
syn isomer; or in other words, the syn lone pairs of electrons
should be 3000 times more basic than the anti.
There is, however, experimental evidence suggesting
that in water the difference in basicity between the syn
and anti lone pairs is much less significant.
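
The factor of about 3000 follows from the difference in stability if that difference is treated as a standard free energy change at 25 °C, which is an assumption made here for illustration. A minimal sketch of the conversion:

```python
import math

R = 8.314462618  # molar gas constant, J K^-1 mol^-1


def ratio_from_free_energy(delta_g_kj_per_mol, temperature_k=298.15):
    """Equilibrium ratio corresponding to a difference in standard free energy."""
    return math.exp(delta_g_kj_per_mol * 1000.0 / (R * temperature_k))


ratio = ratio_from_free_energy(20.0)   # syn more stable than anti by 20 kJ mol^-1
delta_pKa = math.log10(ratio)          # corresponding difference in microscopic pKa
print(f"ratio of about {ratio:.0f}, difference in pKa of about {delta_pKa:.1f}")
```

A ratio of this size corresponds to a difference of about 3.5 units between the two microscopic values of pKa.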

If the rotation about the carbon-oxygen bond in 
each of the two tautomers of uridine in which the oxy- 
gens are protonated (Equation 2-11) is also sufficiently 
hindered that neither interconverts significantly by rota- 
tion around the carbon-oxygen bond during its lifetime, 
then there would be syn and anti conformations of each 
of them that would be in fact two tautomers of each of 
them. In this case, the five actual tautomers of uridine
would be the five molecules resulting from the protonation
in turn of the five respective σ lone pairs on anion
2-13. As the protons shuffle, the σ structure of the uridine
remains constant, and a proton is simply found on
a different σ lone pair of electrons.

In some sets of tautomers, however, rehybridiza- 
tion of the atoms in the acid-base occurs during tau- 
tomerization. Such rehybridization is required to take 
place when one of the lone pairs that is protonated is a
π lone pair in the intermediate base. The usually cited
example of this is that of the keto and enol tautomers of 
a carbonyl compound such as ethyl acetoacetate: 


[structures of the ketoester (KE), the enol at carbon 3 (E3), the enol at carbon 1 (E1), and their common enolate anion]  (2-13)

The most stable form of the acid, the ketoester (KE), is in
equilibrium with two tautomers, the enol at carbon 3 (E3)
and the enol at carbon 1 (E1). The common conjugate
base of all three of these tautomers is the enolate anion
(enolate). The enolate anion has a five-atom system of
π molecular orbitals, and each of the five atoms of the
system is hybridized [p, sp², sp², sp²]. Two of the six
π electrons of the anion, however, must be protonated at
carbon to form the ketoester, an event requiring
rehybridization at that central carbon.

In the case of the two enols of ethyl acetoacetate, in 
contrast to the tautomers of acetic acid or the tautomers 
of uridine, the proton can be readily transferred 
intramolecularly between the two oxygens. In fact, in 
either enol the proton forms a hydrogen bond to the 
adjacent carbonyl oxygen. These comparisons illustrate 
the specific geometric requirements for intramolecular 
proton transfer. Not counting the proton transferred, 
efficient intramolecular proton transfer requires that a 
ring of five or six atoms can be formed. 

There are three aspects of the situation that must 
be clearly distinguished from each other. One is the set 
of tautomers itself (Equations 2-11 and 2-13). The 
second is the resonance structures that can be written 
for each member of the set of tautomers. The third is the 
microscopic acid dissociations of the individual tau- 
tomers. 

Each of the tautomers in the set can often be drawn 
as a subset of resonance structures. For example, just 
one of the tautomers of uridine can be examined in this 
way: 


[resonance structures of one tautomer of uridine]

The resonance structures, as distinct from the tautomers
themselves, are not independent molecules and do not
have independent existences. In such a subset of resonance
structures, as is always required, no σ bond or
σ lone pair has engaged in the exercise because they are
all orthogonal to the π electrons that are being shifted.
The resonance structures designate which electrons are
the π electrons and which atoms (in the case of uridine,
all of them) are contributing p orbitals to the system of
π molecular orbitals. Each of the three tautomers of uridine
can be submitted to this treatment to generate three
subsets of resonance structures. It becomes clear that if
the hierarchy of tautomers and resonance structures is
not always clearly recognized, significant confusion
ensues.

Because it is a tautomer, any one of the tautomers 
in a set can simply lose a proton in a microscopic acid 
dissociation that produces its conjugate base. Although 
the conjugate base may itself be a member of a set of 
tautomers existing at its level of protonation, in the 
examples discussed so far, none of the conjugate bases 
have had acidic protons. For example, the enolate is the 
common conjugate base produced upon the dissocia- 
tion of a proton from any one of the three tautomers of 
ethyl acetoacetate (Equation 2-13). The ratio between
the concentrations of any pair of the members
of a set of tautomers is independent of the pH of the
solution because a proton appears on neither side of 
any chemical equation interconverting the two. For 
example, as the pH increases, the molar concentration 
of the enolate increases according to a function of the 
same form as that displayed for the conjugate base 
in Figure 2-6, and the sum of the molar concentrations 
of the three tautomers decreases accordingly, but the 
ratio between their concentrations remains unaltered 
at all values of pH, even when the conjugate base 
accounts for almost all of the molecules in the solution. 
In the case of ethyl acetoacetate, these ratios are 
defined by the three equilibrium constants among the 
tautomers: 


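$$
K_{\mathrm{3K}} = \frac{[\mathrm{KE}]}{[\mathrm{E3}]} \qquad
K_{\mathrm{1K}} = \frac{[\mathrm{KE}]}{[\mathrm{E1}]} \qquad
K_{\mathrm{31}} = \frac{[\mathrm{E1}]}{[\mathrm{E3}]} = \frac{K_{\mathrm{3K}}}{K_{\mathrm{1K}}}
\tag{2-14}
$$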


To treat this situation quantitatively, a distinction 
must be made between the microscopic dissociation 
constants and the macroscopic dissociation constant. 
The microscopic dissociation constants involved are 
those for the dissociation of each tautomer: 


$$
K_{\mathrm{aE3}} = \frac{[\mathrm{H^+}][\mathrm{enolate}]}{[\mathrm{E3}]} \qquad
K_{\mathrm{aE1}} = \frac{[\mathrm{H^+}][\mathrm{enolate}]}{[\mathrm{E1}]} \qquad
K_{\mathrm{aKE}} = \frac{[\mathrm{H^+}][\mathrm{enolate}]}{[\mathrm{KE}]}
\tag{2-15}
$$

In contrast to such relationships, a macroscopic acid
dissociation constant is an acid dissociation constant in
which all tautomers with the same number of protons
are considered to be indistinguishable. As a result, the
molar concentrations of all tautomers with the same 
number of protons must be summed, and only those 
undivided sums can appear in the expression defining a 
macroscopic acid dissociation constant. The expression 
for the macroscopic dissociation constant of ethyl ace- 
toacetate is 


$$
K_{\mathrm{aEAA}} = \frac{[\mathrm{H^+}][\mathrm{enolate}]}{[\mathrm{KE}] + [\mathrm{E3}] + [\mathrm{E1}]} = 2.1 \times 10^{-11}\ \mathrm{M}
\tag{2-16}
$$


Were there more than one tautomer of the enolate, the 
molar concentrations of all these tautomers would be 
summed and that sum would be multiplied by [H⁺] in the
numerator. 

It is the macroscopic pKa that is measured during
the titration of an acid-base because such a measure- 
ment makes no distinction among all of the tautomers 
yielding a proton at a particular pH or among all of the 
tautomers produced upon the surrender of the proton. 
All that is measured is the consumption of hydroxide 
ions or protons by the solution. A tautomeric acid 
behaves as if it were a simple acid with an acid dissoci- 
ation constant equal to its macroscopic acid dissocia- 
tion constant. The total concentrations of conjugate 
bases and conjugate acids behave as if they were the 
concentrations of one simple base and one simple acid. 
Because only the macroscopic pKa is the result of an
acid-base titration and because measurements of the
ratios of tautomers or their microscopic acid dissociation
constants are more difficult, it is always the macroscopic
pKa that appears in a table. The tabulated value
for the macroscopic pKa of ethyl acetoacetate ($\mathrm{p}K_{\mathrm{aEAA}}$) is
10.68.

By simple manipulation it can be shown that 


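$$
\frac{1}{K_{\mathrm{aEAA}}} = \frac{1}{K_{\mathrm{aKE}}} + \frac{1}{K_{\mathrm{aE3}}} + \frac{1}{K_{\mathrm{aE1}}}
\tag{2-17}
$$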


This relationship demonstrates that the macroscopic 
acid dissociation constant is a function of all of the 
microscopic acid dissociation constants, in particular if 
the magnitudes of the microscopic acid dissociation 
constants are all similar to each other. In the case of 
ethyl acetoacetate, however, the ketoester is much less 
acidic than the enols ($K_{\mathrm{aKE}} \ll K_{\mathrm{aE3}} < K_{\mathrm{aE1}}$). As a result,
$K_{\mathrm{aEAA}} \approx K_{\mathrm{aKE}}$.

It is possible to calculate a microscopic acid disso- 
ciation constant from the macroscopic dissociation con- 
stant and the equilibrium constants among the 
tautomers (Equations 2-14). For example, Equation 2-17 
can be rearranged and Equations 2-14 and 2-15 can be 
used to obtain

$$
K_{\mathrm{aE3}} = K_{\mathrm{aEAA}}\left(1 + K_{\mathrm{3K}} + K_{\mathrm{31}}\right)
\tag{2-18}
$$


If it is assumed that the enol at carbon 3 is the more
stable ($K_{\mathrm{31}} < 1$) and that the measured equilibrium constant
between the enols and the ketoester (250) is
approximately $K_{\mathrm{3K}}$, then the pKa for the microscopic
acid dissociation of the enol at carbon 3 is approximately
8.3.
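
The estimate of 8.3 can be reproduced numerically. In the sketch below, the macroscopic pKa (10.68) and the keto-enol ratio (250) are taken from the text, while the ratio [E1]/[E3] is a hypothetical value chosen only to satisfy the stated assumption that it is less than 1. The consistency check assumes that the macroscopic constant is the reciprocal of the sum of the reciprocals of the three microscopic constants, which follows from the definitions in Equations 2-15 and 2-16.

```python
import math

pKa_EAA = 10.68    # tabulated macroscopic pKa of ethyl acetoacetate (from the text)
K_3K = 250.0       # [KE]/[E3], measured keto-enol equilibrium constant (from the text)
K_31 = 0.1         # [E1]/[E3]; hypothetical value, the text only assumes it is < 1

K_aEAA = 10.0 ** (-pKa_EAA)
K_aE3 = K_aEAA * (1.0 + K_3K + K_31)     # Equation 2-18
K_aKE = K_aE3 / K_3K                     # because K_3K = K_aE3 / K_aKE
K_aE1 = K_aE3 / K_31                     # because K_31 = K_aE3 / K_aE1

pKa_E3 = -math.log10(K_aE3)
pKa_KE = -math.log10(K_aKE)
# Recombine the microscopic constants into the macroscopic one as a check.
K_macro_check = 1.0 / (1.0 / K_aKE + 1.0 / K_aE3 + 1.0 / K_aE1)

print(f"pKa(E3) = {pKa_E3:.2f}, pKa(KE) = {pKa_KE:.2f}")
```

Because the ketoester dominates the mixture, its microscopic pKa is nearly identical to the macroscopic value, and the result for the enol is insensitive to the hypothetical value chosen for [E1]/[E3].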

As Equation 2-18 suggests, the ratios among the 
tautomers can also be calculated from their microscopic 
acid dissociation constants. In fact, all of the equilibrium 
constants governing the tautomers and the conjugate 
base of ethyl acetoacetate are dependent upon each 
other, or linked. The linkage is reflected in the relation- 
ships 


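$$
K_{\mathrm{3K}} = \frac{K_{\mathrm{aE3}}}{K_{\mathrm{aKE}}} \qquad
K_{\mathrm{1K}} = \frac{K_{\mathrm{aE1}}}{K_{\mathrm{aKE}}} \qquad
K_{\mathrm{31}} = \frac{K_{\mathrm{aE3}}}{K_{\mathrm{aE1}}}
\tag{2-19}
$$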


The equalities of Equations 2-19 simply state that the 
ratio between the concentrations of any two tautomers is 
equal to the inverse of the ratio of their respective microscopic
acid dissociation constants, which makes chemical
sense. The stronger the bond between the
heteroatom and the proton, the smaller will be its intrin- 
sic acid dissociation constant but the greater its relative 
concentration. 
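
The independence of the tautomeric ratios from pH, and their link to the microscopic constants, can be illustrated with a brief sketch. The microscopic pKa of the ketoester is set equal to the macroscopic value and that of the enol at carbon 3 to the estimate derived above; the value for the enol at carbon 1 is hypothetical, and the function name is illustrative.

```python
def tautomer_fractions(pH, micro_pKa):
    """Fractions of each tautomer and of the shared conjugate base at a given pH.

    micro_pKa maps the name of each tautomer to its microscopic pKa; all
    tautomers are assumed to dissociate to the same conjugate base.
    """
    h = 10.0 ** (-pH)
    # Concentration of each tautomer relative to [enolate] = 1: [T] = [H+]/K_aT
    relative = {name: h / 10.0 ** (-pKa) for name, pKa in micro_pKa.items()}
    total = 1.0 + sum(relative.values())
    result = {name: value / total for name, value in relative.items()}
    result["enolate"] = 1.0 / total
    return result


micro_pKa = {"KE": 10.68, "E3": 8.28, "E1": 7.28}   # E1 value is hypothetical
for pH in (7.0, 10.68, 13.0):
    f = tautomer_fractions(pH, micro_pKa)
    print(f"pH {pH:5.2f}  enolate = {f['enolate']:.3f}  [KE]/[E3] = {f['KE'] / f['E3']:.1f}")
```

The fraction of enolate rises with pH, but [KE]/[E3] stays fixed at the ratio of the two microscopic constants.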

A molecule of protein has a large number (>100) of 
acidic protons and basic lone pairs distributed over the 
side chains of its amino acids. As a result it is a waste of 
time even to imagine all the tautomers of that protein 
that are present in solution, but usually these 
acid-bases on the side chains are separated widely 
enough from each other that each behaves as an inde- 
pendent acid-base and can be treated as such. 
Occasionally, however, two or three amino acids are not 
only of functional significance, so attention is paid to 
them, but also close enough to each other that tautomers
and the distinction between macroscopic acid
dissociation constants and microscopic acid dissociation
constants become important. In thioredoxin from
Escherichia coli, there is an aspartate (Aspartate 26)
close enough in the structure to a cysteine (Cysteine 32)
that their acid dissociations become linked. When
both are protonated or both are unprotonated, there 
are no tautomers; but when one is protonated and the 
other is not, there are two tautomers, one in which the 
proton is on the aspartic acid and the cysteinate is in 
the form of the anionic base, and the other in which the 
proton is on the cysteine and the aspartate is the 
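anionic base:

[scheme of the linked acid dissociations and the tautomerization among HSOH, HSO⁻, ⁻SOH, and ⁻SO⁻]  (2-20)

The equilibrium constant for the tautomerization ($K_{\mathrm{SO}}$) is
defined with the thiol-carboxylate as product and the
thiolate-carboxylic acid as reactant. The linkage
relationships are

$$
K_{\mathrm{SO}} = \frac{[\mathrm{HSO^-}]}{[{}^{-}\mathrm{SOH}]}
= \frac{K_{\mathrm{aHSOH}}}{K_{\mathrm{aHOSH}}}
= \frac{K_{\mathrm{a}{}^{-}\mathrm{SOH}}}{K_{\mathrm{a}{}^{-}\mathrm{OSH}}}
\tag{2-21}
$$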

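and the relationships between the macroscopic dissociation
constants and the microscopic dissociation constants are

$$
K_{\mathrm{a1}} = \frac{\left([\mathrm{HSO^-}] + [{}^{-}\mathrm{SOH}]\right)[\mathrm{H^+}]}{[\mathrm{HSOH}]}
= K_{\mathrm{aHOSH}} + K_{\mathrm{aHSOH}}
\tag{2-22}
$$

and

$$
\frac{1}{K_{\mathrm{a2}}} = \frac{[\mathrm{HSO^-}] + [{}^{-}\mathrm{SOH}]}{[{}^{-}\mathrm{SO^-}][\mathrm{H^+}]}
= \frac{1}{K_{\mathrm{a}{}^{-}\mathrm{OSH}}} + \frac{1}{K_{\mathrm{a}{}^{-}\mathrm{SOH}}}
\tag{2-23}
$$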

The equation describing the titration curve for the 
cysteine is 


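$$
f_{\mathrm{cysteinate}} =
\frac{K_{\mathrm{aHOSH}}\left([\mathrm{H^+}] + K_{\mathrm{a}{}^{-}\mathrm{SOH}}\right)}
{K_{\mathrm{aHOSH}}\left([\mathrm{H^+}] + K_{\mathrm{a}{}^{-}\mathrm{SOH}}\right) + [\mathrm{H^+}]\left([\mathrm{H^+}] + K_{\mathrm{aHSOH}}\right)}
\tag{2-24}
$$

where $f_{\mathrm{cysteinate}}$ is the fraction of the cysteine that is the
anionic base. It is possible to walk through the titration
curve. Assume that the first and second macroscopic
acid dissociation constants are well separated, that the
respective pairs of microscopic acid dissociation constants
are close together
($K_{\mathrm{aHOSH}} \approx K_{\mathrm{aHSOH}} \gg K_{\mathrm{a}{}^{-}\mathrm{OSH}} \approx K_{\mathrm{a}{}^{-}\mathrm{SOH}}$),
that the initial pH is low, and that the titration is
performed by adding hydroxide ion. As the concentration
of protons decreases into the range of the first
macroscopic acid dissociation constant, the inequality
$[\mathrm{H^+}] \approx K_{\mathrm{aHOSH}} \approx K_{\mathrm{aHSOH}} \gg K_{\mathrm{a}{}^{-}\mathrm{OSH}} \approx K_{\mathrm{a}{}^{-}\mathrm{SOH}}$
holds and

$$
f_{\mathrm{cysteinate}} = \frac{K_{\mathrm{aHOSH}}}{\left(K_{\mathrm{aHOSH}} + K_{\mathrm{aHSOH}}\right) + [\mathrm{H^+}]}
\tag{2-25}
$$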


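This equation describes a normal titration curve for a
conjugate base (Figure 2-6A) with a macroscopic acid
dissociation constant equal to $K_{\mathrm{aHOSH}} + K_{\mathrm{aHSOH}}$, the sum
of the two lower microscopic dissociation constants,
which is the macroscopic dissociation constant $K_{\mathrm{a1}}$, and
that reaches a plateau at

$$
f_{\mathrm{cysteinate}} = \frac{K_{\mathrm{aHOSH}}}{K_{\mathrm{aHOSH}} + K_{\mathrm{aHSOH}}}
= \frac{K_{\mathrm{a}{}^{-}\mathrm{OSH}}}{K_{\mathrm{a}{}^{-}\mathrm{OSH}} + K_{\mathrm{a}{}^{-}\mathrm{SOH}}}
= \frac{1}{1 + K_{\mathrm{SO}}}
\tag{2-26}
$$

which is the fraction of cysteinate in the tautomeric mixture.
The plateau is reached when
$K_{\mathrm{aHOSH}} \approx K_{\mathrm{aHSOH}} \gg [\mathrm{H^+}] \gg K_{\mathrm{a}{}^{-}\mathrm{SOH}} \approx K_{\mathrm{a}{}^{-}\mathrm{OSH}}$.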

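As [H⁺] is decreased further during the titration into
the range of the second macroscopic dissociation constant
and on above it, the inequality
$K_{\mathrm{aHOSH}} \approx K_{\mathrm{aHSOH}} \gg K_{\mathrm{a}{}^{-}\mathrm{OSH}} \approx K_{\mathrm{a}{}^{-}\mathrm{SOH}} \approx [\mathrm{H^+}]$
holds and

$$
f_{\mathrm{cysteinate}} =
\frac{K_{\mathrm{a}{}^{-}\mathrm{OSH}}}{K_{\mathrm{a}{}^{-}\mathrm{OSH}} + K_{\mathrm{a}{}^{-}\mathrm{SOH}}}
\left(
\frac{K_{\mathrm{a}{}^{-}\mathrm{SOH}} + [\mathrm{H^+}]}
{\dfrac{K_{\mathrm{a}{}^{-}\mathrm{SOH}}\,K_{\mathrm{a}{}^{-}\mathrm{OSH}}}{K_{\mathrm{a}{}^{-}\mathrm{OSH}} + K_{\mathrm{a}{}^{-}\mathrm{SOH}}} + [\mathrm{H^+}]}
\right)
\tag{2-27}
$$

which is the equation for a normal titration curve beginning
at the tautomeric fraction (Equation 2-26), having a
macroscopic acid dissociation constant equal to the term
$K_{\mathrm{a}{}^{-}\mathrm{SOH}}K_{\mathrm{a}{}^{-}\mathrm{OSH}}/(K_{\mathrm{a}{}^{-}\mathrm{OSH}} + K_{\mathrm{a}{}^{-}\mathrm{SOH}})$,
which is the macroscopic dissociation constant $K_{\mathrm{a2}}$, and reaching a final level at
which all of the cysteine is unprotonated (the fully ionized
form on the right of Equation 2-20).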
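
The walk through the titration curve can be followed numerically. The sketch below evaluates the full expression for the fraction of cysteinate (Equation 2-24, written here from the definitions of the four species) for a hypothetical pair of acid-bases whose macroscopic dissociations are well separated; the three microscopic values of pKa are illustrative assumptions, not data from the text.

```python
def f_cysteinate(pH, pK_HOSH, pK_HSOH, pK_mSOH):
    """Fraction of the cysteine present as thiolate (Equation 2-24).

    pK_HOSH: thiol dissociating while the carboxylic acid is protonated
    pK_HSOH: carboxylic acid dissociating while the thiol is protonated
    pK_mSOH: carboxylic acid dissociating while the thiolate is anionic
    """
    h = 10.0 ** (-pH)
    k_hosh = 10.0 ** (-pK_HOSH)
    k_hsoh = 10.0 ** (-pK_HSOH)
    k_msoh = 10.0 ** (-pK_mSOH)
    numerator = k_hosh * (h + k_msoh)
    return numerator / (numerator + h * (h + k_hsoh))


# Hypothetical, well-separated constants chosen to make the plateau obvious.
pKs = (5.1, 5.0, 10.0)
K_SO = 10.0 ** (pKs[0] - pKs[1])        # tautomeric ratio K_aHSOH / K_aHOSH
plateau = 1.0 / (1.0 + K_SO)            # Equation 2-26

for pH in (2.0, 5.0, 7.5, 10.0, 13.0):
    print(f"pH {pH:4.1f}  f_cysteinate = {f_cysteinate(pH, *pKs):.3f}")
print(f"expected plateau = {plateau:.3f}")
```

Midway between the two macroscopic dissociations the computed fraction sits at the tautomeric plateau, and it approaches 1 only after the second dissociation.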

The titration curve for the cysteine of thioredoxin
conforms to these expectations. The values
observed for the macroscopic acid dissociation constants
are $\mathrm{p}K_{\mathrm{a1}} = 7.2$ and $\mathrm{p}K_{\mathrm{a2}} = 9.5$ and the tautomeric
ratio, as determined by the level of the plateau observed
in the titration curves (Equation 2-26), is 1.3. These
values, when inserted into the equations, give microscopic
acid dissociation constants for the cysteine and
the aspartic acid of $\mathrm{p}K_{\mathrm{aHOSH}} = 7.6$, $\mathrm{p}K_{\mathrm{aHSOH}} = 7.5$,
$\mathrm{p}K_{\mathrm{a}{}^{-}\mathrm{OSH}} = 9.2$, and $\mathrm{p}K_{\mathrm{a}{}^{-}\mathrm{SOH}} = 9.1$. The titration curve for the
aspartic acid has tautomeric ratios and macroscopic acid
dissociation constants of about 1.3, 7.2, and 9.5, as
expected.
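
A minimal sketch of how the four microscopic constants follow from the two macroscopic constants and the tautomeric ratio is given below. It assumes the relationships stated above: the first macroscopic constant is the sum of the two upper microscopic constants, the reciprocal of the second is the sum of the reciprocals of the two lower ones, and the tautomeric ratio equals the ratio of microscopic constants within each pair. The function and key names are illustrative.

```python
import math


def microscopic_pKas(pKa1, pKa2, K_SO):
    """Microscopic pKa values from macroscopic pKa values and K_SO = [HSO-]/[-SOH]."""
    K_a1 = 10.0 ** (-pKa1)
    K_a2 = 10.0 ** (-pKa2)
    K_HOSH = K_a1 / (1.0 + K_SO)      # thiol dissociates, carboxylic acid protonated
    K_HSOH = K_a1 - K_HOSH            # carboxylic acid dissociates, thiol protonated
    K_mSOH = K_a2 * (1.0 + K_SO)      # carboxylic acid dissociates from the thiolate
    K_mOSH = K_mSOH / K_SO            # thiol dissociates from the carboxylate
    constants = {"HOSH": K_HOSH, "HSOH": K_HSOH, "-OSH": K_mOSH, "-SOH": K_mSOH}
    return {name: -math.log10(K) for name, K in constants.items()}


micro = microscopic_pKas(7.2, 9.5, 1.3)
for name, value in micro.items():
    print(f"pKa({name}) = {value:.2f}")
```

The results agree with the values quoted in the text to within the rounding of the three input numbers.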

From the microscopic acid dissociation constants,
it can be seen that when the aspartic acid is protonated,
the thiol is a much better acid ($\mathrm{p}K_{\mathrm{aHOSH}} = 7.6$) than
when the aspartate is unprotonated and anionic
($\mathrm{p}K_{\mathrm{a}{}^{-}\mathrm{OSH}} = 9.2$). Because of the linkage, the same difference
in microscopic acid dissociations is necessarily
seen for the aspartic acid (ΔpKa = 1.6) when the cysteine
is the neutral thiol or the anionic thiolate. These differences
make electrostatic sense because it should be
significantly more difficult to produce two adjacent
negative charges than a single negative charge. For
example, the pKa of the first macroscopic acid dissociation
of succinic acid is 1.29 units less than that of the
second. Before the ionization of these two acid-bases in
thioredoxin was analyzed in terms of tautomeric equilibria
and microscopic acid dissociation constants,
there was considerable confusion as to what was
happening.

Glutamate 172 and Glutamate 78 in the
endo-1,4-β-xylanase from Bacillus circulans are close enough to each
other in the native protein to be linked by a tautomeric
equilibrium. The microscopic pKa of Glutamate 172
when Glutamate 78 is the neutral acid is 5.5, but when
Glutamate 78 is the anionic carboxylate, it is 6.7.


Problem 2-12:

[structures of guanosine, xanthosine, and cytidine; R = ribose]

(A) Draw complete σ structures for the above heterocycles
in the above tautomeric forms including all
σ lone pairs. Draw them with proper bond angles.
Abbreviate the ribose as R.

(B) Indicate which protons are involved in tautomeric
shifts between which lone pairs of electrons. Draw
some of the tautomeric forms of these neutral
molecules.

(C) How many π electrons are there in each compound?

(D) The macroscopic values of pKa for guanosine are
1.6, 9.2, and 12.5; those for xanthosine are 0.0, 5.5,
and 13.0; and those for cytidine are 4.2 and 12.5.
Draw vertically chemical equations for the two or
three acid dissociations that have these values of
pKa and horizontally next to the molecule in each
level of protonation draw two of its tautomers.

(E) How many of the tautomers at each level of protonation
are insignificant because they require
separation of charge?

(F) Draw the σ structure of a tautomer of xanthine
that could substitute for adenine in the A-T base
pair.

Problem 2-13: Derive Equation 2-17. 


Problem 2-14: If

$$
f_{\mathrm{cysteinate}} = \frac{[{}^{-}\mathrm{SOH}] + [{}^{-}\mathrm{SO^-}]}{[\mathrm{HSOH}] + [\mathrm{HSO^-}] + [{}^{-}\mathrm{SOH}] + [{}^{-}\mathrm{SO^-}]}
$$

where the four species are as labeled in Equation 2-20,
derive Equations 2-22, 2-23, 2-24, 2-25, 2-26, and 2-27.
The values observed for the macroscopic acid dissociation
constants for the tautomeric equilibrium of Equation
2-20 are $\mathrm{p}K_{\mathrm{a1}} = 7.2$ and $\mathrm{p}K_{\mathrm{a2}} = 9.5$ and the tautomeric ratio
is 1.3. Calculate the values of the four microscopic
dissociation constants.


Problem 2-15: Write a set of linked equilibria resembling
the one in Equation 2-20 for the tautomeric equilibrium
and four microscopic acid dissociations of
1,5-dimethyl-4-mercaptoimidazole. The two macroscopic
acid dissociations of 1,5-dimethyl-4-mercaptoimidazole
have values of pKa equal to 2.3 and 10.3. Assume
that the microscopic acid dissociation constant between
1,5-dimethyl-4-mercaptoimidazolium cation and its
neutral conjugate base has the same value as the macroscopic
acid dissociation constant for
S-methyl-1,5-dimethyl-4-mercaptoimidazole (pKa = 6.0), and calculate
values for the other three microscopic acid dissociation
constants and the tautomeric equilibrium constant for
1,5-dimethyl-4-mercaptoimidazole. At pH 7, what is the
major form of the molecule present in the solution?


Problem 2-16: The macroscopic acid dissociation con- 
stants for succinic acid are 10*! and 10°“, What are its 
four microscopic acid dissociation constants as values of
pKa?


The fundamental, covalent scaffold of a molecule of protein
is the polypeptide (see 2-15 below).

A polypeptide is a long (50-5000 amino acids)
linear polymer, the monomers of which are L-α-amino
acids. Because a protein constructed entirely of
D-α-amino acids is functionally indistinguishable from
its biological enantiomer, the original choice of
L-α-amino acids was arbitrary. The covalent bonds that
link the amino acids together to form a polypeptide,
which are referred to as peptide bonds, are those of
amides. Because a molecule of water is lost between two
amino acids when a peptide bond is formed, the amino
acids, when they are incorporated into a polypeptide,
should be referred to as amino acid residues. Every
polypeptide has the same backbone of peptide bonds
and α carbons with an end at which an unbonded primary
amine is usually located, the amino terminus, and
an end at which an unbonded carboxylic acid usually is
located, the carboxy terminus. At the values of pH usually
encountered in living organisms (pH 7-8), the amino
terminus (pKa = 8.0) should be partially protonated and
cationic, and the carboxy terminus (pKa = 3.3) should be
unprotonated and anionic. The rhythm of a polypeptide
is N, Cα, CO, N, Cα, CO, N, Cα, CO.
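
To put numbers on "partially protonated", the fraction of each terminus that carries its proton can be estimated over the physiological range of pH. This sketch treats each terminus as an independent, simple acid-base with the values of pKa quoted above (8.0 and 3.3); the function name is illustrative.

```python
def fraction_protonated(pH, pKa):
    """Fraction of a simple acid-base present as its conjugate acid."""
    return 1.0 / (1.0 + 10.0 ** (pH - pKa))


for pH in (7.0, 7.5, 8.0):
    amino = fraction_protonated(pH, 8.0)      # cationic ammonium form
    carboxy = fraction_protonated(pH, 3.3)    # neutral carboxylic acid form
    print(f"pH {pH:.1f}  amino terminus protonated: {amino:.2f}  "
          f"carboxy terminus protonated: {carboxy:.1e}")
```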

In a polypeptide, the amino acid residues each contribute
a side chain (Rᵢ in 2-15) to the covalent structure.
It is the order in which these side chains appear along the 
polymer that defines the protein. Conceptually, the con- 
tribution of an amino acid residue to the structure of the 
protein can be divided into its α-imido nitrogen, its
α carbon, and its α-acyl carbon and oxygen on the one
hand and its side chain on the other. The former always 
provide the same six atoms (with the minor exception of 
a carbon for an a hydrogen in the case of proline) to the 
backbone of the polypeptide. The structure of this back- 
bone can be treated as if it were an independent mole- 
cule, albeit a long tortuous polymer, and the six atoms 
contributed to it by each amino acid as if they were sep- 
arate from each side chain. The side chains themselves 
can be treated as separate entities by detaching them, in 
the imagination, from their respective α carbons and
replacing what was the bond to the α carbon with a
hydrogen. In this way, a model compound for the amino
acid side chain is created.

A model compound for a particular amino acid 
residue in a polypeptide would be a small molecule that
incorporates the structure of the side chain and any additional
structural elements necessary to duplicate the
properties of the amino acid residue that are of interest.
The model compounds in which the α carbon is
replaced only by a hydrogen are simple, readily available
chemicals. For example, in this set, the model compound
for glutamic acid would be propionic acid; that for histidine,
4-methylimidazole. Another set of model compounds
that has been used is the N-acetyl α-amides of
the amino acids. N-Acetylaspartic acid α-amide and
N-acetylglutamic acid α-amide (Figure 2-1) are members
of this set.

Unfortunately, the free amino acids themselves are 
poor model compounds for the amino acid residues in a 
polypeptide. This arises from the fact that, at all values of 
pH, they are either zwitterionic or bear net charge. Their 
solubilities, acid-base behavior, and ability to participate 
in noncovalent interactions are dominated by the car- 
boxylate anion and ammonium cation they contain. 
Contrary to original expectations, an understanding of 
the properties of proteins depends little on an under- 
standing of the properties of the amino acids themselves, 
while an examination of the structures of the amino acid 
side chains and the behavior of uncharged model com- 
pounds for them does provide essential information. 

One of the more important properties of an amino
acid residue incorporated in a polypeptide is the value
of the pKa of its side chain. The amino acid side chains
that contain acid-bases are those of aspartate,
asparagine, serine, threonine, glutamate, glutamine,
cysteine, tyrosine, histidine, lysine, tryptophan, and
arginine (Table 2-2). The N-acetyl α-amides of
glutamate and aspartate have been useful in examining the
electronic effects of the peptide bonds that surround
the α carbon on the acid dissociation constants of the
amino acid side chains in a polypeptide. The α carbon
in an N-acetyl α-amide, which is transmitting
inductively the significant electron-withdrawing capacity of
both the carboxamido and the acylimido groups that
are attached to it, seems to have about the same
electron-withdrawing capacity as a hydroxyl, a
cyanomethyl, a chloromethyl, or a bromomethyl group.
This conclusion follows from the fact that replacing the
α carbon of N-acetylaspartic acid α-amide or
N-acetylglutamic acid α-amide with any of these functional
groups produces little change in the pKa of the
respective carboxylic acid, but the completely aliphatic model
compounds, acetic acid and propionic acid,
respectively, are significantly less acidic than the respective
N-acetyl α-amides (Table 2-2). These four more
common electron-withdrawing substituents can be
used to estimate the inductive effect of the polypeptide
on the acid dissociations of the various acid-bases on
other amino acids (Table 2-2).

Table 2-2: Acid Dissociation Constants for Model
Compounds for the Amino Acid Residuesᵃ

amino acid residue
  model compound                              pKa

aspartic acid in polypeptide (estimate)ᵇ      4.0
  N-acetylaspartic acid α-amide               4.0
  Gly-Gly-Asp-Gly-Gly                         3.9
  3-chloropropionic acid                      4.1
  hydroxyacetic acid                          3.8
  3-bromopropionic acid                       4.0
  3-cyanopropionic acid                       4.0
  acetic acid                                 4.75
glutamic acid in polypeptide (estimate)       4.3
  N-acetylglutamic acid α-amide               4.3
  Gly-Gly-Glu-Gly-Gly                         4.1
  glutamic acid (microscopic pKa, neutral)    4.5
  4-chlorobutyric acid                        4.5
  3-hydroxypropionic acid                     4.5
  4-bromobutyric acid                         4.6
  4-cyanobutyric acid                         4.4
  propionic acid                              4.9
serine in polypeptide (estimate)              -3, 14.2
  2-chloroethanol                             14.3
  2-bromoethanol                              14.4
  2-cyanoethanol                              14.0
  ethanol                                     -2, 15.9
  methanol                                    15.5
threonine in polypeptide (estimate)           -3, 15
cysteine in polypeptide (estimate)            8.7
  glutathione                                 8.7
  cysteine (microscopic pKa, neutral)         9.1
  ethanethiol                                 10.5
  2-mercaptoethanol                           9.5
  ethyl mercaptoacetate                       8.0
tyrosine in polypeptide (estimate)            9.8
  polytyrosine                                9.5
  tyrosine (microscopic pKa, neutral)         9.8
  4-(hydroxymethyl)phenol                     9.8
  phenol                                      10.0
  4-methylphenol                              10.2
histidine in polypeptide (estimate)           6.6, 14
  Pro-His-glycinamide                         6.4
  histidine hydantoin                         6.4
  histidine (microscopic pKa, neutral)        6.0
  N-acetyl-L-histidine methylamide            6.5
  Gly-Gly-His-Gly-Gly                         6.7
  Gly-His-Gly                                 6.6
  imidazole                                   7.1, 14.5
  4-methylimidazole                           7.5
lysine in polypeptide (estimate)              10.5
  Gly-Gly-Lys-Gly-Gly                         10.5
  1-amino-5-hydroxypentane                    10.5
  Ala-Lys-(Ala)n (n = 1, 3)                   10.5
  1-aminopentane                              10.6
glutamine in polypeptide (estimate)           -1, 17
  acetamide                                   -0.7, 17
asparagine in polypeptide (estimate)          -1.4, 16
arginine in polypeptide (estimate)            13
  N-methylguanidine                           13.4
tryptophan in polypeptide (estimate)          17
  indole                                      16.9

ᵃ The values for the pKa of the model compounds are from the noted sources for
25 °C. ᵇ The estimate for each amino acid is based entirely on the values tabulated
for the model compounds.

These values of pKa for the side chains have been
shown to be accurate when the amino acid is in a
polypeptide if the polypeptide is in the form of a
structureless random coil and if that amino acid does not
have an immediate neighbor in the polypeptide with an
ionized side chain. When the polypeptide folds to form a
globular protein, however, significant shifts in the values
of pKa for its side chains occur. Neighboring
charged functional groups in the compact folded
structure can affect the pKa of a particular acid-base. An
adjacent anion makes it harder to remove a proton from an
acid and raises its pKa (Equation 2-20). An adjacent
elementary positive charge makes it easier to remove a
proton from an acid and lowers its pKa. If, upon the
folding of the protein, the acid-base finds itself in an aprotic
environment, secluded from water, the more charged
form of the acid-base will be less stable relative to the
less charged form than it would be in water. This shifts
the pKa in the direction favoring the less charged form of
the acid-base. A simple paradigm for such an effect
would be the shift in the pKa of acetic acid in dimethyl
sulfoxide, a relatively polar but aprotic solvent, to 12.9
from its value of 4.75 in water, which occurs because the
anionic conjugate base is poorly solvated by the
dimethyl sulfoxide relative to the solvation provided by
water. For all of these reasons, when the polypeptide
folds to form the native structure, the values for the pKa
of the various amino acids shift away from their ideal
values.
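
A shift in pKa can be restated as a change in the standard free energy of acid dissociation through ΔΔG° = 2.303RT ΔpKa. The sketch below applies this relationship to the acetic acid example, assuming a temperature of 25 °C for both solvents (an assumption made here for illustration); the function name is hypothetical.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1


def free_energy_of_pka_shift(pka_new: float, pka_ref: float,
                             temperature_k: float = 298.15) -> float:
    """Change in standard free energy of acid dissociation (J mol^-1).

    Uses delta G = -RT ln(Ka), so a rise in pKa of one unit costs RT ln(10).
    The default temperature of 25 degrees C is an assumption.
    """
    return R * temperature_k * math.log(10.0) * (pka_new - pka_ref)


# Acetic acid: pKa 4.75 in water, 12.9 in dimethyl sulfoxide (values from the text)
ddg = free_energy_of_pka_shift(12.9, 4.75)
print(f"Destabilization of the dissociated form: {ddg / 1000.0:.1f} kJ/mol")
```

Under these assumptions the transfer of the acid-base from water to dimethyl sulfoxide disfavors dissociation by several tens of kilojoules per mole, which gives a sense of the scale available to a buried side chain in a folded protein.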

Alanine (A, Ala), valine (V, Val), leucine (L, Leu),
and isoleucine (I, Ile) have saturated alkyl groups as
side chains:*

[Drawings of the side chains of alanine, valine, leucine, and isoleucine]


All of their carbons are hybridized sp³. Because alkyl
groups are sterically more bulky than functional groups
containing atoms hybridized [p, sp², sp², sp²], steric
considerations are more important in examining the 


* The drawings of all of the side chains in this section, except for 
proline, are for the entire functional group that is attached through 
a carbon-carbon bond to the respective α-carbon in the backbone
of the polypeptide. The open bond in each drawing indicates the 
point of this attachment. 


structures of these four amino acids than with most of 
the others. The view down every carbon-carbon bond 
should be staggered, and methyl or other alkyl groups 
should be anti to each other in the most stable con- 
formers. 

Proline (P, Pro) and glycine (G, Gly) are amino acid 
residues the effect of which on the polypeptide is almost 
entirely steric. Glycine has no side chain at all, merely a 
hydrogen, and as such can occupy positions in the native 
structure of a protein that are cramped. A proline,
because it is a ring, forces the polypeptide to assume
particular orientations.

Phenylalanine (F, Phe) is aromatic by virtue of its
phenyl ring:


The six π electrons are delocalized above and below the
plane of the ring in three bonding molecular orbitals over
the six carbons that contribute the six p orbitals. This
causes the σ structure of the ring to be planar, and it is
sandwiched between two circular clouds of π electrons.
A phenylalanine side chain absorbs ultraviolet light
(λmax = 253 nm, ε = 1550 M⁻¹ cm⁻¹), and its absorption
spectrum displays the usual fine structure seen in
unadorned alkylbenzenes.

The side chain of tryptophan (W, Trp) is an indole,
which is a benzopyrrole. The indole is entirely aromatic,
consisting of an unbroken ring of nine atoms each
contributing a p orbital, and the aromatic system of π
molecular orbitals contains 10 π electrons:


The hydrogen on the pyrrole nitrogen of indole
(pKa = 16.9) is even less acidic than the hydrogen on a
molecule of water (pKa = 15.7):


When it departs as a proton, the lone pair left behind is
an sp² lone pair, and the negative formal charge can be
distributed by the system of π molecular orbitals over all
nine atoms in the ring. This delocalization is greater in
extent than the delocalization available to pyrrole, which
is somewhat less acidic (pKa = 17.5) than indole
(pKa = 16.9). The indolyl group of tryptophan is planar
with hydrogens directed outward along its edge and
clouds of π electrons above and below the σ plane. It has
the strongest ultraviolet absorption of any functional
group in an amino acid (λmax = 281 nm, ε = 5690 M⁻¹
cm⁻¹) and is the principal contributor to the
absorbance of protein at 280 nm (Figures 1-6 and
1-10).
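
The contribution of tryptophan to the absorbance of a solution of protein follows from the Beer-Lambert law, A = εcl. The sketch below is illustrative only: it uses the molar absorptivity quoted above for the indolyl group at its maximum, assumes that the tryptophans in a protein absorb independently and additively, and ignores every other chromophore. The protein in the example is hypothetical.

```python
# Molar absorptivity quoted in the text for the indolyl group at 281 nm
EPSILON_TRP = 5690.0  # M^-1 cm^-1


def absorbance(epsilon: float, concentration_molar: float,
               path_cm: float = 1.0) -> float:
    """Beer-Lambert law: A = epsilon * c * l."""
    return epsilon * concentration_molar * path_cm


def tryptophan_absorbance(n_trp: int, protein_molar: float,
                          path_cm: float = 1.0) -> float:
    """Absorbance from n_trp indolyl groups, assumed independent and additive."""
    return absorbance(n_trp * EPSILON_TRP, protein_molar, path_cm)


# Hypothetical protein: 3 tryptophans, 10 micromolar, 1 cm path length
print(f"{tryptophan_absorbance(3, 10e-6):.3f}")
```

Because the estimate neglects tyrosine and any shift of the maximum caused by the environment of each indole in the folded protein, it should be read as a rough lower bound rather than a measured molar absorptivity.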

The side chains of serine (S, Ser) and threonine (T, 
Thr) are primary and secondary aliphatic alcohols 
resembling ethanol and 2-propanol, respectively, except 
that they are more acidic because of the electron with- 
drawal of the immediately adjacent polypeptide. Their 
oxygens are hybridized sp³ and have two σ lone pairs that
can act as bases as well as an acidic hydrogen:

(2-29)


The values of pKa for these acid-base reactions can be
estimated (Table 2-2) from a series of alcohols contain- 
ing appropriate electron-withdrawing substituents. 

Tyrosine (Y, Tyr) resembles phenylalanine because 
it is aromatic and serine because it has a hydroxyl group. 
As a phenol, however, its properties are distinct from 
either. Tyrosine (pKa = 9.8) is more acidic than serine
(pKa = 14.2) because of the ability of the neighboring
π system to delocalize the excess electron density of the
anion:

The six p orbitals from the ring and the one p orbital from
the exocyclic oxygen that overlap in the anion are
distributed above and below the plane of the ring. As
indicated schematically in 2-23, the two σ lone pairs on the
oxygen of the anion are in the plane of the ring at angles
of 120° to the carbon-oxygen bond and are the only
bases on the anion that associate with a proton. To the
extent that one of the lone pairs of electrons on the
conjugate acid is delocalized,


(2-30) 


the lowered pKa of the hydroxyl reflects the lowered pKa
of an sp² oxygen-hydrogen bond. To the extent that the
lone pair is not so delocalized in the conjugate acid as it
is in the anion, the lowered pKa reflects the stability of the
anion relative to the neutral acid resulting from the
ability of the system of π molecular orbitals to spread its
excess electron density over one oxygen and three
carbons. Because of this increase of delocalization in
the anion, a significant change in the ultraviolet
spectrum of a tyrosine side chain occurs when the acid
(λmax = 275 nm, ε = 1410 M⁻¹ cm⁻¹) becomes the conjugate
base (λmax = 293 nm, ε = 2380 M⁻¹ cm⁻¹) upon acid
dissociation. It is for this reason that proteins absorb more
light at 280 nm when the pH of the solution is raised.

The side chain of histidine (H, His) is also an
aromatic acid-base by virtue of its imidazolyl group
(Equation 2-31):

The neutral imidazolyl group is an aromatic heterocycle
containing one pyrrole nitrogen and one pyridine
nitrogen, which together contribute three valence electrons to
an aromatic ring formed from five p orbitals. Six π
electrons are located in the three bonding molecular orbitals
of the aromatic system. The six π electrons remain in this
aromatic system of π molecular orbitals at all times in all
three protonation states but fluidly redistribute in
response to changes in coulomb effects as the nitrogens
gain or lose protons at their σ lone pairs.
The anion of the parent compound, imidazole 


2-24 


is the best place to begin. All atoms are hybridized [p,
sp², sp², sp²], the two nitrogens each have a σ lone pair
of electrons located in the plane of the ring, and both
are electronically equivalent. The excess electron
density associated with the formal negative charge is
distributed by the π system over both nitrogens, and
resonance structures can be drawn to show this
(Equation 2-31). The first proton adds to one of the two
σ lone pairs in the imidazolate anion to form an
sp² covalent bond in an acid-base reaction with a
macroscopic pKa2 = 14.5. Thus, the imidazolate anion is
less basic than the pyrrolate anion (pKa = 17.5)
because its system of π molecular orbitals can spread
the excess electron density over two nitrogens, but the
adenosinate anion (pKa = 12.5) is less basic than the
imidazolate anion because its system of π molecular
orbitals can spread the excess electron density over
three nitrogens.

In neutral imidazole, the two nitrogens are
necessarily nonequivalent because one has a proton attached
to it. The proton causes its nitrogen to be more
electronegative, and the lobes of the π molecular orbitals at
this location swell accordingly. This is represented in the
resonance structures by placing a π lone pair of electrons
over this nitrogen, but such formalism is not meant to
imply that this becomes a basic position or that this
nitrogen rehybridizes or that the ring is no longer
aromatic. The only base on a neutral imidazole is the σ lone
pair on the other nitrogen, and it gains a proton in an
acid-base reaction with a macroscopic pKa = 6.6 when
the base is a histidine side chain in a polypeptide (Table
2-2) or pKa = 7.1 when the base is imidazole itself. The
imidazolium cation (pKa1 = 7.1) is less acidic than the
pyridinium cation (pKa = 5.2) because its system of
π molecular orbitals can spread its electron deficiency
over two nitrogens.

The two ring nitrogens in a histidine side chain, 
unlike the two in imidazole, are not stereochemically 
equivalent to each other because of the substitution at 
carbon 4. The behavior, as a function of pH, of the chem- 
ical shifts of the nuclear magnetic resonances of the var- 
ious carbon-13 nuclei of the imidazole ring in histidine 
has been compared to their behavior in 1-methylhisti- 
dine and 3-methylhistidine. It was concluded from these 
observations that in aqueous solution the ratio
between the two neutral tautomers (Equation 2-31),
1-protiohistidine and 3-protiohistidine, is 4:1. Assume
that the same ratio obtains for histidine in a
polypeptide, and let H₂⁺ be the cation of a histidine side chain
in an unfolded polypeptide and 1-H and 3-H be the two
tautomers (Equation 2-31). If

\[ K_{\mathrm{a}}^{1\text{-H}} = \frac{[1\text{-H}][\mathrm{H^+}]}{[\mathrm{H_2^+}]} \]

and

\[ K_{\mathrm{a}}^{3\text{-H}} = \frac{[3\text{-H}][\mathrm{H^+}]}{[\mathrm{H_2^+}]} \]

then the ratio of the two microscopic dissociation
constants is 4 as it should be. If the macroscopic
dissociation constant

\[ K_{\mathrm{a}} = \frac{([1\text{-H}] + [3\text{-H}])[\mathrm{H^+}]}{[\mathrm{H_2^+}]} = K_{\mathrm{a}}^{1\text{-H}} + K_{\mathrm{a}}^{3\text{-H}} \]

has the value pKa = 6.6 estimated for histidine in a
polypeptide (Table 2-2), the two microscopic constants
follow from the ratio of 4.

If a histidine in a protein is fully accessible to the
aqueous phase, either one of its two protons can dissociate,
and the imidazolium cation of that histidine side chain
has two microscopic acid dissociation constants,
$\mathrm{p}K_{\mathrm{a}}^{1\text{-H}} = 6.7$ and $\mathrm{p}K_{\mathrm{a}}^{3\text{-H}} = 7.3$. All of these relationships
can be presented graphically (Figure 2-7).
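
The relationship among the two microscopic constants, the ratio of the tautomers, and the macroscopic constant can be verified numerically. The sketch below assumes that the microscopic pKa of 6.7 belongs to the dissociation that produces the more abundant tautomer, 1-H, and that of 7.3 to the dissociation that produces 3-H; it also ignores the second dissociation near pH 14. The variable and function names are illustrative.

```python
import math

# Microscopic pKa values from the text; the assignment to each tautomer
# is an assumption based on the 4:1 ratio favoring 1-H.
PKA_TO_1H = 6.7
PKA_TO_3H = 7.3

K_1H = 10.0 ** -PKA_TO_1H
K_3H = 10.0 ** -PKA_TO_3H

# Ratio [1-H]/[3-H] is independent of pH because both share [H2+] and [H+]
tautomer_ratio = K_1H / K_3H

# The macroscopic constant counts both tautomers as product
K_macro = K_1H + K_3H
pKa_macro = -math.log10(K_macro)


def histidine_fractions(pH: float) -> dict:
    """Fractions of the cation and the two neutral tautomers at a given pH."""
    h = 10.0 ** -pH
    denom = h + K_1H + K_3H
    return {"H2+": h / denom, "1-H": K_1H / denom, "3-H": K_3H / denom}


print(f"tautomer ratio = {tautomer_ratio:.2f}, macroscopic pKa = {pKa_macro:.2f}")
print(histidine_fractions(7.0))
```

The macroscopic pKa that emerges is lower than either microscopic pKa, in agreement with the observation that a dissociation producing a tautomer from a nontautomer has a microscopic constant smaller than the macroscopic one.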

It is important to distinguish carefully between the
use of the macroscopic and microscopic acid
dissociation constants. If the imidazolium of a histidine side
chain is on the surface of a molecule of protein and free
to rotate around the carbon-carbon σ bonds connecting
it to the polypeptide so that both nitrogen-hydrogens
can participate freely in acid dissociations, the
macroscopic pKa will dictate its acid-base behavior as it did in
the model compounds. If the imidazolium is held in
place by the neighboring amino acids in such a way that
the surroundings have the same polarity as water and
such that one of its acidic hydrogens is always engaged in
a hydrogen bond with an acceptor that resembles water
closely, the single remaining site available for acid
dissociation would display its respective microscopic acid
dissociation constant. A decision to use the macroscopic
pKa or the microscopic pKa implies that the respective
situation has been assumed to occur.

Figure 2-7: Titration curves for the three conjugate acids of
histidine (Equation 2-31): H₂⁺, 1-H, and 3-H. The titration curves for
histidine graphically illustrate the equations (Equations 2-32 to
2-37) governing the tautomerization. The ratio of the two
tautomers remains constant over the entire range. The value of each
microscopic pKa is defined by the intersection between the curve
representing the concentration of the respective tautomer and the
curve representing the concentration of its nontautomeric
conjugate base or conjugate acid. As a result, the microscopic acid
dissociation constant for a reaction producing a tautomer from a
nontautomer is always less than the corresponding macroscopic
acid dissociation constant, and the microscopic acid dissociation
constant for the reaction in which a tautomer dissociates to form a
nontautomer is always greater than the corresponding
macroscopic acid dissociation constant. The pKa for each of the two
macroscopic dissociations coincides with the pH at which half of
the histidine is in the form of the nontautomer, the H₂⁺ cation or the
anion, respectively.

The side chains of aspartic acid (D, Asp) and
glutamic acid (E, Glu) are simple carboxylic acids (Figure
2-5 and Table 2-2). The side chains of asparagine (N,
Asn) and glutamine (Q, Gln) are the primary amides of
these two carboxylic acids. Primary amides participate in
two acid-base reactions:


[Scheme: the two successive acid dissociations of the primary amide of
glutamine, from the cation through the neutral amide to the anion]

(2-38)


As is the case with imidazole, two protons are removed
successively from two heteroatoms separated by one
carbon and connected to each other by a system of
π molecular orbitals. The acid dissociations also proceed
from a cation through a neutral form to an anion. The
values for the two acid dissociation constants, however,
are more widely separated from each other (pKa1 = -0.7
and pKa2 = 17) than the two for imidazole (pKa1 = 7.1 and
pKa2 = 14.5).

The side chain of arginine (R, Arg) contains a
cation that slightly resembles the protonated amides of
glutamine and asparagine but has a system of π molecular
orbitals larger by one atom, a nitrogen:

[Drawing 2-26: atomic orbitals and molecular orbitals of the
guanidinium cation]

The functional group in arginine is a guanidinium cation,
and it is composed from four atoms, three nitrogens and
a central carbon, in the shape of a Y. Each of the four
atoms contributes one p orbital, and the four mix to
produce the four π molecular orbitals, one bonding (ψ1) and
two nonbonding (ψ2 and ψ3) molecular orbitals shown,
as well as a fourth antibonding orbital (ψ4) not shown.

The guanidinium cation has six π electrons
distributed in pairs among the bonding molecular orbital and
the two nonbonding molecular orbitals above and below
the plane of the cation. The two highest occupied
nonbonding molecular orbitals are responsible for
distributing two pairs of electrons evenly over the three nitrogen
atoms as is described by the resonance structures of 2-25.
This causes the one elementary positive charge to be
divided evenly among the three nitrogens. An arginine
side chain (pKa = 13) is less acidic than a histidinium side
chain (pKa = 6.6) because the elementary positive charge
is distributed by the system of π molecular orbitals over
three nitrogens rather than over two.

The guanidinium of an arginine side chain (2-25) 


defines a plane in which its central carbon, three
nitrogens, five hydrogens, and the δ carbon all reside. The
hydrogens bristle from the three nitrogens at 120° angles
around the periphery, and the flat clouds of π electrons
sandwich the σ structure from above and below. The
entire structure bears a net elementary positive charge
that is neutralized by removing a proton from any one of
the three nitrogens.

The side chain of lysine (K, Lys), the other strongly
basic amino acid side chain (pKa = 10.5), is a simple
primary ammonium cation at neutral pH:

With four carbons, it has the longest linear alkane chain
among the amino acids. The conformation of lowest free
energy is all anti as shown. The introduction of a gauche
conformation at any of the carbon-carbon bonds
requires about 4 kJ mol⁻¹ standard free energy.
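
The consequence of a penalty of about 4 kJ mol⁻¹ for each gauche conformation can be estimated with a Boltzmann distribution. The sketch below is a simplified illustration, not a treatment from the text: it assumes 25 °C, two equivalent gauche rotamers for each bond, and bonds that behave independently of one another. The number of bonds in the example is chosen arbitrarily.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1


def rotamer_populations(gauche_penalty_kj: float = 4.0,
                        temperature_k: float = 298.15) -> dict:
    """Boltzmann populations of anti, gauche+ and gauche- about one bond.

    Assumes two equivalent gauche rotamers, each higher in standard free
    energy than anti by gauche_penalty_kj.
    """
    weight = math.exp(-gauche_penalty_kj * 1000.0 / (R * temperature_k))
    total = 1.0 + 2.0 * weight
    return {"anti": 1.0 / total,
            "gauche+": weight / total,
            "gauche-": weight / total}


def all_anti_probability(n_bonds: int, gauche_penalty_kj: float = 4.0,
                         temperature_k: float = 298.15) -> float:
    """Probability that n_bonds independent bonds are all anti at once."""
    anti = rotamer_populations(gauche_penalty_kj, temperature_k)["anti"]
    return anti ** n_bonds


print(rotamer_populations())
print(f"all anti over 3 bonds: {all_anti_probability(3):.2f}")
```

Under these assumptions the all-anti conformation remains the single most populated one, yet a free side chain spends a considerable fraction of its time with at least one gauche bond.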

The side chain of cysteine (C, Cys) is fairly acidic
(pKa = 8.7). The thiolate anion that results from the acid
dissociation (Figure 2-8), although it is not a strong base,
is a strong nucleophile because sulfur is an element of
the third row. The side chain of methionine (M, Met),
although it contains a thioether, resembles in its
properties the side chains of the amino acids that are purely
alkanes, but it is linear rather than branched. The sulfur
of methionine is large and electron-rich but not very
basic; the pKa for its conjugate acid is about -9.
Methionine is, however, nucleophilic. At low pH, only
methionine and cysteine react with alkylating
electrophiles.

A drawback of methionine and cysteine, both to a 
protein in its normal environment and when the protein 
is studied in the laboratory, is the susceptibility of the 
sulfur they contain to oxidation by reaction with molec- 
ular oxygen, peroxides, or other oxidants. These reac- 
tions produce, in addition to disulfides, various oxides of 
sulfur (Figure 2-8). These are sulfoxides and sulfenic 
acids, sulfones and sulfinic acids, and sulfonic acids. To
understand the bonding in these various products, the
best way to begin is to examine sulfate:

The sulfate dianion is perfectly tetrahedral so sulfur is
hybridized [sp³, sp³, sp³, sp³] to provide the atomic
orbitals that overlap to produce the four σ bonds. Every
sulfur-oxygen bond is the same length so each oxygen
must be electronically identical. The sulfur-oxygen
bonds are quite short so they must possess double-bond
character. Sulfur has expended its s and p orbitals on the
four σ bonds, but as an element of the third period, it has
vacant, accessible 3d orbitals that can be involved in
overlap with adjacent 2p orbitals on the oxygens to form
dp π bonds. The lobes on a 3d orbital are of the proper
size to accomplish such an overlap:

All of these features are indicated by the six resonance
structures of sulfate, one of which is 2-28. The double
bonds in these resonance structures indicate the
dp π overlaps, not pp π overlaps as are indicated by
the double bonds in resonance structures for molecules
containing only elements of the second period. Because
they are dp π overlaps, the octet rule can be violated in
resonance structures involving sulfur. In all of the oxides
of sulfur (Figure 2-8), between the tetrahedral sulfur
and the various oxygens, there are the σ bonds and
dp π bonds.

Figure 2-8: Products of the oxidation of cysteine and methionine
side chains and their conjugate acids and bases. When a cysteine
side chain is oxidized by the removal of two electrons, the sulfenic
acid is formed, and when a methionine side chain is oxidized by the
removal of two electrons, the sulfoxide is formed. One of the
tautomers of a sulfenic acid is the lower homolog of a sulfoxide.
Cystine is the disulfide of two cysteines formed either by their
direct oxidation or by the reaction of the sulfenic acid of one
cysteine with the thiol of another cysteine. When a cysteine side
chain is further oxidized by the removal of two more electrons, the
sulfinic acid is formed, and when a methionine side chain is further
oxidized by the removal of two more electrons, the sulfone is
formed. One of the tautomers of a sulfinic acid is the lower homolog
of a sulfone. Cysteine can be further oxidized by the removal of two
more electrons to produce the sulfonic acid.

A sulfenic acid, the first oxidation level of a thiol, 
would be the monothio analog of a peroxide just as a 
disulfide is the dithio analog of a peroxide. One of the 
tautomers of a sulfenic acid would be the hydrogen 
analog of a sulfoxide. A sulfoxide is the first oxidation 
level of a thioether. Sulfoxides are stable oxides of sulfur, 
but sulfenic acids have not been isolated because they 
are so unstable. They have been postulated to exist as 
intermediates in the cleavage of disulfides produced by 
hydroxide in the presence of catalytic amounts of metal 
ions: 


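\[ 2\,\mathrm{RSSR} + 4\,\mathrm{OH^-} \xrightarrow{\mathrm{Me^{2+}}} 3\,\mathrm{RS^-} + \mathrm{RSO_2^-} + 2\,\mathrm{H_2O} \qquad \text{(2-40)} \]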


It has been proposed that the sulfenic acids that are
intermediates in this reaction would disappear as the
result of their immediate disproportionation,

\[ 2\,\mathrm{RSOH} \longrightarrow \mathrm{RSH} + \mathrm{RSO_2H} \qquad \text{(2-41)} \]

in a reaction homologous to but much more rapid than
the disproportionation of peroxides. This reaction,
however, requires the collision of two sulfenic acids. A
sulfenic acid at a cysteine in a molecule of a native
protein can be sterically prevented from such a collision. A
cysteine that is buried in the structure of
amidophosphoribosyltransferase from E. coli is a mixture of its
sulfenic acid and its sulfinic acid, each protected and
stabilized in turn by the protein that surrounds it. In the
absence of such protection, however, a cysteine in the
form of a sulfenic acid would be both a strong reductant
and a strong oxidant and rapidly susceptible to further
oxidation and reduction.

Sulfinic acids, the next level in the oxidation of 
thiols, are stable compounds that can be isolated. One 
of the tautomers of a sulfinic acid is the hydrogen analog 
of a sulfone. Sulfones, the next level in the oxidation of 
thioethers, are also stable oxides of sulfur. 

The sulfonate is the last oxidation state available to 
an alkylthiol. The sulfonate of cysteine is cysteic acid. 
Sulfonates are also quite stable. Methionine and cysteine 
are often purposely converted to methionine sulfone and 
cysteic acid to make them stable to further oxidation.” 

Oxidations such as those outlined in Figure 2-8 
often occur adventitiously and can introduce charge het- 
erogeneity into a protein or peptide owing to the forma- 
tion of cysteic acid. Such oxidations can also cause 
functional damage to a protein. It is the adventitious
oxidation of a methionine in α1-antitrypsin caused by
cigarette smoke that destroys the function of this protein and
produces emphysema.

Phosphoserine, phosphothreonine, and phospho- 
tyrosine* 


phosphoserine 
2-30 


are formed by posttranslational modification. The phos- 
phate is attached to the side chain of the amino acid as a 
monoester of phosphoric acid. 

The model for the bonding in these phosphorylated
amino acids is the trianion of phosphate, PO₄³⁻. The
bonding in the phosphate trianion is similar to that in


* Unfortunately, the prefix for the -PO₃²⁻ functional group officially
sanctioned by IUPAC for use by organic chemists is “phosphono”, 
but the prefix for the same functional group officially sanctioned by 
IUPAC for use by biochemists is “phospho”. Consequently, some 
confusion can arise. 


sulfate dianion with dp π bonds, formed by the overlap of
d orbitals on phosphorus and p orbitals on oxygen. These
bonds are indicated by the four resonance structures for
the phosphate trianion, indicating the equivalence of all
of the bonds between phosphorus and oxygen. The
σ lone pairs of electrons in the unperturbed trianion
must be distributed around each oxygen in such a way
that the tetrahedral symmetry of the entire anion is
retained. This symmetry is, however, readily perturbed
because the dp π bonds are polarized owing to the
difference in electronegativity between phosphorus and
oxygen. For example, the donors of hydrogen bonds are
oriented at randomly assumed angles around each of the
three equivalent oxygens in the hydrogen phosphate
dianion bound to phosphate-binding protein from E. coli
as if there were no incontrovertible geometry for the lone
pairs on each of them.

The acid-base properties of inorganic phosphate
(Table 2-1) and monoesters and diesters of phosphoric
acid reflect this ability of the system of the dp π
molecular orbitals to spread negative charge over two or more
oxygens because the acid dissociation constants (Table
2-1) are much closer together than one might expect for
a series of steps that each increase the negative charge
number of a small acid-base by 1 unit. The acid
dissociation constants for an alkyl monoester of phosphoric
acid (pKa1 = 1.7 and pKa2 = 6.7) and for a dialkyl diester
of phosphoric acid (pKa = 1.5) are close to those of
phosphoric acid itself, but sugar phosphates, and presumably
also serine phosphate and threonine phosphate, are
more acidic (pKa1 = 0.9 and pKa2 = 6.1) because of
inductive electron withdrawal.
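
The practical effect of the lower second pKa of a sugar phosphate, or presumably of phosphoserine, is a larger negative charge at neutral pH. The sketch below computes the distribution of the three ionic forms of a monoester of phosphoric acid from its two values of pKa, assuming ideal behavior; the function names are illustrative.

```python
def monoester_species_fractions(pH: float, pKa1: float, pKa2: float) -> tuple:
    """Fractions of the neutral acid, monoanion, and dianion of a
    phosphate monoester at a given pH (ideal diprotic acid)."""
    h = 10.0 ** -pH
    k1 = 10.0 ** -pKa1
    k2 = 10.0 ** -pKa2
    denom = h * h + k1 * h + k1 * k2
    return (h * h / denom, k1 * h / denom, k1 * k2 / denom)


def mean_charge(pH: float, pKa1: float, pKa2: float) -> float:
    """Average charge number of the phosphate monoester at a given pH."""
    _neutral, mono, di = monoester_species_fractions(pH, pKa1, pKa2)
    return -(mono + 2.0 * di)


# Values of pKa quoted in the text
alkyl = mean_charge(7.0, 1.7, 6.7)   # alkyl monoester of phosphoric acid
sugar = mean_charge(7.0, 0.9, 6.1)   # sugar phosphate
print(f"alkyl monoester at pH 7: {alkyl:.2f}")
print(f"sugar phosphate at pH 7: {sugar:.2f}")
```

At pH 7 both esters are mixtures of monoanion and dianion, but the ester with the lower second pKa lies closer to a charge number of -2.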


Problem 2-17: 


(A) At pH 7.0, what fraction of the lysine in the pep- 
tide Gly-Pro-Lys-Ala-Thr would be in the neutral 
nucleophilic form? What fraction at pH 12? 


(B) The ε-amino group of lysine in a polypeptide
reacts readily with acetic anhydride. Write a 
mechanism for this reaction. 


(C) At pH 12, 10 °C, and 0.1 M KCl, the lysine in the
above pentapeptide would react with acetic anhy- 
dride at a rate of 1.3 x 10° M min (ky). Write a 
kinetic mechanism for this reaction at any pH that 
involves only this rate constant and the acid dis- 
sociation constant K,x of the lysine, and solve it 
for the initial velocity (v;) of the reaction between 
lysine and acetic anhydride. Assume that the acid 
dissociation equilibrium is rapid compared to ky. 


Problem 2-18: In the peptide CH₃CO-Gly-Glu-Gly-His-NH₂,
which acid-bases would be titrating in the region
between pH 2 and 11? What are the approximate values
of each pKa? Plot as a function of pH the fraction of each
of the three major ionic forms of the peptide present in
solution.


Problem 2-19: Two compounds (A and B) have been
isolated from a protein by enzymatic hydrolysis. Both
have the composition C₅H₁₀N₂O₃. The titration behavior
of the compounds is the following:


         compound A     compound B
pKa1     3.85           2.15
pKa2     8.25           9.19


After acid hydrolysis for 20 h in 6 M HCl, both
compounds have the composition C₅H₉NO₄ and the
following titration behavior:


         compound A′    compound B′
pKa1     2.16           2.16
pKa2     4.32           4.32
pKa3     9.95           9.95

(A) What are compounds A and B? 


(B) Explain their behavior on titration. 


Problem 2-20: Draw a linkage relationship between the 
microscopic acid dissociation constants of glycine and 
its two tautomers in the form of Equation 2-20. The 
values of pK, for the two macroscopic acid dissociation 
constants of glycine are pK,ı = 2.34 and pK,» = 9.6. The 
macroscopic pK, for glycolic acid is 3.82 and that for 
acetic acid is 4.75. Estimate the equilibrium constant 
between the two tautomers of glycine and its four micro- 
scopic equilibrium constants. 
Chapter 3 


Sequences of Polymers 


By direct chemical analysis of purified proteins, it has 
been shown that they are composed of linear polymers of 
amino acids, referred to as polypeptides. These polymers 
are formed by a ribosome that reads the messenger RNA 
and converts the sequence of codons into a sequence of 
amino acids coupled covalently together in the dictated 
order. Every polypeptide begins its existence as a single 
polymer of amino acids of a precise length coupled in a 
precise order. By and large, this polymer of amino acids 
is conserved in the mature protein. On its way to matu- 
rity, however, various alterations can occur. One class of 
such alterations is the one that includes changes to the 
sequence of the amino acids. Short segments of amino 
acids are often removed from the amino-terminal or car- 
boxy-terminal ends of the protein or cut out of the 
middle, leaving a broken chain. If such an alteration 
occurs, it causes the actual amino acid sequence of the 
polypeptide in a mature protein to differ from the 
sequence encoded in the messenger RNA. 

The sequence of the amino acids in a mature 
polypeptide can be determined directly, but this is rarely 
done anymore. It is far easier to sequence the messenger 
RNA for the protein and translate the sequence of 
nucleotides into a sequence of amino acids. Because an 
amino acid sequence determined today is almost always 
the one encoded by the messenger RNA, alterations in 
the amino acid sequence of a protein that occur naturally 
during its maturation often escape detection initially. 
Eventually, however, most are detected, for example, as 
unexpected behavior of the protein upon electrophoresis 
or an incorrect mass on mass spectrometry, and then the 
sequence of the mature protein must be defined by 
direct analysis. This direct analysis always relies heavily 
on the knowledge of the amino acid sequence encoded 
by the messenger RNA because the lion’s share of the 
original amino acid sequence usually remains in the 
polypeptides forming the mature protein. 

As part of the process that produces a mature pro- 
tein, other changes are often made to the constituent 
polypeptides. These changes are either chemical modifi- 
cations of the amino acids themselves or the attachment 
of other compounds to the amino acids. For the most 
part, these posttranslational modifications are unpre- 
dictable, and each presents a challenge in analytical 
chemistry. There is a series of such modifications, how- 
ever, that consists of the addition of oligosaccharides to 
particular amino acids, and these modifications are 
defined by the sequences in which the sugars are linked 
together in these oligomers. 

With the exception of the unexpected posttransla- 
tional modifications, which are relatively infrequent, 
defining the covalent structure of a mature polypeptide 
is an exercise in the sequencing of polypeptides, nucleic 
acids, and oligosaccharides. 


Sequencing of Polypeptides 


Each naturally occurring polypeptide (2-15) has its own 
length and its own amino acid sequence. The amino acid 
sequence is the order in which the side chains of the 
amino acids (Rᵢ in 2-15) are arranged along the polymer. 
The continuous lengths of the polypeptides found in 
molecules of protein, and hence the lengths of their 
unique sequences, can be quite long. For example, 
human apolipoprotein B100 is 4560 amino acids (aa) 
long, human mucin MUC2 is 5159 aa long, and human 
cardiac titin is 26,926 aa long. The amino acid sequence 
of a given polypeptide is written as a word, each of whose 
letters stands for an amino acid. The word begins at the 
amino terminus, ends at the carboxy terminus, and is 
usually spelled correctly. 

The amino acid sequence of a polypeptide deter- 
mines which protein it will become. Bovine pancreatic 
ribonuclease can be defined as the protein produced in 
the pancreas of a steer that can cleave ribonucleic acid at 
random along its length in a reaction that leaves the 
phosphate on the 2’- and 3’-positions of the products, or 
it can be defined as the folded polypeptide, 124 amino 
acids long, with the amino acid sequence 
KETAAAKFERQHMDSSTSAASSSNYCNQMMKSRNLTKDRCKPVNTFVHESLADVQAVCSQKNVACKNGQTNCYQSYSTMSITDCRETGSSKYPNCAYKTTQANKHIIVACEGNPYVPVHFDASV. 
That the sequence is sufficient to define ribonuclease has 
been demonstrated by total synthesis. A similar demon- 
stration was made for the peptidase from human 
immunodeficiency virus. 

The complete amino acid sequences of polypeptides 
were, in the past, determined directly. The amino acids in 
a polypeptide can be removed in single steps from the 
amino-terminal end by the Edman degradation (Figure 
3-1).* The strategy of the Edman degradation relies upon 


*From here on only those lone pairs involved in each step of a 
chemical mechanism will be drawn. 
Figure 3-1: Steps in the mechanism of the Edman degradation. Phenyl isothiocyanate is 
used under basic conditions to produce an N-phenyl-N’-peptidylthiourea at the amino ter- 
minus. The nucleophilic sulfur of the thiourea then can attack intramolecularly the acyl 
carbon of the first peptide bond, but only under conditions of strong general acid catalysis, 
which promote protonation of the acyl oxygen. Anhydrous trifluoroacetic acid (TFA) is used 
to prevent any unwanted hydrolytic side reactions at this step. The shortened polypeptide 
that leaves during this second step is unreactive at its amino terminus under these condi- 
tions owing to protonation. The anilinothiazolinone and the shortened polypeptide are then 
separated from each other. The shortened polypeptide is recycled through coupling and 
cleavage. The anilinothiazolinone is opened and recyclized in aqueous acid to produce the 
phenylthiohydantoin of the first amino acid. 


the separation of the chemistry into two discrete, con- 
trolled steps (labeled 1 and 2 in Figure 3-1) that permit 
the removal of one amino acid at a time from the polypep- 
tide as the phenylthiohydantoin. The phenylthiohydan- 
toins from each step can be positively identified on 
chromatography by adsorption. 

Only in fortuitous circumstances, however, can the 
Edman degradation be run for more than 20 or 30 cycles. 
The necessity for two steps in each cycle as well as the 
step separating the shortened polypeptide from the thi- 
azolinone, none of which can be performed in 100% 
yield, causes the cumulative yield of phenylthiohydan- 
toin to decrease inexorably and noise to increase apace. 
Side reactions such as random hydrolysis of the poly- 
peptide and cyclization of amino-terminal glutamines to 
pyrrolidones also increase noise and lower yield, respec- 
tively. Methods for sequencing a polypeptide from its 
carboxy terminus and alternative methods for sequenc- 
ing one from its amino terminus have been described. 
So far, the former have been far less reliable than the 
Edman degradation and the latter have been supplanted 
by automated machines the chemistry of which relies on 
the Edman degradation. These machines are able to 
provide a sequence from tens of picomoles of a peptide, 
but they have not overcome the inherent shortcomings 
of the chemistry. 
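
The decline in cumulative yield described above can be illustrated with a short calculation. The sketch below is only an illustration under a simplifying assumption that is not made in the text: that every cycle proceeds with the same fractional yield. The function name and the per-cycle yields are hypothetical values chosen for the example, not measured ones.

```python
def cumulative_yields(repetitive_yield: float, cycles: int) -> list[float]:
    """Fraction of the original peptide recovered as the correct
    phenylthiohydantoin at each cycle, assuming (as a simplification)
    a constant fractional yield for every cycle."""
    if not 0.0 < repetitive_yield <= 1.0:
        raise ValueError("repetitive_yield must be in (0, 1]")
    # After n cycles the surviving fraction is the per-cycle yield to the power n.
    return [repetitive_yield ** n for n in range(1, cycles + 1)]


if __name__ == "__main__":
    # Assumed per-cycle yields, chosen only to show the trend.
    for per_cycle in (0.90, 0.95, 0.98):
        signal = cumulative_yields(per_cycle, 30)
        print(
            f"per-cycle yield {per_cycle:.2f}: "
            f"cycle 10 = {signal[9]:.2f}, "
            f"cycle 20 = {signal[19]:.2f}, "
            f"cycle 30 = {signal[29]:.2f}"
        )
```

Because the yield compounds geometrically, even a small loss in each cycle leaves only a minor fraction of the peptide in register after 20 or 30 cycles, which is consistent with the practical limit noted above.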

In its present applications, the automated Edman 
degradation is performed on peptides or polypeptides 
noncovalently attached to thin membranes of glass 
fiber or poly(vinylidene difluoride). Because the pep- 
tide remains bound to a solid phase, the reagents, in 
solution or as gases, can be sequentially applied to and 
removed from the peptide efficiently. It is also possible to 
transfer polypeptides that have been separated by elec- 
trophoresis onto these supports and then submit them to 
sequencing. 

Because polypeptides cannot be sequenced in their 
entirety by the Edman degradation, they are cleaved into 
pieces, or peptides, that can be. This cleavage can be per- 
formed with endopeptidases that hydrolyze the peptide 
bonds at the locations of specific amino acid residues in 
the sequence (Figure 3-2). All of these enzymes, 
except the papain from Zingiber officinale, have been 
used to digest long polypeptides specifically during elu- 
cidations of their complete amino acid sequences. 
Because these enzymes cleave only peptide bonds adja- 
cent to specific amino acids, high yields of a reasonable 
number of peptides, each with a specific sequence, can 
be obtained from a long polypeptide. 
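
The way a residue-specific endopeptidase turns one long sequence into a defined set of peptides can be sketched in a few lines. The example below is a simplified illustration, not a model of any real enzyme: it applies only the nominal specificities listed in Figure 3-2 (for instance, trypsin cleaving on the carboxy-terminal side of lysine and arginine) and ignores incomplete cleavage and the influence of neighboring residues. The function names are hypothetical; the sequence is that of bovine pancreatic ribonuclease given earlier in this section.

```python
# Bovine pancreatic ribonuclease, as quoted in the text (124 residues).
RNASE_A = (
    "KETAAAKFERQHMDSSTSAASSSNYCNQMMKSRNLTKDRCKPVNTFV"
    "HESLADVQAVCSQKNVACKNGQTNCYQSYSTMSITDC"
    "RETGSSKYPNCAYKTTQANKHIIVACEGNPYVPVHFDASV"
)


def cleave_after(sequence: str, residues: str) -> list[str]:
    """Cut on the carboxy-terminal side of every residue in `residues`
    (idealized trypsin: 'KR'; Lys-C: 'K'; Glu-C: 'E')."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        # A match at the last residue produces no new fragment.
        if aa in residues and i < len(sequence) - 1:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    peptides.append(sequence[start:])
    return peptides


def cleave_before(sequence: str, residues: str) -> list[str]:
    """Cut on the amino-terminal side of every residue in `residues`
    (idealized Asp-N: 'D')."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        # A match at the first residue produces no new fragment.
        if aa in residues and i > 0:
            peptides.append(sequence[start:i])
            start = i
    peptides.append(sequence[start:])
    return peptides


if __name__ == "__main__":
    for peptide in cleave_after(RNASE_A, "KR"):
        print(len(peptide), peptide)
```

Every peptide except the last ends in lysine or arginine, and concatenating the peptides in order regenerates the original sequence; the order, of course, is exactly the information that is lost in a real digest.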

If polypeptides are to be cleaved by endopeptidases, 
they must be unfolded or denatured. A folded, compact 
molecule of protein is usually resistant to digestion by 
endopeptidases for steric reasons. Although the most 
common way to denature a protein to prepare it for diges- 
tion is to precipitate it irreversibly at high temperature, 
this approach can fail. If it does, denaturing the protein 
that is to be cleaved without simultaneously denaturing 
the endopeptidase, which is itself a protein, requires 
some strategy. Usually the chemical modification of one 
type of amino acid in the polypeptide while it is unfolded 
in a solution of a salting-in solute such as urea is suffi- 
cient to prevent it from refolding after the denaturant is 
removed. The carboxymethylation of cysteines with 
2-iodoacetate, after the cystine side chains in the protein 
have been reduced, and the maleylation of lysines 
are examples of this strategy. When proteins that are nor- 
mally embedded in biological membranes are removed 
from the membrane, their polypeptides often remain sol- 
uble and unfolded and can be cleaved with endopepti- 
dases. Some endopeptidases are themselves quite 
stable and will function in solutions of denaturants suffi- 
cient to unfold the protein to be cleaved. 

At times it is useful to cleave a polypeptide at only 
one or two specific locations in its sequence so that long 
fragments can be isolated from it. The most common 
way that this is done is to take advantage of the resistance 
of the native, properly folded protein to digestion by 
endopeptidases. The consequence of this resistance is 
that when a properly folded protein is treated with an 
endopeptidase such as trypsin or chymotrypsin, often 
only one or two of its peptide bonds are exclusively 
hydrolyzed, and this hydrolysis produces the long frag- 
ments desired. Because this is completely the result of 
steric effects, no control over the location of the sites of 
cleavage, other than that exerted by the intrinsic speci- 
ficity of the endopeptidase, can be exercised. 

Polypeptides can also be cleaved chemically. The 
paradigm of chemical cleavages is that produced on the car- 
boxy-terminal sides of methionines by cyanogen bro- 
mide (Figure 3-3). Several other chemical cleavages of 
more limited usefulness have been developed. 2-Nitro- 
5-thiocyanatobenzoate induces cleavage on the amino- 
terminal side of cysteine residues (Figure 3-4), but the 
yield is less than quantitative and the amino terminus of 
the carboxy-terminal product is blocked. Cleavage at 
tryptophan residues can be performed chemically with 
brominating agents under heterolytic conditions. 
This reaction proceeds through a bromonium cation that 
results from insertion of Br* into the olefin between car- 
bons 2 and 3 of the indole to create an electrophilic 
center. A nucleophilic attack of the acyl oxygen five 
atoms away then occurs as in the cleavage with cyanogen 
bromide. The resulting iminolactone hydrolyzes as it 
does in the cleavage with cyanogen bromide to release a 
fragment with a free amino terminus from the carboxy- 
terminal side of the tryptophan. The olefin between car- 
bons 2 and 3 in indole is an easily brominated position, 
and the mildest brominating agent capable of reacting at 
this location should be used under the mildest condi- 
tions to avoid widespread bromination of the polypep- 
tide elsewhere. 

A chemical cleavage that can produce large frag- 
ments from a polypeptide is the cleavage that occurs 
preferentially at the peptide bond between an aspartate 
and a proline under mildly acidic conditions (Figure 
3-5). This cleavage with acid results from intramolecu- 
lar attack of the carboxylate anion of the 3-carboxy group 
of the aspartate on its own acyl carbon, the acyl oxygen 
of which has been protonated, to produce, upon depar- 
ture of the amide nitrogen of the proline, an anhydride, 
which is subsequently hydrolyzed. The cleavage occurs 
preferentially at proline because the amine in the initial 
tetravalent intermediate is by far the poorer leaving 
group, but proline, because it is a hindered secondary 
amine, is the best leaving group of all the amino acids. 
Figure 3-2: Specific cleavage of a polypeptide with endopeptidases. Pancreatic trypsin hydrolyzes the peptide bonds on the carboxy-ter- 
minal sides of lysine and arginine residues with high specificity to produce a series of peptides. Each of these peptides has the respective 
lysine or arginine at its carboxy terminus. The lysine side chains can be rendered incapable of being recognized by trypsin by modification 
with succinic anhydride, maleic anhydride, or citraconic anhydride. The latter two modifications are reversible, and the lysines can be 
regenerated, after cleavage with trypsin, to yield a series of unmodified peptides the carboxy-terminal residues of which are the respective 
arginines. Glutamyl endopeptidase (Glu-C) from the bacterium Staphylococcus aureus, strain V8, hydrolyzes polypeptides with high speci- 
ficity at the peptide bonds on the carboxy-terminal sides of glutamate residues. Under the proper conditions, the same enzyme also can be 
made to hydrolyze the bonds on the carboxy-terminal side of aspartate residues. Thermolysin, an endopeptidase from the bacterium Bacillus 
thermoproteolyticus, hydrolyzes polypeptides at peptide bonds on the amino-terminal sides of leucine, isoleucine, valine, phenylalanine, 
methionine, and occasionally alanine and tyrosine. Pancreatic chymotrypsin usually catalyzes the hydrolysis of the amide bonds on the 
carboxy-terminal sides of phenylalanine, tyrosine, and tryptophan. Lysyl endopeptidase (Lys-C) from either of the bacteria Achromobacter 
lyticus or Lysobacter enzymogenes hydrolyzes polypeptides with high specificity at the peptide bonds on the carboxy-terminal sides of 
lysines. Arginyl endopeptidase (Arg-C) from murine submaxillary gland hydrolyzes polypeptides at the peptide bonds on the carboxy-ter- 
minal sides of arginines. Peptidyl-Asp metalloendopeptidase (Asp-N) from the bacterium Pseudomonas fragi hydrolyzes polypeptides at 
the peptide bonds on amino-terminal sides of aspartate residues and, occasionally, glutamate residues. Papain from Zingiber officinale 
hydrolyzes peptides at the next peptide bond beyond the one to the carboxy-terminal side of proline residues with little preference for the 
amino acids immediately adjacent to the peptide bond that is cleaved. 
Figure 3-3: Mechanism of cyanogen bromide cleavage of a polypeptide on the carboxy-terminal side of a methionine. At acidic pH, a 
methionine side chain, because it is not protonated, remains nucleophilic enough to react in an acyl exchange reaction with cyanogen bro- 
mide to produce a cyanosulfonium cation. This cationic center causes the carbon of the adjacent methylene to be electrophilic. This elec- 
trophile is five atoms away from the weakly nucleophilic acyl oxygen of the same amino acid, and an intramolecular, nucleophilic 
substitution ensues. The conjugate acid of the iminolactone formed in this nucleophilic substitution is susceptible to hydrolysis under the 
acidic conditions. This hydrolysis produces a mixture of the lactone and the open γ-hydroxycarboxylic acid of homoserine at the carboxy ter- 
minus of the resulting peptide. 
Figure 3-4: Cleavage of a polypeptide to the amino-terminal side of cysteine by cyanylation with 2-nitro-5-thiocyanatobenzoate. 
Figure 3-5: Cleavage of a polypeptide at the peptide bond between an aspartate and a proline under acidic conditions. 


The treatment with acid can be prolonged intentionally 
to produce cleavage to the carboxy-terminal side of 
many more of the aspartates in the polypeptide. 
Preferential chemical cleavage between an asparagine 
and a glycine can be produced with hydroxylamine at 
alkaline pH and elevated temperature. Both the 
cleavage between aspartate and proline and the cleavage 
between asparagine and glycine produce large fragments 
of the polypeptide because the frequency at which 
aspartylprolyl and asparaginylglycyl positions occur 
within the amino acid sequence of a protein is low. 

Each of these enzymatic or chemical cleavages pro- 
duces a particular set of peptides from a given polypep- 
tide, and the complex mixtures that result must be 
separated by chromatography. Chromatography by 
molecular exclusion can be used to separate the mixture 
into groups of peptides of different lengths (Figure 3-6). 
The larger peptides from this first step can be further sep- 
arated on chromatography by ion exchange with matri- 
ces of cellulose or dextran. Because these larger 
peptides often aggregate or precipitate, these columns 
are generally run in solutions of trifluoroacetic acid or 
formic acid or denaturants such as urea. At high or low 
pH, the net charges on all of the peptides are negative or 
positive, respectively, and aggregation is discouraged by 
mutual electrostatic repulsion. Large peptides can also 
be made more soluble by modification of all the lysine 
side chains with citraconic anhydride to increase their 
net negative charge at neutral and basic pH. 

The smaller peptides, either those isolated first on 
chromatography by molecular exclusion or those in the 
whole digest, can be separated on chromatography by 
cation exchange with sulfonated polystyrene or high- 
pressure liquid chromatography by reverse-phase 
adsorption under acidic conditions on alkylated silica gel 
(Figure 3-7). The latter method can also be used to sep- 
arate large peptides such as cyanogen bromide frag- 
ments. The resolution obtained with chromatography 
by cation exchange is similar to that obtained with 
high-pressure chromatography by adsorption, but the latter 
has become the method of choice because of its rapidity, 
the continuous spectrophotometric monitoring it permits, 
and its adaptability to samples containing small quanti- 
ties of peptide. In all cases, the art of the chromatography 
lies in choosing solvents and buffers that will dissolve the 
peptides, meet the demands of the chromatographic 
process chosen for the separation, and be easily removed 
from the peptides after they have been separated. 

Figure 3-6: Separation of peptides produced by cleavage of S-car- 
boxymethylated human phosphoglycerate kinase with cyanogen 
bromide. The protein (50 mg) was dissolved in 70% formic acid 
and solid cyanogen bromide was added to a final concentration of 
20 mg ml⁻¹. After 24 h, the solution was frozen and the water and 
cyanogen bromide were removed by sublimation. The cyanogen 
bromide fragments (50 mg) were applied to a column (1.9 cm × 
150 cm) of Sephadex G-75 run in 0.2 M ammonium bicarbonate. 
The fractions of the effluent were monitored by absorbance at 
230 nm (●) and 280 nm (○). Pools (I-X) were made as indicated. 
The numbers indicate which fragments, identified later in other 
separations, were in each pool. Reprinted with permission from ref 
39. Copyright 1980 Journal of Biological Chemistry. 

Figure 3-7: Separation of peptides from cytochrome c peroxidase 
on chromatography by adsorption. The hemoprotein cytochrome 
c peroxidase from Paracoccus denitrificans was dissolved in 8 M 
urea containing HgCl2, and after 20 h, the heme was separated 
from the protein by molecular exclusion chromatography per- 
formed in 5% formic acid. The solvent was evaporated and the 
resulting solid protein (30 nmol) was suspended in 0.1 M ammo- 
nium hydrogen carbonate. Lysyl endopeptidase from L. enzymo- 
genes (30 μg) was added to the suspension, and after 4 h at 37°C, 
the solution had clarified. The sample was evaporated to dryness 
and redissolved in a dilute solution of trifluoroacetic acid. The pep- 
tides were injected onto a column (0.46 × 25 cm) of a reverse-phase 
chromatographic medium of octadecylated silica equilibrated with 
0.1% trifluoroacetic acid. The peptides were eluted with a linear 
gradient between 0.1% trifluoroacetic acid and 70% acetonitrile, 
0.1% trifluoroacetic acid (solvent B). Peptides were detected by 
their absorbance at 220 nm. Peaks were pooled as indicated. 
Reprinted with permission from ref 36. Copyright 1997 American 
Chemical Society. 

Once the peptides have been purified, their amino 
acid composition can be determined by hydrolysis, 
performed under vacuum in 6 M HCl, followed by 
quantitative cation-exchange chromatography with sul- 
fonated polystyrene (Figure 1-3). In this way, if the 
peptide is pure and not too long, the amount of each 
amino acid it contains can be determined. Usually, how- 
ever, the peptides are sequenced directly because proce- 
dures for sequencing by automated Edman degradation 
have become more sensitive than procedures for amino 
acid analysis. 

Exopeptidases, such as carboxypeptidase A, 
carboxypeptidase B, serine-type carboxypeptidase, or 
leucyl aminopeptidase, can be used to assist in deter- 
mining or confirming the sequence of a peptide. These 
enzymes remove amino acids one at a time from one or 
the other of the ends of the peptide. Because the short- 
ened peptide released as a product by one of these 
enzymes immediately becomes a reactant for the next 
cleavage, these digestions do not release the amino acids 
from the respective end in a stepwise fashion as does the 
Edman degradation but by a progressive process, and 
absolute information about sequence beyond three or 
four residues from the end is rarely obtained with one of 
these enzymes alone. 

A strategy similar to those just described has been 
developed for the mass spectrometric analysis of mix- 
tures of peptides produced by digesting a protein. 

A mass spectrometer is an instrument that separates 
a population of ionic molecules in the gas phase in the 
order of their mass to charge number ratio (m/z). The ionic 
molecules, after they have been separated by the mass 
spectrometer, can be registered individually by a detector 
to produce a mass spectrum (Figure 3-8), which records 
the amount of each ion in the sample as a function of its 
mass to charge number ratio. A mass spectrometer can 
also be used to select only ions of a particular mass to 
charge number ratio, which can then be directed into 
another instrument. Quadrupole mass spectrometers 
and ion-trap mass spectrometers separate the ionic mol- 
ecules by passing them through specifically designed, 
oscillating electric fields. Time-of-flight (TOF) mass spec- 
trometers accelerate the entire population of ionic mole- 
cules in a uniform electric field and then pass them 
through a vacuum chamber. Because $e E x = \tfrac{1}{2}\, m z^{-1} v^{2}$ and 
the electric field ($E$) accelerates all of the ionic molecules 
over the same distance (x), the time it takes each of them 
to arrive at the end of the chamber is proportional to the 
square root of its mass to charge ratio. 
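
The square-root dependence follows directly from the energy balance above and can be checked numerically. The sketch below is a minimal illustration in which an ion of charge number z gains kinetic energy zeEx in the accelerating field and then drifts at constant velocity through a field-free chamber. The function name and all instrument parameters (field strength, accelerating distance, drift length) are hypothetical values chosen for the example; they do not describe any particular instrument.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms per dalton


def flight_time(mass_da: float, charge_number: int,
                field_v_per_m: float, accel_distance_m: float,
                drift_length_m: float) -> float:
    """Time (s) for an ion to cross the field-free drift chamber after
    acceleration through a uniform field: z*e*E*x = (1/2)*m*v**2."""
    mass_kg = mass_da * DALTON
    kinetic_energy = (charge_number * ELEMENTARY_CHARGE
                      * field_v_per_m * accel_distance_m)
    velocity = math.sqrt(2.0 * kinetic_energy / mass_kg)
    return drift_length_m / velocity


if __name__ == "__main__":
    # Hypothetical instrument: 20 kV over 1 cm, then a 1 m drift chamber.
    field, accel, drift = 2.0e6, 0.01, 1.0
    for mass in (1000.0, 4000.0, 16000.0):
        t = flight_time(mass, 1, field, accel, drift)
        print(f"m/z {mass:8.0f}: {t * 1e6:7.2f} microseconds")
```

Quadrupling the mass to charge ratio doubles the flight time, and two ions with the same m/z but different masses and charges arrive together, which is why the instrument reports m/z rather than mass.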

There are currently three ways to transfer a biolog- 
ical molecule such as a peptide, an oligosaccharide, a 
nucleic acid, or a molecule of protein from the aqueous 
solution in which it is normally found to the gas phase in 
the form of a monodisperse vapor of individual, ionized 
molecules. 

The first method is to pass a dilute aqueous solution 
containing the macromolecule through an electrospray 
atomizer, which produces a mist or electrospray so fine 
that each macromolecule finds itself in its own droplet. 
The solvent in the droplet evaporates and leaves the 
intact macromolecule in the gas phase bearing one or 
more of the elementary positive charges or negative 
charges that were generated on the surface of the droplet 
by the atomizer. For proteins and oligosaccharides, the 
atomizer is usually polarized to produce positive ions, 
while for nucleic acids, which are already negatively 
charged in solution, it is polarized to produce negative 
ions. The elementary charges generated by the atomizer 
are the result of an excess or a deficit of protons, just as 
charge is produced on a macromolecule in solution 
(Figure 1-11). Electrospray produces a family of ions 
from each macromolecule, each one of the ions differing 
Figure 3-8: Mass spectrometry of a tryptic peptide from thioredoxin. Thioredoxin from Chromatium vinosum was dissolved in 6 M guani- 
dinium chloride and 0.1 M tris(hydroxymethyl)aminomethane, pH 8.5. The cystines in the protein were reduced with dithiothreitol, and the 
resulting cysteines were alkylated with iodoacetamide. The product was separated from the small molecules by molecular exclusion chro- 
matography and evaporated to dryness, and a portion (50 nmol) of the dry powder was suspended in 0.1 M ammonium hydrogen carbonate 
and 0.1 mM CaCl2. Bovine pancreatic trypsin (12 μg) was added to the suspension and the digestion proceeded for 2 h at 37°C. The peptides 
produced by the digestion were collected by evaporating the solvent, and they were redissolved in a dilute solution of trifluoroacetic acid and 
injected into a column of octadecylated silica equilibrated with 0.05% trifluoroacetic acid. They were eluted with a linear gradient from 0% 
to 50% acetonitrile in 0.05% trifluoroacetic acid. The peptides in one of the 10 pools of peaks from this chromatographic step were vaporized 
by fast-atom bombardment from a matrix of glycerol and passed into a tandem mass spectrometer. The beam of monocationic (M + H⁺) pep- 
tide ions of mass 1208.2 Da was selected, fragmented by a beam of helium atoms of high kinetic energy, and passed into the second mass 
spectrometer. The abundances of the various fragments produced are displayed as a function of their mass. The fragment patterns are labeled 
as in Equation 3-2, and the amino acids, identified by the distances in mass units between each of the steps, are indicated above the respec- 
tive steps. Fragments are produced by cleavage at successive points from each end of the peptide. Reprinted with permission from ref 49. 
Copyright 1987 American Chemical Society. 


from the others in the number of elementary charges that 
it bears. For example, ions of cytochrome c (n_aa = 104) 
carrying between 11 and 21 elementary positive charges 
were generated by such an atomizer. 

The other two methods rely on the initial monodis- 
persion of the individual macromolecule into a solid glass 
or liquid of low volatility referred to as a matrix. The 
matrix is formed from a small molecule such as nicotinic 
acid, a solid, or glycerol, a liquid. An aqueous solution of 
the macromolecule, at a low molar concentration, and the 
molecule that will form the matrix itself, at a high molar 
concentration, is applied to a solid surface, and the water 
is evaporated to produce the dilutely occupied matrix. 
There are two ways to shatter the matrix and in the 
process eject the macromolecules within it into the gas 
phase. A beam of neutral argon atoms of high kinetic 
energy can be directed onto the matrix (fast-atom bom- 
bardment, FAB), and the explosive collisions of these 
atoms with the matrix vaporize the macromolecules as a 
mixture of mainly neutral intact molecules, monoproto- 
nated neutral molecules (monocations), and singly 
unprotonated neutral molecules (monoanions) dis- 
persed in the gas phase. Alternatively, a neodymium- 
YAG laser emitting light of wavelength 266 nm, which is 
absorbed by the molecules of the matrix, for example, 
nicotinic acid, can be directed onto the sample (matrix- 
assisted-laser-desorption ionization, MALDI). The 
heat evolved from the absorption of the light by the 
matrix produces an explosive vaporization of its top 
layer, ejecting the macromolecules into the gas phase 
mostly as monocations and presumably neutral mole- 
cules and monoanions as well. 

Electrospray and fast-atom bombardment are both 
continuous processes that produce a continuous flux of 
ionic molecules. This continuous flux can be directed 
into a quadrupole mass spectrometer to produce contin- 
uous streams of separated ionic molecules. Matrix- 
assisted-laser-desorption ionization, however, is 
accomplished with short pulses of the laser (<10 ns) to 
avoid overheating the sample. As a result, the source 
emits short pulses of ionic molecules, and each pulse 
contains only a few of each of the individual ionized mol- 
ecules. In such a situation, a time-of-flight mass spec- 
trometer, which requires a pulsed source, is usually used 
to separate the gaseous ionic molecules. 

Each of these three procedures has its advantages 
and, unfortunately, its disadvantages. To its detriment, 
electrospray produces a mass spectrum in which each 
macromolecule is represented by an envelope of many 
individual peaks. For example, the envelope for 
α-amylase contained more than 30 peaks, each repre- 
senting a molecule of α-amylase with a particular number 
(30-60) of elementary positive charges. These large 
numbers of peaks generated from each molecule com- 
plicate the analysis of mixtures of molecules such as the 
mixtures of peptides obtained from endopeptidolytic 
digestion of a protein. Electrospray, however, is the 
mildest method for producing a high yield of a vapor of 
ionized molecules, and large molecules of protein can be 
vaporized. Matrix-assisted-laser-desorption ionization 
and fast-atom bombardment both produce mainly mono- 
cations or monoanions of a molecule, thereby providing 
only one unambiguous molecular ion for each molecule. 
The former technique is able to vaporize significantly 
larger molecules (up to 200,000 Da) than the latter (up to 
20,000 Da), but the former has the disadvantage that the 
yield of ions for each pulse is low. Nevertheless, mass 
spectrometry has become a routine procedure, and in the 
sequencing of peptides it is rapidly supplanting chemical 
sequencing based on the Edman degradation. 
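
The envelope of peaks that electrospray produces from a single macromolecule, and the way the charge on each peak can be recovered from the spacing between neighbors, can be sketched as follows. This is an illustration under stated assumptions rather than a description of any published procedure: every charge is taken to be an added proton, the function names are hypothetical, and the neutral mass of 12,360 Da is an assumed round figure of the order expected for a polypeptide of 104 amino acids, not a value taken from the text.

```python
PROTON_MASS = 1.007276  # Da, mass of each added proton


def mz(neutral_mass: float, z: int) -> float:
    """m/z of a molecule carrying z extra protons."""
    return (neutral_mass + z * PROTON_MASS) / z


def envelope(neutral_mass: float, z_min: int, z_max: int) -> dict[int, float]:
    """All peaks expected for charge numbers z_min..z_max."""
    return {z: mz(neutral_mass, z) for z in range(z_min, z_max + 1)}


def charge_from_adjacent(mz_low: float, mz_high: float) -> int:
    """Charge number of the peak at mz_high, given that the neighboring
    peak at mz_low carries exactly one more proton."""
    return round((mz_low - PROTON_MASS) / (mz_high - mz_low))


def neutral_mass_from_adjacent(mz_low: float, mz_high: float) -> float:
    """Neutral mass recovered from two neighboring peaks of one envelope."""
    z = charge_from_adjacent(mz_low, mz_high)
    return z * (mz_high - PROTON_MASS)


if __name__ == "__main__":
    assumed_mass = 12360.0  # Da, assumed for illustration only
    peaks = envelope(assumed_mass, 11, 21)
    for z, value in sorted(peaks.items()):
        print(f"z = {z:2d}   m/z = {value:9.3f}")
    print("recovered mass:", neutral_mass_from_adjacent(peaks[16], peaks[15]))
```

Each pair of neighboring peaks gives an independent estimate of the same neutral mass, which is useful for a pure protein but, as noted above, becomes a complication when many peptides each contribute an envelope of their own.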

When mass spectrometry is applied to dissecting 
proteins and sequencing the resulting peptides, the 
polypeptide of the protein is first digested with an 
endopeptidase. Usually the digest is then separated with 
one chromatographic step (Figure 3-7). Each pool of a 
peak from the chromatogram is subjected to vaporiza- 
tion, and the flux of cations produced is directed into a 
tandem mass spectrometer. In addition to being able to 
register the mass of each peptide in the pool, the first 
mass spectrometer of the tandem can choose the stream 
of only one of the ionic molecules and hence only one of 
the monocationic peptides. This beam of purified pep- 
tide cations is passed through an orthogonal beam of 
helium atoms of high kinetic energy that cleave the mol- 
ecules of peptide by collision-induced dissociation (CID) 
into characteristic fragments: 
These fragment ions are then passed into the second
mass spectrometer of the tandem, which can be either a 
quadrupole mass spectrometer or a time-of-flight mass 
spectrometer. The resulting pattern of masses that is 
observed (Figure 3-8) is a set of four separate arrays 
(a₁-aₙ, b₁-bₙ, y₁-yₙ, and x₁-xₙ), one from each type of
fragmentation (Equation 3-2). The number of mass units 
between each step in each of these arrays provides the 
sequence of the peptide. In this procedure, the first mass 
spectrometer performs the separation of the peptides in 
each chromatographic pool that would normally be per- 
formed by subsequent steps of chromatography, and the 
second mass spectrometer performs the sequencing that 
would normally be performed by automated Edman 
degradation. 
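
How the steps in one array spell out a sequence can be sketched in a few lines. The example below is an illustration under stated assumptions, not the procedure of any particular instrument: it builds only the b array for a hypothetical peptide, SAVEFK, takes the mass of each b ion to be the sum of the monoisotopic residue masses plus one proton, and then reads the residues back from the differences between successive steps. The function names and the tolerance are assumptions.

```python
# Monoisotopic masses (Da) of amino acid residues (amino acid minus water).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON_MASS = 1.00728


def b_ion_ladder(peptide):
    """Masses of the singly charged b ions b1 ... b(n-1) of a peptide."""
    ladder, total = [], PROTON_MASS
    for residue in peptide[:-1]:
        total += RESIDUE_MASS[residue]
        ladder.append(total)
    return ladder


def read_ladder(ladder, tolerance=0.01):
    """Assign a residue to each step between successive masses in one array."""
    calls = []
    for lower, upper in zip(ladder, ladder[1:]):
        step = upper - lower
        matches = sorted(
            aa for aa, mass in RESIDUE_MASS.items() if abs(mass - step) <= tolerance
        )
        calls.append("/".join(matches) if matches else "?")
    return calls


ladder = b_ion_ladder("SAVEFK")  # hypothetical peptide
print([round(mass, 3) for mass in ladder])
print(read_ladder(ladder))
```

Leucine and isoleucine have identical masses, so a step of 113.084 Da is reported as I/L; the first residue is identified from the mass of b₁ itself rather than from a difference.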

If only the identity of the polypeptide is desired, not 
its complete sequence, it is possible to slice a band con- 
taining that polypeptide from a polyacrylamide gel, digest 
it with trypsin, and introduce the entire digest into a 
tandem mass spectrometer without performing the ini- 
tial chromatography. Peptide ions that are well resolved 
by the first mass spectrometer can be selected for
fragmentation, and the pattern of the fragments obtained
provides the amino acid sequence of those peptides. In this
way, a protein appearing on an electrophoretogram can 
be positively identified from the amino acid sequences of 
many of its constituent peptides. 

The grand strategy for determining the complete 
sequence of a polypeptide directly is to separate and 
sequence all of the peptides from one particular cleav- 
age, to cleave the protein at a set of different locations, to 
identify all of the peptides in this second set that contain 
the points of cleavage for the first set, and to sequence 
these overlapping peptides to learn the order in which 
the first peptides are arranged in the intact polypeptide. 
The dramatic epics, in each of which this strategy was
applied to another protein and its sequence was
revealed, are now seldom produced. The expectation
and excitement surrounding each of them is only dimly 
remembered. In their place are myriads of short essays 
that present the sequences of often long polypeptides. 
This flood of information has been possible because the 
sequences of polypeptides are now determined by 
sequencing DNA complementary to the messenger RNA 
that encodes them. 

Problem 3-1: Write a complete mechanism for the following
chemical reaction. Draw in important lone pairs
and indicate the combination of nucleophiles and
electrophiles with arrows. Use protons where appropriate.
For what purpose is this chemical reaction used? Write 
the step-by-step cycle for using this reaction to accom- 
plish this purpose. 

Problem 3-2: Write a complete mechanism for this
reaction:

$$N\text{-(}\alpha\text{-aspartyl)phenylalanine} \xrightarrow[116\,^{\circ}\text{C}]{\text{HCl}} \text{aspartic acid} + \text{phenylalanine}$$


Problem 3-3: A cyanogen bromide fragment has been 
purified from a digest of a certain protein. Consider the
following information. The compositions shown in 
parentheses are those obtained following complete acid 
hydrolysis in 6 M HCl, 110 °C, for 24 h. 
(A) Complete acid hydrolysis 
(1) (E, F, 2 G, homoserine (Hse), K, L, R, S, V) 


(B) Amino terminus 
(2) (V) 
(C) Amino acid composition of peptides from tryptic 
digest 
(3) (E, G, R, V) 
(4) (F, G, K, S)
(5) (Hse, L) 


(D) Edman degradation

              cycle
    peptide   1   2
    (3)       V   E
    (4)       F   S


(E) Reaction with 2,3-butanedione, followed by tryp- 
tic digest 
(6) (E, F, 2 G, K, R, S, V) 
(7) (Hse, L) 


What is the sequence of the fragment? With which 
amino acid side chain does 2,3-butanedione react? 


Problem 3-4: Deduce the sequence of an unknown pep- 
tide from the following information. 


(A) Amino acid composition of intact peptide 
(A, 2 E, G, L, K, R, 2 S, T) 

(B) Tryptic peptides 
(1) (E, T)
(2) (G, K, S) 
(3) (A, E, L, R, S) 

(C) Trypsin followed by one cycle of Edman degrada- 
tion yields the phenylthiohydantoins of S, A, and 
T. 

(D) Peptides produced by digestion with thermolysin 
(4) (A, G, K, 2S) 
(5) (2E, L, R, T) 


(E) At pH 8.0, tryptic peptide 3 moved on elec- 
trophoresis with a positive charge 


Problem 3-5: Deduce the sequence of a peptide from 
the following information. 
(A) Tryptic peptides 
(1) (A, E, F) 
(2) (Q, S, R, V) 
(3) (H, K, V)
(B) Carboxypeptidase A 
(4) A then F and E 


(C) Modification with methyl acetimidate followed by 
trypsin 
(5) (H, K, Q, R, S, 2 V) 
(6) (A, E, F) 


(D) Amino-terminal amino acids 
peptide (1), F 
peptide (2), V 
peptide (3), H 
(E) Edman degradation

              cycle
    peptide   1   2   3
    (2)       V   S   Q


Problem 3-6: What are the expected masses of the 28
cations produced by fragmentation of the protonated
gaseous cation (M + H⁺) of the peptide GGEVEATK?


Cloning, Sequencing, Expressing, and Mutating 
of Deoxyribonucleic Acids 


Nucleic acids are linear polymers (see 3-1 below) the 
monomers of which are nucleoside 5’-mono- 
phosphates. The covalent bonds that link the monomers 
together to form the polymer are the oxygen-phospho- 
rus bonds that connect the 3’-hydroxyl group of one 
nucleoside and the 5’-phosphoryl group of the next. Each 
of these bonds produces a diester of the respective phos- 
phoryl group (a phosphodiester linkage). Nucleic acids 
are divided structurally and biologically into ribonucleic 
acids (RNA), which have a 2’-hydroxyl group on each of 
their furanosyl rings as in 2-9 to 2-12, and deoxyribonu- 
cleic acids (DNA), which are unsubstituted at the 2’-posi- 
tion of their furanosyl rings as in 3-1. Aside from this 
distinction, every nucleic acid has the same polymer 
backbone. One end of a molecule of single-stranded 
nucleic acid is a phosphorylated 5’-hydroxyl group 
(5’-phosphate); the other end is a 3’-hydroxyl group. The 
5’-phosphate and 3’-hydroxyl group are the 5’-end and 
3’-end, respectively, of the polymer. At the pH usually 
encountered in living organisms (pH 7-8), the oxygens 
on each of the phosphoryl diesters in the backbone of a 
nucleic acid are unprotonated and each monomer bears 
a full elementary negative charge except for the 
monomer at the 5’-end, the phosphate of which bears an 
average of between 1.5 and 2 elementary negative 
charges, depending on the exact pH. 

There are four nucleoside 5’-monophosphates 
incorporated into a particular nucleic acid as it is syn- 
thesized biologically. These four nucleoside 5’-mono- 
phosphates are distinguished by the heterocyclic bases 
they contain (Rᵢ in 3-1). Cytosine (C) is the base in the
nucleosides cytidine (2-10) and 2’-deoxycytidine,
guanine (G) is the base in the nucleosides guanosine
(2-11) and 2’-deoxyguanosine, and adenine (A) is the 
base in the nucleosides adenosine (2-12) and 
2’-deoxyadenosine. The ribonucleoside 5’-monophos- 
phates are incorporated into RNA, and the 2’-deoxyri- 
bonucleoside 5’-monophosphates are incorporated into 
DNA. Uracil (U) is incorporated into RNA on the 
5’-monophosphate of the nucleoside uridine (2-9). 
Uridine, however, is converted by dehydroxylation and 
methylation into thymidine, the 2’-deoxyribonucleoside
of 5-methyluracil, before its 5’-monophosphate is
incorporated into DNA. The base 5-methyluracil is 
called thymine (T). 

Within each nucleoside, the base is attached to its 
respective ribose or 2’-deoxyribose in an azaacetal 
(N-glycosidic) linkage (see 2-9 to 2-12) between a 
pyridine nitrogen or an imidazole nitrogen of the 
pyrimidine or purine, respectively, and the aldehydic 
carbon at position 1 in the furanose ring. In the unpoly- 
merized nucleoside phosphates, a monophosphate, 
diphosphate, or triphosphate group is found on the 
5’-carbon. A nucleoside 5’-monophosphate, 5’-diphos- 
phate, or 5’-triphosphate is referred to as a nucleotide. 

Each nucleic acid has its own length and its own 
sequence in which the nucleotide bases, Rᵢ in 3-1, are
arranged. The sequence of a nucleic acid is written as a 
word, each of whose letters stands for the base of the 
respective nucleotide. Unless otherwise noted, the word 
begins at the 5’-end and ends at the 3’-end. 

Deoxyribonucleic acid usually and ribonucleic acid 
often occur as double helices. In a double helix, two mol- 
ecules of nucleic acid, running in opposite directions, are 
wrapped around each other (Figure 3-9). The bases in 
the core of the double helix are paired, adenine next to 
thymine and guanine next to cytosine. Because the posi- 
tions in the sequence of the one strand of DNA occupied 
by deoxyadenosine, deoxyguanosine, thymidine, and 
deoxycytidine are paired with positions in the sequence 
of the other strand of DNA occupied by thymidine, 
deoxycytidine, deoxyadenosine, and deoxyguanosine, 
respectively, the sequence of one strand read 5’ → 3’
complements the sequence of the other strand read
3’ → 5’ (Figure 3-10). For example, the sequence
-AGCAGA- complements the sequence -TCTGCT-. 
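
The pairing rule can be checked mechanically. In the minimal sketch below, the function name is illustrative and both strands are written 5’ to 3’, following the convention stated above, so the complementary strand is obtained by pairing each base and then reversing the order.

```python
# Partner of each base in double-helical DNA.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand):
    """Return the complementary strand, written 5' to 3' like the input."""
    return strand.translate(COMPLEMENT)[::-1]


print(reverse_complement("AGCAGA"))
```

Applied to the example in the text, the function converts AGCAGA into TCTGCT, and applying it a second time recovers the original strand.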

Figure 3-9: Double-helical DNA in the standard B conformation. A segment of single-stranded DNA with
the self-complementary sequence CGCGAATTCGCG was chemically synthesized. When dissolved in solution,
the individual molecules paired up and formed double-helical segments of DNA containing two identical
strands running antiparallel to each other. The double-helical dimers were crystallized and a
crystallographic molecular model was generated. A stereo view of the model is presented in the figure.
Atoms of oxygen are black; atoms of nitrogen, light gray; and atoms of phosphorus, dark gray. The
unattached white circles are the oxygens of molecules of water that assume fixed positions in the
crystal. Dashed lines represent hydrogen bonds.

A polypeptide can be cleaved at specific sites with a
particular endopeptidase (Figure 3-2), and DNA can be
cleaved at specific sites with site-specific
deoxyribonucleases (restriction enzymes). Just as trypsin or
thermolysin catalyzes hydrolysis of amide bonds in a protein
only next to particular amino acids to produce specific
peptides, site-specific deoxyribonucleases catalyze the
hydrolysis of double-helical DNA only at phosphate
diesters within particular sequences of nucleotides (Figure
3-10). The particular sequences of nucleotides and the
associated points of cleavage are known as restriction
sites, and the fragments of DNA produced by these
cleavages are known as restriction fragments. Unlike the
situation in dissecting proteins, a much larger number of
site-specific deoxyribonucleases are available, the
specificities of which vary in their complexity. The particular
sequence recognized by a given site-specific
deoxyribonuclease can be anywhere from four to 12 nucleotides
long. The longer the sequence recognized, the less
frequently will it occur in the DNA, and the longer will be the
restriction fragments produced. By using the appropriate
site-specific deoxyribonuclease and carrying the digestion
to the appropriate degree of completion, restriction
fragments can be obtained of a desired size range containing
within their population the complete sequence of the
original DNA, just as a digest of a protein contains within its
population of peptides the complete sequence of the
protein.
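
The dependence of fragment length on the length of the recognized sequence can be estimated with a rough calculation. The sketch below assumes DNA of random sequence in which the four bases are equally frequent and a recognition sequence with no ambiguous positions; real genomes depart from both assumptions, and the function name is illustrative.

```python
def expected_fragment_length(site_length, base_frequency=0.25):
    """Mean spacing, in base pairs, between occurrences of one specific
    sequence of the given length in random-sequence DNA."""
    return 1.0 / (base_frequency ** site_length)


for n in (4, 6, 8, 12):
    spacing = expected_fragment_length(n)
    print(f"{n:2d}-nucleotide site: about one site every {spacing:,.0f} base pairs")
```

Each additional nucleotide in the recognition sequence lengthens the expected fragments fourfold under these assumptions.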

Site-specific deoxyribonucleases produce restric- 
tion fragments with blunt ends or sticky ends. If the par- 
ticular enzyme used cleaves phosphodiester linkages in 
the two strands that are directly opposite each other, the 
two new ends it produces will both be completely double- 
stranded and blunt. If the particular enzyme used cleaves 
phosphodiester linkages on the two strands that are 
offset relative to each other (Figure 3-10), a short segment 
of single-stranded DNA will protrude from each of the 
new ends. Because they were paired before the cleavage, these
two segments will necessarily be complementary to each 
other in sequence, will adhere to each other when they 
come in contact, and, consequently, are sticky. 
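
An offset cleavage can be sketched with PstI, the enzyme used as the illustration in Figure 3-10, which cleaves each strand between the A and the G of -CTGCAG-. The code below follows only one strand, written 5’ to 3’; the input sequence is hypothetical and the function name is an assumption.

```python
SITE = "CTGCAG"   # sequence recognized by PstI, written 5' to 3'
CUT_OFFSET = 5    # cleavage between the A and the G of the site


def digest_one_strand(sequence, site=SITE, cut_offset=CUT_OFFSET):
    """Cut one strand of DNA at every occurrence of the site."""
    fragments = []
    start = 0
    position = sequence.find(site)
    while position != -1:
        cut = position + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        position = sequence.find(site, position + 1)
    fragments.append(sequence[start:])
    return fragments


fragments = digest_one_strand("AAACTGCAGTTT")  # hypothetical sequence
print(fragments)
print("protruding 3'-end:", fragments[0][-4:])
```

Because the site reads the same on both strands, the other strand is cut at the mirror-image position, and the last four nucleotides of the left-hand fragment, TGCA, are left unpaired as a sticky 3’-end.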

In addition to site-specific deoxyribonucleases, 
there are several other enzymes that are used to manip- 
ulate DNA (Figure 3-10). The phosphate on the 5’-end of 
a nucleic acid can be removed with polynucleotide 
5’-phosphatase 


$$\text{5'-phosphopolynucleotide} + \text{H}_2\text{O} \rightleftharpoons \text{polynucleotide} + \text{HOPO}_3^{\,2-} \tag{3-3}$$


and the phosphate can be added back to the 5’-hydroxyl
group of DNA, usually as a radioactive [³²P]phosphate,
with polynucleotide 5’-hydroxyl-kinase:


$$\text{ATP} + \text{5'-dephospho-DNA} \rightleftharpoons \text{ADP} + \text{DNA} \tag{3-4}$$


Both DNA ligase (ATP) 


$$\text{ATP} + \text{DNA}_n + \text{DNA}_m \rightleftharpoons \text{AMP} + \text{pyrophosphate} + \text{DNA}_{n+m} \tag{3-5}$$


and DNA ligase (NAD⁺)

$$\text{NAD}^+ + \text{DNA}_n + \text{DNA}_m \rightleftharpoons \text{AMP} + \text{nicotinamide nucleotide} + \text{DNA}_{n+m} \tag{3-6}$$


Figure 3-10: Enzymes that are used to manipulate DNA. A double-stranded segment of DNA is
represented diagrammatically. In the core of the double helix, the bases are paired adenine next to
thymine and cytosine next to guanine. The two antiparallel strands each have a 3’-end, at which
there is usually a free 3’-hydroxyl, and a 5’-end, at which there is usually a phosphorylated
5’-hydroxyl group. Polynucleotide 5’-phosphatase (phosphatase) is used to remove the phosphoryl
group from the 5’-hydroxyl group. Polynucleotide 5’-hydroxyl-kinase (kinase) is used to
phosphorylate a dephosphorylated 5’-hydroxyl group, usually with radioactive [γ-³²P]ATP. Either DNA
ligase (ATP) or DNA ligase (NAD⁺) (ligase) can be used to join the 3’-hydroxyl group and the
5’-phosphate at a single-strand break in the DNA between two nucleotides. The two nucleotides
to be joined are usually held immediately adjacent to each other by their pairing with partners
on the adjacent unbroken strand. Site-specific deoxyribonucleases are used to cleave
double-stranded DNA on both strands. The cleavages occur within a sequence specific to the particular
site-specific deoxyribonuclease chosen. The site-specific deoxyribonuclease PstI, used as an
illustration, cleaves each strand between the A and the G in the sequence -CTGCAG- to produce
the two complementary, sticky 3’-ends -TGCA. DNA-directed DNA polymerase (polymerase)
elongates a strand of DNA, the primer, at its 3’-end by successively adding the nucleotides that
pair with the consecutive bases on the opposite strand of the DNA, the template. The reactant
for each step of the elongation is the deoxyribonucleoside triphosphate of the appropriate base.
repair single-stranded breaks in a double helix. The
complementarity of the bases around the break usually
juxtaposes the two ends to be ligated.

Site-specific deoxyribonucleases and DNA ligases 
are used to insert one molecule of DNA into another 
molecule of DNA. The molecule to be inserted has been 
prepared by cleaving a longer molecule of DNA with a 
particular site-specific deoxyribonuclease, usually one 
that produces sticky ends. The molecule of DNA into 
which the restriction fragment is to be inserted is then 
cleaved with the same site-specific deoxyribonuclease to 
produce a break with the same sticky ends as those on 
the restriction fragment to be inserted. The two prepara- 
tions of DNA are then mixed, and the various sticky ends, 
for example, the two 3’ sticky ends, -TGCA, produced by 
PstI (Figure 3-10), spontaneously pair up. The pairs of
resulting offset breaks in the two strands of the double 
helices are then repaired with DNA ligase (Figure 3-10) to 
produce a new unbroken molecule of DNA in which the 
restriction fragment has been inserted into the other 
molecule of DNA at the restriction site specific to the
site-specific deoxyribonuclease. The junctions that are
effected by this procedure occur at random, but the desired
product, in which the restriction fragment of DNA has
been inserted into the middle of the other molecule of
DNA, is selected from the overall population. The
site-specific deoxyribonuclease is often chosen because it
produces blunt ends, which can be ligated at random 
with other blunt ends. If the restriction fragment to be 
inserted has been produced with one site-specific 
deoxyribonuclease and the molecule of DNA into which 
it is to be inserted has been cleaved with another, the 
sticky ends are usually removed and the pieces ligated at 
their blunt ends. 

Polymerases are enzymes that synthesize a new 
strand of nucleic acid in consecutive steps: 


$$\text{nucleoside 5'-triphosphate} + \text{nucleic acid}_n \rightleftharpoons \text{pyrophosphate} + \text{nucleic acid}_{n+1} \tag{3-7}$$


Polymerases usually require a particular arrangement of 
double-helical nucleic acid. There must be a shorter 
strand of nucleic acid, the primer, associated in a double 
helix through complementary base pairing with a longer 
strand of nucleic acid. The longer strand of nucleic acid, 
the template, must extend beyond the 3’-end of the 
primer. The polymerase elongates the primer from its 
3’-end by adding a nucleotide at each step that is com- 
plementary to the adjacent base on the template. DNA- 
directed DNA polymerase synthesizes a single strand of 
DNA that is complementary to a template of DNA and 
that remains associated with the template of DNA in a 
double helix. RNA-directed DNA polymerase synthe- 
sizes a single strand of DNA that is complementary to a 
template of RNA and that remains associated with the 
template of RNA in a double helix. The enzyme pairs A 
with U and T with A. DNA-directed RNA polymerase 
synthesizes a single strand of RNA that is complementary 
to a template of DNA but that does not remain associated 
with the template. It pairs A with T and U with A. The 
DNA polymerases use the 2’-deoxyribonucleoside 
triphosphates as reactants; the RNA polymerases use the
ribonucleoside triphosphates.
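
The elongation of a primer by a DNA-directed DNA polymerase can be sketched as a loop that adds one complementary nucleotide for each step along the template. The example below is a simplified illustration, not a model of the enzyme: it assumes that the primer is paired with the extreme 3’-end of the template, the sequences are hypothetical, and the function name is an assumption. Both strands are written 5’ to 3’.

```python
PARTNER = {"A": "T", "T": "A", "G": "C", "C": "G"}


def extend_primer(template, primer):
    """Elongate the primer from its 3'-end until the template is fully copied."""
    # The polymerase reads the template starting from its 3'-end.
    template_3_to_5 = template[::-1]
    for base, partner in zip(primer, template_3_to_5):
        if PARTNER[partner] != base:
            raise ValueError("primer does not pair with the 3'-end of the template")
    product = primer
    for partner in template_3_to_5[len(primer):]:
        product += PARTNER[partner]  # one nucleotide added per step
    return product


print(extend_primer("GCTTGGGCTTAT", "ATAAG"))
```

The finished product is the complete complement of the template, and a primer that does not pair with the template is rejected before any elongation takes place.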

Polypeptides are synthesized biologically by ribo- 
somes that translate the sequence of nucleotides in a 
single-stranded messenger RNA (mRNA) into a 
sequence of amino acids in a polypeptide. The two cor- 
responding words written in the respective sequences 
are in the same language, the language of the structure of 
the protein, and they have the same spelling, but the 
alphabets are different. The alphabet of the polypeptide 
sequence consists of the 20 amino acids; the alphabet of 
the messenger RNA consists of triplets of nucleotides. 
The correspondence between the letters in the two 
alphabets is known as the genetic code. Each triplet 
specifies a particular amino acid, and the triplets are 
sequentially arranged in the same order as the amino 
acids of the protein encoded by the message (Figure 
3-11). Because the sequence of the nucleotides, however, 
is continuous and does not indicate how they are 
grouped as triplets, there are three ways to divide any 
sequence of nucleotides into triplets, or three distinct 
reading frames, only one of which encodes the sequence 
of the protein. If the sequence and the correct reading 
frame of a messenger RNA have been determined, it can 
be immediately translated on paper into the sequence of 
the polypeptide which it encodes. 
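
Translation on paper and the three reading frames can be illustrated with the messenger RNA of Figure 3-11. The sketch below uses the standard genetic code written compactly, with stop codons marked by an asterisk; the function and variable names are illustrative.

```python
from itertools import product

# Standard genetic code; codons run UUU, UUC, UUA, UUG, UCU, ... GGG.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna, frame=0):
    """Translate an mRNA (5' to 3') in reading frame 0, 1, or 2."""
    codons = (mrna[i:i + 3] for i in range(frame, len(mrna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


mrna = "GCUUGGGCUUAUUUCUCCGACGUUGACCUGGAAAAG"  # messenger RNA of Figure 3-11
for frame in range(3):
    print(frame, translate(mrna, frame))
```

Only the first reading frame reproduces the peptide shown in Figure 3-11; the other two frames give unrelated sequences of amino acids.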

Messenger RNA is synthesized by DNA-directed 
RNA polymerase from a gene in the double-helical DNA 
of the genome of the organism. Its sequence matches 
that of one of the strands of DNA in the double helix, the 
sense strand, except that uridine monophosphate 
replaces thymidine monophosphate. During the synthe- 
sis of messenger RNA, the other strand of the DNA, the 
antisense strand, serves as the template (Figure 3-11). 
The sequence of the sense strand of a prokaryotic gene is 
identical to that of the messenger RNA transcribed from 
it, and the sequence of the protein encoded by that sense 
strand can be read directly from the sequence of the 
genomic DNA. The genomic DNA of eukaryotes, how- 
ever, contains introns. An intron is a segment of unre- 
lated DNA that has been inserted during evolution into 
the genomic DNA of the eukaryote and that interrupts 
the sequence on the sense strand that encodes the protein.
These introns are spliced out of the messenger RNA
before it is read by the ribosome. Although they cause no
problems for the organism, introns make it difficult if not 
impossible to read the sequence of a eukaryotic protein 
from the sequence of the gene that encodes the sequence 
of that protein. Consequently, it is the messenger RNA 
for a eukaryotic protein that must be sequenced. 
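
The relationship among the antisense strand, the sense strand, and the messenger RNA can be verified with the sequences of Figure 3-11. In this minimal sketch the antisense strand is supplied as it is written in the figure, 3’ to 5’, so that pairing it base by base yields the messenger RNA 5’ to 3’; the function name is illustrative.

```python
# Base incorporated into RNA opposite each base of the DNA template.
RNA_PARTNER = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(antisense_3_to_5):
    """Messenger RNA (5' to 3') made on an antisense strand written 3' to 5'."""
    return "".join(RNA_PARTNER[base] for base in antisense_3_to_5)


antisense = "CGAACCCGAATAAAGAGGCTGCAACTGGACCTTTTC"  # 3' to 5', Figure 3-11
sense = "GCTTGGGCTTATTTCTCCGACGTTGACCTGGAAAAG"      # 5' to 3', Figure 3-11

mrna = transcribe(antisense)
print(mrna)
print(mrna == sense.replace("T", "U"))
```

The transcript matches the sense strand at every position once each T is replaced by U, as the text states.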

Almost every protein molecule present at a partic- 
ular time in a living cell is being continuously produced 
by ribosomes from messenger RNA molecules, and it 
follows that if a protein is found in a eukaryotic cell or 
tissue, the messenger RNA encoding it should be there 
as well. Messenger RNA can be isolated as a complex 
mixture of all of the messages normally being expressed 
in a particular tissue. This isolation is assisted by the fact 
that all eukaryotic messenger RNAs have a segment of 
poly(adenosine monophosphate) about 200 bases in 
length at their 3’-ends. Affinity adsorption with a sta- 
tionary phase to which poly(thymidine monophos- 
phate) has been attached covalently is used to separate 
the messenger RNA from all of the other RNA in the 
homogenate. 

The stratagem devised to obtain the nucleic acid 
sequence of a particular single-stranded messenger RNA 
in this purified mixture is to transcribe all of the single- 
stranded messenger RNAs in the mixture into a mixture 
of double-helical DNAs of the same respective sequences, 
separate these molecules of DNA biologically, select the 
DNA derived from the messenger RNA of interest, and 
sequence that DNA. Deoxyribonucleic acid that has the 
same sequence in one of its two complementary strands 
as the sequence of a messenger RNA is referred to as com- 
plementary DNA (cDNA). Messenger RNA is transcribed 
into complementary DNA in the laboratory by first using 
RNA-directed DNA polymerase to synthesize single- 
stranded DNA complementary in sequence to the mes- 
senger RNA. The single strands of DNA end up in hybrid 
double helices with the messenger RNAs. The RNA is then 
removed by digesting it with RNase, and then DNA- 
directed DNA polymerase is used to synthesize the com- 
plements to the single strands of DNA. Each strand of this 
newly synthesized DNA remains associated with its tem- 
plate in a double helix. In its sense strands, this double- 
helical DNA contains the original sequences of the 
messenger RNAs. One advantage of the complementary
DNA derived from a particular tissue is that it catalogs all
of the genes that are being expressed in that tissue.

antisense (DNA)  3’ CGAACCCGAATAAAGAGGCTGCAACTGGACCTTTTC 5’
sense (DNA)      5’ GCTTGGGCTTATTTCTCCGACGTTGACCTGGAAAAG 3’
mRNA             5’ GCUUGGGCUUAUUUCUCCGACGUUGACCUGGAAAAG 3’
amino-terminal AlaTrpAlaTyrPheSerAspValAspLeuGluLys carboxy-terminal

Figure 3-11: Relationships between the sense strand and the antisense strand of a segment of double-helical DNA and the messenger RNA
and between the messenger RNA and the amino acid sequence encoded by that messenger RNA. In the messenger RNA, the amino acid
sequence of the protein is encoded by triplets of bases. Each triplet of bases is a letter in the alphabet of the messenger RNA. The genetic code
is the correspondence between each of these triplets and the amino acid it encodes. The amino acid is the letter in the alphabet of the
protein. The messenger RNA is synthesized by DNA-directed RNA polymerase using the antisense strand of the DNA as a template and has the
same sequence of nucleotides as the sense strand, except that uridine monophosphate is in place of each thymidine monophosphate. When
RNA-directed DNA polymerase synthesizes a single strand of complementary DNA using messenger RNA as a template, that single strand of
DNA has the same sequence as the antisense strand from which the messenger RNA was synthesized.
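
The two-step synthesis of complementary DNA described above can be followed with the messenger RNA of Figure 3-11. The sketch below only tracks sequences; priming and the removal of the RNA are not modeled, and the function names are assumptions. All strands are written 5’ to 3’.

```python
DNA_OPPOSITE_RNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_OPPOSITE_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def first_strand(mrna):
    """DNA made by RNA-directed DNA polymerase on a messenger RNA template."""
    return "".join(DNA_OPPOSITE_RNA[base] for base in reversed(mrna))


def second_strand(single_stranded_dna):
    """DNA made by DNA-directed DNA polymerase on the first strand."""
    return "".join(DNA_OPPOSITE_DNA[base] for base in reversed(single_stranded_dna))


mrna = "GCUUGGGCUUAUUUCUCCGACGUUGACCUGGAAAAG"  # Figure 3-11
antisense_cdna = first_strand(mrna)
sense_cdna = second_strand(antisense_cdna)
print(antisense_cdna)
print(sense_cdna)
```

The second strand reproduces the sequence of the messenger RNA with T in place of U, which is why the sense strand of the complementary DNA can be read as if it were the message.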

To clone a particular segment of DNA is to insert 
that DNA, usually present in a complex mixture of other 
DNAs, into the DNA of a bacteriophage or a bacterium 
and then isolate a population of identical bacteria or 
identical bacteriophage, all of which carry just that one 
segment of DNA. A bacteriophage is a virus that infects 
and replicates within a bacterium. For the purposes of 
this discussion, the segment of DNA to be cloned is a seg- 
ment encoding a protein of interest. In the cloning of 
eukaryotic DNA encoding a protein, complementary 
DNA is usually the starting point because of the problem 
of the introns in the genomic DNA. Complementary DNA 
is also advantageous because tissues producing signifi- 
cant amounts of the protein can be chosen as the source 
for the messenger RNA, a strategy that increases the 
chances of finding its complementary DNA. In the 
cloning of prokaryotic DNA encoding a protein of interest,
genomic DNA is usually the starting point because
it is already double-helical DNA, and in prokaryotes there
are no problems with introns. The genomic DNA of the 
bacterium is cut into restriction fragments long enough 
to contain all or most of the gene encoding the protein. 

The complementary DNAs or fragments of genomic 
DNA in one of these complex mixtures are then usually 
incorporated into a library in which they can be stored, 
replicated, and screened. A library is a large population 
of bacteriophage or bacteria, each of which contains 
within its DNA one of the complementary DNAs or frag- 
ments of genomic DNA from the original mixture just as 
the usual library contains a large population of different 
books. Each piece of foreign DNA is integrated into the 
DNA of one of the bacteriophage or one of the bacteria in 
the library in such a way that it is replicated along with its 
genomic DNA, ensuring that all of the progeny of that 
one bacteriophage or bacterium contain the inserted 
complementary DNA or genomic DNA. Each of the frag- 
ments of foreign DNA is inserted into the same location 
in the DNA of the bacteriophage or bacteria in the library. 

If the library consists of bacteriophage, the foreign 
DNA is inserted at the same site in the genomic DNA of 
each bacteriophage. These genomic DNAs containing 
the inserts can be biologically replicated to a high con- 
centration by infecting a suspension of bacteria, usually 
Escherichia coli, with the bacteriophage. 

Complementary DNAs or fragments of genomic 
DNA are incorporated into a population of bacteria by 
first inserting them into plasmids. A plasmid is a circular 
molecule of double-stranded DNA that is able to repli- 
cate independently of the chromosome in a bacterium. 
The species of bacteria usually used to carry a plasmid is 
E. coli. In addition to the inserted DNA, each of the vari- 
ous plasmids used for cloning contains a gene causing 
the bacterium that carries it to be resistant to a particu- 
lar antibiotic. Consequently, when the plasmids have 
been incorporated into a population of bacteria, only the
bacteria carrying one of the plasmids will replicate in the
presence of the antibiotic. In the library, each antibiotic- 
resistant bacterium contains a plasmid and most of the 
plasmids contain a copy of one of the original comple- 
mentary DNAs or fragments of genomic DNA. Not only 
do plasmids and bacteriophage permit the inserted DNA 
to be replicated as they themselves are replicated, they 
also provide a way of storing the inserted DNA, because 
once it has been incorporated into the bacteriophage or 
its plasmid has been incorporated into a bacterium, it is 
stable for long periods of time if the bacteriophage or 
bacterium is stored in its dormant state. 

Occasionally, the messenger RNA in a tissue producing
mainly one protein is so enriched for the messenger
RNA encoding that particular protein that most of
the individuals in the library carry complementary DNA
for that one messenger RNA, and one of these can be
picked out from the rest directly. Usually, however, the
library has to be screened to find an individual carrying 
the desired complementary DNA or fragment of genomic 
DNA. To screen a library is to isolate bacteriophage or 
bacteria that carry complementary DNA or fragments of 
DNA encoding one particular protein from the vast 
majority of the bacteriophage or bacteria that carry com- 
plementary DNA or fragments of genomic DNA encoding 
other proteins. 

When the library is stored in bacteriophage, a con- 
tinuous lawn of a particular bacterium growing on an 
agar plate is infected with a dilute solution of those bac- 
teriophage carrying the inserted DNA. Small circular 
holes or plaques appear in the lawn. Each plaque results 
from the infection and lysis by bacteriophage of the bac- 
teria that had been happily growing within the lawn. All 
of the bacteriophage in one of the plaques are offspring 
of a single bacteriophage from the original solution that 
fell upon the lawn at the position of the center of the 
plaque and then replicated outward by consecutive 
infections of the bacteria. Ultimately, each plaque con- 
tains millions of the progeny of that one bacteriophage, 
and each one of the progeny contains only the one 
inserted DNA its common ancestor contained. 

When the library is stored in plasmids in a popula- 
tion of bacteria, a suitably diluted suspension of those 
bacteria is spread on a plate of agar containing the 
antibiotic. Only bacteria containing plasmids can grow, 
and each of these replicates until a round colony of bac- 
teria appears on the plate at the location where the orig- 
inal one fell. Each of the bacteria in the colony contains 
a plasmid because it survived the antibiotic, and each of 
the plasmids within the same colony contains the same 
segment of inserted complementary DNA or genomic 
DNA because all the bacteria are offspring of the original 
one. 

Each plaque or each colony contains copies of a dif- 
ferent complementary DNA or fragment of genomic DNA 
or lacks an insert. The trick is to discover which of the 
plaques or colonies, respectively, clearly visible to the 
naked eye but numbering in the thousands to hundreds
of thousands, happens to contain the complementary 
DNA or genomic DNA that encodes the protein of inter- 
est. 

The most rapid and unambiguous method of 
screening is to synthesize chemically or biologically a 
fragment of radioactive single-stranded or double- 
stranded DNA, referred to as a probe, the sequence of 
one strand of which encodes the amino acid sequence of 
the protein of interest (Figure 3-11). When the double- 
helical DNA in a plaque or a colony containing that par- 
ticular short nucleic acid sequence is heated so that it 
unwinds and becomes single-stranded DNA, the 
sequence on the antisense strand or the sequences on 
the sense and the antisense strands that are complemen- 
tary to the sequence or the two sequences of the probe 
will become accessible for hybridization. Hybridization 
is the formation in the laboratory of double-helical DNA 
from two complementary single strands of DNA. Because 
hybridization is usually performed in a complex mixture 
of single-stranded DNAs such as the denatured DNA 
from a plaque or colony, the mixture is cooled slowly or 
annealed, to give the pairs of complementary single- 
stranded molecules of DNA enough time to find each 
other and form a double helix. If the clone contains 
sequences of DNA that are complementary to those of 
the probe, those sequences, after the DNA has been 
denatured and the probe has been added, will hybridize with
the sequences of the probe to form short segments of 
double-helical DNA, and in this way the probe is cap- 
tured. This trapping of the probe makes the plaque or the 
colony containing the desired complementary DNA 
radioactive, marking the position of the bacteriophage or 
bacteria carrying the desired complementary DNA and 
allowing that one plaque or colony to be isolated. 

An example will illustrate this screening procedure.
Factor VIII is one of the proteins that are together
responsible for the cascade of events leading to the clot- 
ting of the plasma of mammalian blood. Human Factor 
VIII was digested with trypsin, and the peptides that 
resulted from the digestion were separated by adsorption
chromatography. Several of these peptides
were resolved cleanly from their neighbors, and they 
were submitted to Edman degradation. The amino acid 
sequence determined for one of these peptides was 
AWAYFSDVDLEK. A segment of radioactive, single- 
stranded DNA with the nucleic acid sequence 
CTTTTCCAGGTCAACGTCGGAGAAATAAGCCCAAGC 
(Figure 3-11), one of the many possible antisense 
sequences to that encoding the peptide, was synthesized 
chemically to act as a probe. Long restriction fragments 
(15 kb) of human genomic DNA were inserted into
bacteriophage λ Charon, and these bacteriophage were used
to produce plaques on lawns of E. coli. The DNA in the 
plaques was then denatured. During subsequent annealing
and hybridization, the radioactive probe was captured
by the denatured, single-stranded DNA in 15 plaques out
of the 500,000 screened for DNA containing a nucleic
acid sequence that would capture the probe.
Bacteriophage from each of these 15 clones were sepa- 
rately grown on a large scale, and the inserted DNA was 
cut out of the DNA of the bacteriophage that had been 
carrying it with site-specific deoxyribonucleases. 
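
The relation between the Factor VIII peptide and its probe can be checked with a short Python sketch. It takes the reverse complement of the probe quoted above and translates it with the standard genetic code. The function names are illustrative and are not taken from the text.

```python
# Standard genetic code, with codons enumerated in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the complementary strand, written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate successive complete triplets, starting at the first base."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


probe = "CTTTTCCAGGTCAACGTCGGAGAAATAAGCCCAAGC"  # antisense probe from the text
sense = reverse_complement(probe)
print(sense)             # the sense sequence that the probe would capture
print(translate(sense))  # the peptide encoded by that sense sequence
```

The translation of the reverse complement is AWAYFSDVDLEK, which confirms that the probe is an antisense sequence for the peptide.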

The polymerase chain reaction can be used to
produce probes for screening plaques or colonies or even 
a segment of the DNA encoding a significant portion of 
the protein of interest. This is a method for replicating to 
a high concentration only a specific segment from any 
source of DNA. To replicate only a particular segment of 
DNA in a complex mixture or within a much longer mol- 
ecule of DNA by the polymerase chain reaction, all that is 
required is that the segment of double-stranded DNA to 
be replicated is flanked on either side by known 
sequences of nucleotides. Two short primers of single- 
stranded DNA are synthesized, one complementary to 
the flanking sequence at one end of the segment to be 
replicated and the other complementary to the flanking 
sequence at the other end. These two primers for the two 
ends, however, must be complementary to the 
sequences on opposite strands of the initial double- 
stranded DNA. The initial double-stranded DNA is 
melted, and the two primers are hybridized. DNA- 
directed DNA polymerase is then used to elongate from 
the 3’-end of each primer (Figure 3-10). This produces 
two copies of duplex DNA over the segment of interest. 
The new DNA is melted and rehybridized with the same 
two primers and elongation is performed again to pro- 
duce four copies of duplex DNA for the segment of inter- 
est and so forth. If the heat-stable DNA-directed DNA 
polymerase from Thermus aquaticus or Pyrococcus
furiosus is used for the elongation, new polymerase
does not have to be added after each melting cycle. After 
repeated cycles of melting, annealing, and elongation, 
essentially all of the newly synthesized DNA is a copy of 
the double-stranded segment of the original DNA 
between and including the sequences of the priming 
DNA, and the concentration of this segment increases 
exponentially with each step. 
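
The logic of the polymerase chain reaction can be illustrated with a minimal Python sketch. It locates the segment bounded by two primers on a sense strand and computes the idealized doubling of copies. The template and primers are hypothetical sequences invented for the illustration, and the doubling ignores incomplete yields and the longer products of the first cycles.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the complementary strand, written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def amplicon(template, forward_primer, reverse_primer):
    """Return the segment of the sense strand bounded by the primers, or None."""
    start = template.find(forward_primer)
    if start == -1:
        return None
    # The reverse primer anneals to the sense strand, so its reverse
    # complement is what appears in the sequence of the sense strand.
    site = reverse_complement(reverse_primer)
    end = template.find(site, start + len(forward_primer))
    if end == -1:
        return None
    return template[start:end + len(site)]


def copies_after(cycles, initial_copies=1):
    """Idealized number of duplex copies after a number of cycles."""
    return initial_copies * 2 ** cycles


# Hypothetical template: flanking sequences surround the segment of interest.
template = "GGATCC" + "ATGGCTAGC" + "AAACCCGGGTTT" + "GAATTCTAG" + "CCTAGG"
forward = "ATGGCTAGC"                      # same sequence as the sense strand
reverse = reverse_complement("GAATTCTAG")  # complementary to the sense strand
print(amplicon(template, forward, reverse))
print(copies_after(30))
```

The two primers are written from opposite strands, as the text requires, and the amplified segment includes the sequences of both primers.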

An example of the use of this procedure is the syn- 
thesis of a probe for screening a library containing the 
gene for extensin from Volvox carteri. The amino-terminal
sequence of the protein, AVSYSVSVYNNIAVTGAP-, and the
sequence of a tryptic peptide from the
protein, IDPPSNFGNLPVK, were used to guide the synthesis
of two primers, GT(T/C/A/G)TA(T/C)AA(T/C)AA(T/C)AT(T/C/A)GC
and GG(G/T)AGGTT(T/C/A/G)CCGAA(G/A)TT, where letters in
parentheses indicate
that two or more nucleotides were coupled in that step of 
the synthesis to allow for the redundancy of the genetic 
code. When complementary DNA from sperm packets of 
V. carteri was amplified with these primers in a poly- 
merase chain reaction, a segment of 410 bp of double- 
stranded DNA was produced beginning with the 
sequence GTCTACAACAACATCGC- and ending with the 
sequence -AACTTTGGCAACCTGCC on its sense strand.
This sequence encoded 136 aa of the amino acid
sequence of the protein. This segment of DNA was 
inserted into a plasmid, replicated with radioactive pre- 
cursors, and used successfully as a probe to screen a 
library of genomic DNA from V. carteri. By use of this 
probe, a clone of bacteria was identified that carried a 
plasmid containing a segment of complementary DNA 
1392 bp long, encoding 464 aa from the amino acid 
sequence of extensin. 
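
The redundancy built into the first primer can be made explicit with a short Python sketch. It parses the parenthesized notation used above, counts the number of distinct oligonucleotides in the mixture, and checks that the beginning of the amplified product is one of them. The function names are illustrative assumptions.

```python
import re
from math import prod


def parse_degenerate(primer):
    """Split 'GT(T/C/A/G)TA...' into one set of allowed bases per position."""
    tokens = re.findall(r"\(([ACGT/]+)\)|([ACGT])", primer)
    return [set(group.split("/")) if group else {single}
            for group, single in tokens]


def degeneracy(positions):
    """Number of distinct sequences present in the mixture of primers."""
    return prod(len(allowed) for allowed in positions)


def matches(positions, sequence):
    """True if the sequence begins with one member of the mixture."""
    return len(sequence) >= len(positions) and all(
        base in allowed for base, allowed in zip(sequence, positions))


primer = "GT(T/C/A/G)TA(T/C)AA(T/C)AA(T/C)AT(T/C/A)GC"
positions = parse_degenerate(primer)
print(len(positions), degeneracy(positions))
print(matches(positions, "GTCTACAACAACATCGC"))  # start of the 410 bp product
print(410 // 3, 1392 // 3)  # complete codons in the product and in the clone
```

The mixture contains 4 x 2 x 2 x 2 x 3 = 96 different primers of 17 nucleotides, and the integer divisions reproduce the 136 aa and 464 aa quoted in the text.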

The complementary DNA or the fragment of 
genomic DNA encoding the protein of interest that has 
been produced by replicating the bacteriophage or bac- 
terium identified by the screen, or the segment of DNA 
encoding a portion of the protein that has been amplified 
by the polymerase chain reaction, can be quite long, 
from thousands to tens of thousands of nucleotides. The 
sequence of a particular piece of single-stranded DNA 
can be read only to a certain length (300-400 
nucleotides). Therefore, long DNAs must be cleaved into 
smaller restriction fragments with site-specific deoxyri- 
bonucleases, just as polypeptides have to be cleaved into 
peptides before they can be sequenced. By trial and 
error, a pattern of restriction fragments ideally suited to 
the demands of sequencing can be prepared. 

The shorter double-helical restriction fragments 
produced from a longer double-helical DNA are rapidly 
separated by preparative electrophoresis on gels of 
agarose. They are usually visualized by use of fluorescent 
dyes. Their length can be estimated from their elec- 
trophoretic mobilities. The order in which a given set of 
restriction fragments is arranged in the original DNA is 
determined by restriction mapping. To produce a 
restriction map of a large piece of DNA, it is cleaved sep- 
arately with several site-specific deoxyribonucleases. 
The restriction fragments produced in each of these sep- 
arate digestions are isolated and assigned a length by 
electrophoresis. Each of these restriction fragments of 
DNA is then submitted to digestion by the other sets of 
site-specific deoxyribonucleases, and the shorter restric- 
tion fragments that result are separated and assigned a 
length. This dissection is continued until the restriction 
fragments observed, which are designated by the pedi- 
gree of the cleavages that produced them, are consistent 
with only one distribution of restriction sites through the 
original piece of long DNA as well as being of the desired 
length. This unique distribution of restriction sites, the 
restriction map, orders the different restriction frag- 
ments that have been obtained relative to the complete 
sequence. 

An example will serve to illustrate the complete 
process. A clone containing the complementary DNA
encoding the α polypeptide of the murine nicotinic
acetylcholine receptor within the tetracycline-resistant
plasmid pBR322 (Figure 3-12) was identified by screening.
The cloned complementary DNA was cut from the plasmid as
an intact double-helical polymer with the site-specific
deoxyribonuclease PstI, which cleaves at CTGCA↓G. This
DNA was digested with the following site-specific
deoxyribonucleases: AluI, which cleaves at the nucleic
acid sequence AG↓CT; TaqI, which cleaves at T↓CGA; HpaII,
which cleaves at C↓CGG; HaeIII, which cleaves at GG↓CC;
RsaI, which cleaves at GT↓AC; and HincII, which cleaves at
GTPy↓PuAC, where Py is either pyrimidine and Pu is either
purine.
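
A digestion of this kind can be imitated on a known sequence with the following Python sketch, which is offered only as an illustration. The table holds the recognition sequences and positions of cleavage of the enzymes named above. The test sequence is hypothetical, the DNA is treated as linear, and only the strand shown is examined, which is sufficient because these recognition sequences are palindromic.

```python
import re

# Recognition sequence (as a regular expression) and the offset of the cut
# within the site, for the enzymes named in the text.
ENZYMES = {
    "PstI": ("CTGCAG", 5),
    "AluI": ("AGCT", 2),
    "TaqI": ("TCGA", 1),
    "HpaII": ("CCGG", 1),
    "HaeIII": ("GGCC", 2),
    "RsaI": ("GTAC", 2),
    "HincII": ("GT[CT][AG]AC", 3),  # Py = C or T, Pu = A or G
}


def cut_positions(dna, enzyme):
    """Positions (0-based, between bases) at which the enzyme cleaves."""
    pattern, offset = ENZYMES[enzyme]
    # A lookahead is used so that overlapping sites are not missed.
    return [m.start() + offset for m in re.finditer(f"(?={pattern})", dna)]


def digest(dna, *enzymes):
    """Lengths of the fragments, in order, from a complete digestion."""
    cuts = sorted({p for e in enzymes for p in cut_positions(dna, e)})
    edges = [0] + cuts + [len(dna)]
    return [b - a for a, b in zip(edges, edges[1:])]


dna = "CCGTCGACCC"  # hypothetical; GTCGAC is a HincII site containing TCGA
print(digest(dna, "HincII"))
print(digest(dna, "TaqI"))
print(digest(dna, "HincII", "TaqI"))
```

In an actual mapping experiment the sequence is unknown and only the lengths of the fragments are observed; the sketch runs the inference in the forward direction.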

The pattern of restriction fragments obtained when 
these enzymes were used in various combinations was 
consistent with only one restriction map (Figure 3-12). 
For example, the HpaII restriction fragment between
positions 478 and 1063 would give three restriction
fragments about 60, 240, and 280 base pairs in length upon
digestion with site-specific deoxyribonuclease AluI. The
order in which these three subfragments occur in the
HpaII restriction fragment could be determined by
gathering the following observations. Deoxyribonuclease
TaqI would cut only the AluI restriction fragment that is
about 280 base pairs in length to yield the same
restriction fragment, about 120 base pairs long, that it
would produce from one end of the HpaII restriction
fragment. Deoxyribonuclease HincII would cut only the AluI
restriction fragment that is about 240 base pairs in length
to give a restriction fragment about 140 base pairs in
length. This restriction fragment, together with the AluI
restriction fragment about 60 base pairs in length, would
form the restriction fragment about 200 base pairs in
length produced during the digestion of the HpaII
restriction fragment with deoxyribonuclease HincII
alone.
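
The reasoning in this example can be followed mechanically. The Python sketch below, which is an idealization rather than the procedure used by the authors, tries every order of the three AluI subfragments and keeps those consistent with the two observations. It treats the approximate lengths as exact and assumes that the subfragments have distinct lengths.

```python
from itertools import permutations


def consistent_orders(fragments, constraints):
    """
    fragments: lengths of the subfragments from the first enzyme.
    constraints: {subfragment: (piece, end_fragment)}, meaning that a second
    enzyme cuts that subfragment to leave a piece of the given length and,
    used alone on the parent, releases end_fragment from one of its ends.
    Returns the consistent orders, each reported once (not with its mirror).
    """
    total = sum(fragments)
    found = set()
    for order in permutations(fragments):
        starts, position = {}, 0
        for length in order:
            starts[length] = position
            position += length
        ok = True
        for length, (piece, end_fragment) in constraints.items():
            s = starts[length]
            # The site may lie 'piece' from either end of the subfragment.
            sites = (s + piece, s + length - piece)
            if not any(end_fragment in (site, total - site) for site in sites):
                ok = False
        if ok:
            found.add(min(order, order[::-1]))
    return found


alu = (60, 240, 280)
observations = {
    280: (120, 120),  # TaqI: 120 bp piece, also released from one end
    240: (140, 200),  # HincII: 140 bp piece, 200 bp released from one end
}
print(consistent_orders(alu, observations))
```

Only one arrangement survives: the 60 base pair fragment at one end, the 240 base pair fragment in the middle, and the 280 base pair fragment at the other end.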

When restriction fragments of a convenient size 
had been produced from this complementary DNA 
encoding the α polypeptide of the murine acetylcholine
receptor, a group of single-stranded DNAs within the set 
were chosen for sequencing (arrows in Figure 3-12). 
These single-stranded DNAs were subcloned in the 
single-stranded bacteriophage M13, and each was sub- 
mitted to sequencing from its 5’-end. 

The property of denatured, single-stranded nucleic 
acids that allows them to be sequenced is that they 
behave with extraordinary regularity upon electrophoresis.
For example, when 4.5S ribosomal RNA from the chloroplasts
of spinach, which is 107 bases long, is elongated with RNA
ligase (ATP) from T4 bacteriophage by one nucleotide at its
free 3’-hydroxyl group by use of [5’-³²P]cytidine
3’,5’-bisphosphate and then
submitted to partial alkaline hydrolysis, a random mix- 
ture of fragments of all possible lengths and all possible 
beginning and ending points within the sequence is pro- 
duced. Only those fragments that begin at the original 
3’-end, however, are radioactive. In the case of the 
4.5S rRNA, these formed a set of 108 unique fragments 
that were of all the possible lengths between 1 and 108 
nucleotides. When this mixture was submitted to
electrophoresis under denaturing conditions on a gel cast
from 12% acrylamide and the radioactive components were
located by placing the polyacrylamide gel on a
photographic film, a regular array of bands, referred to as
a ladder, could be observed (Figure 3-13). Each of these
bands, with one interesting exception that will be
discussed later, represents a single-stranded RNA that
begins at the labeled 3’-end of the original 4.5S rRNA,
because it is radioactive, and is one nucleotide longer
than the nucleic acid in the band below it in the figure.

Figure 3-12: Restriction map of a fragment of DNA cut out of a plasmid. A large fragment of complementary DNA (17 kb) was removed
with the site-specific deoxyribonuclease PstI from the circular plasmid pMARα15, which had been originally constructed from the circular
plasmid pBR322. The plasmid contained a gene for resistance to the antibiotic tetracycline (TET) so that only bacteria carrying the plasmid
would grow on a medium containing tetracycline. The origin of replication for the plasmid is indicated (ORI). The plasmid pMARα15 was
isolated during a screening procedure for complementary DNA encoding the α-polypeptide of the murine acetylcholine receptor. The fragment
of complementary DNA was purified by electrophoresis and submitted to a series of digestions with the noted site-specific
deoxyribonucleases. The patterns of fragments established the restriction map displayed. The arrows below the restriction map indicate which restriction
fragments were submitted to sequencing from which 5’-end. The positions in the nucleic acid sequence cleaved by each site-specific
deoxyribonuclease are identified by numbers in parentheses. Reprinted with permission from ref 72. Copyright 1985 Oxford University Press.

The ability of electrophoresis on polyacrylamide 
gels to separate nucleic acids only on the basis of their 
length arises from the properties of these polymers and 
the nature of the electrophoresis. The free
electrophoretic mobility of denatured single-stranded DNA
at an ionic strength of 0.01 M, pH 7.5, and 0 °C is
(1.82 ± 0.02) × 10⁻⁴ cm² V⁻¹ s⁻¹ and does not vary with
its length. The free electrophoretic mobility of
denatured RNA under the same conditions is the same,
(1.77 ± 0.05) × 10⁻⁴ cm² V⁻¹ s⁻¹, and it also shows no
tendency to vary with length. The electrophoretic
mobilities of single-stranded DNA and RNA
on polyacrylamide gels also conform to Equation 1-81,
and the free mobilities extrapolated from their behavior
on polyacrylamide gels are in reasonable agreement with
those measured directly. Because their free
electrophoretic mobilities are all the same, it is only the
resistance posed by the polyacrylamide, exp(-K,,,T,), that 
separates the nucleic acids of the various lengths. It is
not surprising that this sieving, accomplished at the
molecular level by the strands of polyacrylamide, should
be a regular, continuous, monotonic function of the
lengths of the nucleic acids (Figure 3-13).

Suppose that a single-stranded deoxyribonucleic 
acid, labeled at its 5’-end by phosphorylation with 
[³²P]phosphate, has been cleaved in a low yield and
randomly on the 5’-side of each of the deoxyguanosines in
its sequence. This partial cleavage will have produced a 
series of radioactive fragments of different length, each 
of which ends at a nucleotide whose only distinction is 
that it preceded a deoxyguanosine in the original 
sequence. When the products of this partial cleavage are 
submitted to electrophoresis, a series of radioactive 
bands will appear the mobilities of which correspond to 
only those rungs in the ladder the 3’-terminal nucleotide 
of which precedes a deoxyguanosine. The knowledge 
that the cleavage occurred only at deoxyguanosines and 
the position of the products in the ladder identifies the 
relative positions of every deoxyguanosine in the original 
sequence. 

Figure 3-13: Separation of fragments of end-labeled RNA by
electrophoresis on gels of polyacrylamide. 4.5S Ribonucleic acid,
isolated from spinach chloroplasts, was labeled at its 3’-end with
[5’-³²P]cytidine 3’,5’-bisphosphate by T4 RNA ligase (ATP). The
end-labeled RNA was partially digested under alkaline conditions
and then submitted to electrophoresis on slabs of 12%
polyacrylamide cast in a buffered solution of 7 M urea. The two lanes in the
figure were loaded with different amounts of sample and were run
for different lengths of time. Reprinted with permission from ref 73.
Copyright 1982 Journal of Biological Chemistry.

Suppose further that four samples have been prepared
from the original single-stranded deoxyribonucleic acid
such that they contain radioactive fragments, all of which
begin at the original 5’-end because they were made
radioactive by phosphorylating only that location, but
which end at every nucleotide preceding a deoxyadenosine
in one sample, preceding a deoxyguanosine in another
sample, preceding a deoxycytidine in a third sample, or
preceding a thymidine in the fourth sample. When these
samples are submitted to
electrophoresis, side by side, every band in the ladder 
will be represented in the four lanes, but each band in the 
ladder will be found in only one of the lanes. As one 
scanned the pattern, from the bands of greatest mobility 
to the bands of least mobility, one would encounter each 
band of the ladder in its proper succession. The lane in 
which each successive band was found would have been 
determined by the identity of the nucleotide that follows
its actual 3’-terminal nucleotide in the complete
sequence of the original DNA. By starting with the band
of greatest mobility and noting its lane and the lane in
which each successive band of lower mobility occurs,
one would be reading the sequence of the DNA in the
direction from 5’ to 3’.
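
This simplified situation is easily simulated. The Python sketch below produces, for a hypothetical sequence, the lengths of the 5’-labeled fragments that would appear in each of the four lanes and then reads the sequence back from the ladder. The names of the functions are assumptions made for the illustration.

```python
def cleave_before(sequence, base):
    """Lengths of the 5'-labeled fragments ending just before each 'base'."""
    # A cleavage before the first nucleotide leaves no labeled fragment.
    return [i for i, b in enumerate(sequence) if b == base and i > 0]


def run_four_lanes(sequence):
    """One lane for each nucleoside, as in the four samples of the text."""
    return {base: cleave_before(sequence, base) for base in "ACGT"}


def read_ladder(lanes):
    """Read from the band of greatest mobility (shortest) to the longest."""
    bands = sorted((length, base)
                   for base, lengths in lanes.items()
                   for length in lengths)
    return "".join(base for _, base in bands)


original = "GATTACAGC"  # hypothetical sequence, written 5' to 3'
lanes = run_four_lanes(original)
print(lanes)
print(read_ladder(lanes))
```

Every rung of the ladder occurs in exactly one lane, and the sequence is recovered from the second nucleotide onward. The first nucleotide is not reported because no labeled fragment precedes it.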

The strategy for sequencing DNA illustrated by this 
simplified situation requires that a set of end-labeled 
fragments of single-stranded DNA be produced. Each of 
these fragments must have as its 5’-terminus the same 
nucleotide in the nucleic acid sequence to be determined,
but this does not have to be the actual 5’-terminus of the
original piece of DNA. For example, this result could be
achieved by cleaving all of the molecules of the original
DNA at the same nucleotide with a site-specific
deoxyribonuclease. Every position in the portion of the
complete nucleic acid sequence to be determined from
the four lanes of a particular polyacrylamide gel must be
represented by a labeled fragment that ends at this posi- 
tion and that has been produced in sufficient yield to be 
visualized. The observer must have enough information 
about each fragment visualized to associate its 3’-termi- 
nus with a particular nucleoside, deoxyadenosine, 
deoxyguanosine, deoxycytidine, or thymidine. In prac- 
tice, this information is either that the 3’-terminus of a 
particular fragment precedes a particular nucleotide in 
the complete nucleic acid sequence or that its 3’-termi- 
nus is a particular nucleotide. There are two methods, 
chemical and enzymatic, for producing such a set of frag- 
ments. Neither corresponds exactly to the simplified 
illustration just described, but both satisfy the require- 
ments of the strategy. 

In the chemical method of Maxam and Gilbert,
reagents that take advantage of the hybrid nature of the
nucleotide bases, which are partly aromatic heterocycles
and partly acyl derivatives, are used to cleave chemically
the single-stranded DNA, labeled at its 5’-terminus, at
locations occupied by a particular base. The chemical
cleavages used are based on reactions previously developed
to remove selectively either purine bases or pyrimidine
bases from DNA. Such reactions are depurinations or
depyrimidinations, respectively. Reagents are used to
depurinate the single-stranded DNA preferentially at
deoxyguanosine or deoxyadenosine or to depyrimidinate
single-stranded DNA selectively at both deoxycytidine and
thymidine or preferentially at deoxycytidine. A position in
DNA that has lost its nucleoside base by depurination or
depyrimidination is susceptible to hydrolysis in base,
while normal DNA is not. In preparation for sequencing,
the DNA is
partially depurinated or depyrimidinated, respectively, 
at locations the identity of which has been controlled 
by the conditions of these reactions and is then 
hydrolyzed at each of these locations by treatment with 
base to produce fragments that have as their 3’-terminus 
a nucleotide that preceded, in the original nucleic 
acid sequence, a target for the depurination or 
depyrimidination. 

In the enzymatic method of Sanger, Nicklen, and 
Coulson, the properly terminated fragments required
for the electrophoresis are made by synthesizing
complementary strands of DNA by use of the single-stranded
DNA to be sequenced as a template in four separate elon- 
gations catalyzed by DNA-directed DNA polymerase 
(Figure 3-10). The nucleotides inserted by the poly- 
merase are present in solution as their activated 
5’-triphosphates. In the original method, the newly
synthesized polymer of DNA is made radioactive by
including [α-³²P]dATP in the synthetic mixture. The successive
fragments that have at their 3’-end only a particular 
nucleotide are produced by including a small amount of 
3’-deoxythymidine triphosphate, 2’,3’-dideoxycytidine 
triphosphate, 2’,3’-dideoxyguanosine triphosphate, or 
2’,3’-dideoxyadenosine triphosphate, each in one of the 
four elongations, along with the thymidine triphosphate, 
2’-deoxycytidine triphosphate, 2’-deoxyguanosine 
triphosphate, and 2’-deoxyadenosine triphosphate pres- 
ent in all of them. Occasionally, a 2’,3’-dideoxynu- 
cleotide is incorporated into one of the growing 
polymers by the DNA-directed DNA polymerase, and its 
incorporation terminates polymerization because that 
polymer then lacks the 3’-hydroxyl group necessary for 
further elongation. In this way fragments satisfying two 
of the requirements for electrophoretic sequencing are 
produced. 

The last requirement, that every fragment have as 
its 5’-terminus the same position in the complete 
sequence, is satisfied by taking advantage of the require- 
ment of DNA-directed DNA polymerase for a primer to 
provide a 3’-hydroxyl group from which the new strand 
can be elongated. To initiate the reaction, a primer that 
is complementary to a segment of the DNA to be 
sequenced is annealed to the template to provide the 
necessary 3’-hydroxyl group. Because the DNA-directed 
DNA polymerase starts at the primer when it synthesizes 
a complementary, radioactive single strand of DNA, the 
sequence of the primer can be chosen so that the newly 
synthesized DNA will begin at a particular point in the 
sequence of the template. The complementary sequence 
to which the primer is annealed can be a short piece of 
DNA of known sequence that has been deliberately 
attached to the 3’-end of the DNA to be sequenced, or
it can be any internal sequence for which a complemen- 
tary fragment of single-stranded DNA happens to be 
available. Often this complementary fragment is a 
probe that had been made for purposes of screening. It is 
also possible to use an oligonucleotide that has the same 
sequence as a segment near the 3’-end of the longest 
single-stranded fragment that provided readable 
nucleotide sequence in the last set of polyacrylamide 
gels, to extend the sequencing of the template further to 
its 5’-end. In this way, one can walk along a long template 
and read its entire sequence. 

The polyacrylamide gels that result from the application
of these two methods, the chemical and the enzymatic, are
similar in appearance (Figure 3-14A, B). Sequence is read
from the bottom (shortest fragments) to the top (longest
fragments), 5’ to 3’. In the chemical method the sequence
of the original single-stranded DNA is being read. In the
enzymatic method the sequence of the complement of the
original single-stranded DNA is being read. Since DNA is
normally double-helical with two antiparallel strands of
complementary sequence, either sequence is formally the
sequence of the DNA, as long as the correct direction
(5’ → 3’) is assigned to the sequence by the observer.
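
That the enzymatic method reads the complement of the template can be seen in a small simulation. The Python sketch below is an idealization in which every position terminates with equal yield. The template and primer are hypothetical, and the function names are assumptions made for the illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the complementary strand, written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def dideoxy_lanes(template, primer):
    """Lengths of the chain-terminated fragments in each of the four lanes."""
    new_strand = reverse_complement(template)
    if not new_strand.startswith(primer):
        raise ValueError("primer must be complementary to the 3'-end of the template")
    lanes = {base: [] for base in "ACGT"}
    # A fragment of a given length ends in the dideoxynucleotide that was
    # incorporated at that position of the newly synthesized strand.
    for length in range(len(primer) + 1, len(new_strand) + 1):
        lanes[new_strand[length - 1]].append(length)
    return lanes


def read_gel(lanes):
    """Read the bands from the shortest fragment to the longest."""
    bands = sorted((length, base)
                   for base, lengths in lanes.items()
                   for length in lengths)
    return "".join(base for _, base in bands)


template = "GATTACAGGCTTAACCGGTT"  # hypothetical, written 5' to 3'
primer = "AACCGG"                  # anneals to the 3'-end of the template
read = read_gel(dideoxy_lanes(template, primer))
print(read)                      # sequence of the new strand past the primer
print(reverse_complement(read))  # the corresponding region of the template
```

The sequence read from the gel is that of the newly synthesized strand; its reverse complement is the template itself, which is why either sequence is formally the sequence of the DNA.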

These original methods, the chemical and the enzymatic,
were both based on the use of fragments of nucleic acid
made radioactive by incorporating [³²P]phosphate
(Figure 3-14A, B), but in the automated DNA sequencers
currently in use, end-labeled fluorescent fragments of
nucleic acid are used. Although chemical methods have
been developed that may eventually be more efficient, the
current automated procedures for sequencing DNA are based
on the original enzymatic method of Sanger, Nicklen, and
Coulson. The products of the terminations by the
dideoxynucleotides are all separated together on the same
gel of polyacrylamide, which is continuously scanned by a
fluorometer. The products from the four respective
termination reactions are end-labeled with four different
fluorescent dyes that can be distinguished by the
fluorometer on the basis of the colors of their
fluorescence. The separate fluorescent tags are applied in
one of two ways.

Synthetic derivatives of 2’,3’-dideoxyadenosine 
triphosphate, 2’,3’-dideoxyguanosine triphosphate, 
3’-deoxythymidine triphosphate, and 2’,3’-dideoxycyti- 
dine triphosphate have been prepared that each have a 
different fluorescent dye covalently attached to their
heterocyclic bases. When these derivatives are used to
terminate the single-stranded fragments and thereby 
label them at their 3’-ends, the fluorometer can distin- 
guish strands of DNA terminated at deoxyadenosines, 
deoxyguanosines, thymidines, or deoxycytidines from 
each other by their differences in fluorescence. 

Alternatively, four distinguishable fluorescent dyes 
can be attached separately to the 5’-ends of four
identical samples of the primer that will be used, and a
different one of the resulting fluorescent primers can be
used in each of the four termination reactions. When the 
separate dideoxy terminations have been completed, the 
products of the four reactions are mixed. When the mix- 
ture is separated by electrophoresis, the fluorometer dis- 
tinguishes each strand by the color of the fluorescence 
emitted by the fluorescent dye on its 5’-end, which iden- 
tifies the termination mixture in which it arose. 

The DNA-directed DNA polymerase used in these 
automated sequencers is an improved version. The orig- 
inal enzyme used, the Klenow fragment of DNA-directed 
DNA polymerase from E. coli, terminates the elongation 
at each position with a different yield that can vary sig- 
nificantly (Figure 3-14B). This variability can result in 
uncertainty in reading the sequence, especially when it is 
to be read by a machine. DNA-directed DNA polymerase
from bacteriophage T7 produces a much more uniform
yield of fragments terminated by 2’,3’-dideoxynucleotides
(Figure 3-14C, D). Certain mutants of the enzyme from
E. coli are even more reliable than their parent.

Figure 3-14: [...] tion of 7 M urea. The panel of four lanes on the left is an [...] 9 h, and the panel of four lanes on the right is an
autoradiogram for identical samples submitted to electrophoresis for 33 h. The sequence noted on the autoradiogram on the left, -TTTCCCGACTGG-, continues on the autoradiogram on the right. Reprinted
with permission from ref 77. Copyright 1977 National Academy of Sciences. (B) A fragment of DNA complementary to a short segment of the
nucleic acid sequence of the single-stranded DNA from the bacteriophage φX174 was annealed to the template as a primer. The initiation
complex was divided into four equal portions. Elongation of the primer in these separate samples was performed with DNA-directed DNA
polymerase in the presence of 2’,3’-dideoxyguanosine triphosphate (lane G), 2’,3’-dideoxyadenosine triphosphate (lane A),
3’-deoxythymidine triphosphate (lane T), and 2’,3’-dideoxycytidine triphosphate (lane C). Each sample contained a small amount of [α-³²P]MgATP to
render the newly synthesized DNA radioactive. Following these respective reactions, each sample was digested with the HaeIII site-specific
deoxyribonuclease, which cut the DNA within the primer, so that all of the newly synthesized DNA would start with the same nucleotide at
the 5’-end. The single-stranded, radioactive fragments of DNA in each sample were separated by electrophoresis on a slab of 12%
polyacrylamide. Reprinted with permission from ref 85. Copyright 1977 National Academy of Sciences. (C, D) A segment of single-stranded DNA
2707 bases long from the bacteriophage T7 was cloned in bacteriophage M13. The single-stranded genome of the M13 bacteriophage
carrying the insert was used as template, and a short segment of synthetic DNA complementary to a region adjacent to the insert was annealed to
the template as a primer. This initiation complex was labeled radioactively by mixing it with low concentrations (0.3 µM) of dGTP, TTP, dCTP,
and (α-[³⁵S]thio)dATP, as well as DNA-directed DNA polymerase from bacteriophage T7. After 5 min at room temperature, this mixture was
divided into four equal portions to which were added high concentrations (150 µM) of dATP, dGTP, dCTP, and TTP. To each portion was
added one of the dideoxynucleotides at 15 µM: 2’,3’-dideoxyguanosine triphosphate (lanes G), 2’,3’-dideoxyadenosine triphosphate (lanes
A), 3’-deoxythymidine triphosphate (lanes T), or 2’,3’-dideoxycytidine triphosphate (lanes C). The reaction was initiated by adding a high
concentration of DNA-directed DNA polymerase from T7 bacteriophage. After 5 min at 37 °C, the four samples were prepared for
electrophoresis, the single-stranded radioactive fragments of DNA in each sample were separated on slabs of 7% polyacrylamide run for 12 h (C)
or 2 h (D), and the gels were submitted to autoradiography. The numbers to each side indicate the lengths of the single-stranded fragments.
Reprinted with permission from ref 91. Copyright 1987 held by authors.

The electrophoretic separations presented in 
Figure 3-13 illustrate an important artifact common to 
all methods of sequencing DNA. The thick band at the 
eleventh rung of the ladder on the electrophoretogram to 
the right is a compression. Within that one band, single- 
stranded RNAs from 24 to 27 nucleotides in length comi- 
grate. The band above the compression is the 
single-stranded RNA 28 nucleotides long; and that 
below, 23 nucleotides long. A compression usually
occurs when the 3’-end of the fragment of single-stranded
nucleic acid is rich in G and C, and it is caused by
structures formed intramolecularly among these bases by
the usual GC pairing. When a compression is not recognized
as such by a person or by a machine, it is mistaken for a
normal band representing a single polynucleotide and the
sequence read will be missing one or more nucleotides. It
is possible to eliminate such compressions by using
2’-deoxyinosine triphosphate instead of 2’-deoxyguanosine
triphosphate in the elongation mixtures of the enzymatic
method.

Once the sequence of a segment of prokaryotic 
genomic DNA or eukaryotic complementary DNA is in 
hand, the complete sequence of the protein can be read 
from the open reading frame that encodes it. The 
sequence of nucleotides in the messenger RNA that 
encodes a polypeptide begins with the initiation codon 
-AUG-, which can be recognized in the sequence by its 
proximity to sequences encoding a binding site for a 
ribosome, and ends with a termination codon, -UAA-, 
-UGA-, or -UAG-. An open reading frame (ORF) is any 
sequence of nucleotides that begins with an initiation 
codon and ends with a termination codon. Because a 
sequence of DNA does not indicate the reading frame 
used to synthesize the protein nor which of the two com- 
plementary strands is the sense strand (Figure 3-11), 
there are six possible sequences of triplets that could 
encode the protein. These six different reading frames in 
any sequence of nucleotides obtained experimentally 
each contain open reading frames, and the open reading 
frame encoding an actual protein can be recognized only 
if some information about the protein is known, for 
example, its length or sequences from some of its pep- 
tides. 
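
The six reading frames can be enumerated with a short Python sketch. It translates both strands in all three frames with the standard genetic code and reports every stretch that begins with an initiation codon and ends with a termination codon. The test sequence is hypothetical, and no search for a binding site for a ribosome is attempted.

```python
import re

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the complementary strand, written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete triplets; '*' marks a termination codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def open_reading_frames(dna):
    """List (strand, frame, peptide) for each ORF in the six reading frames."""
    found = []
    for strand, sequence in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            protein = translate(sequence[frame:])
            # From a methionine to the next termination codon in this frame.
            for match in re.finditer(r"M[^*]*\*", protein):
                found.append((strand, frame, match.group()[:-1]))
    return found


dna = "CCATGGCTTGGGCTTATTAAGG"  # hypothetical double-stranded segment
print(open_reading_frames(dna))
```

For a real sequence the list usually contains many candidates, and the proper one is chosen by comparing each translation with peptide sequences determined directly from the protein.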

In the case of the α polypeptide of murine nicotinic
acetylcholine receptor, each sequence of an individual 
single strand of DNA from the restriction mapping 
(Figure 3-12) began at the 5’-end of one of the two com- 
plementary strands of a double-helical fragment and was 
read as far as was possible. With the exception of two 
short segments, each region of the sequence was read at 
least twice. Together, all of these individual sequences 
produced the complete sequence of the complementary 
DNA (Figure 3-15). Of the six reading frames in the 
completely sequenced double-helical complementary 
DNA, the one containing the open reading frame encoding 
the sequence of the α polypeptide of murine nicotinic 
acetylcholine receptor was easily identified by 
locating the one that encoded the amino acid sequences 
on which the probes used to screen the clones were 
based. 

In the case of human factor VIII, when the 
sequences of the different segments of DNA identified by 
the screen were translated into amino acid sequences, 
each segment was found to contain an overlapping 
region of the same open reading frame that encoded the 
sequence AWAYFSDVDLEK. The amino acid sequences 
of four of the other peptides of factor VIII that had been 
submitted to Edman degradation could also be found in 
the translation of this one complete open reading frame 
found in the nucleic acid sequences of the overlapping 
clones. Comparisons like these between directly deter- 
mined amino acid sequences of a particular protein or its 
experimentally determined composition of amino 
acids and the amino acid sequence translated from 
an open reading frame in the complementary DNA or 
genomic DNA are often used to substantiate the identifi- 
cation of the complementary DNA or genomic DNA as 
that encoding a particular protein. 
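
The comparison between a directly determined peptide sequence and the translations of the six reading frames can be sketched as follows. This is a minimal illustration, not the procedure used by the original investigators: the DNA fragment is a hypothetical back-translation of the peptide AWAYFSDVDLEK with arbitrarily chosen codons and flanking bases, not the actual sequence of the complementary DNA for factor VIII, and the function names are assumptions.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {a + b + c: AMINO[16 * i + 4 * j + k]
          for i, a in enumerate(BASES)
          for j, b in enumerate(BASES)
          for k, c in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the complementary strand, written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate successive complete triplets, starting at the first base."""
    return "".join(CODONS[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def frames_encoding(dna, peptide):
    """Return the (strand, offset) of every reading frame whose
    translation contains the peptide."""
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            if peptide in translate(seq[offset:]):
                hits.append((strand, offset))
    return hits


# Hypothetical coding sequence for the peptide, with arbitrary flanking bases.
coding = "GCCTGGGCTTACTTCTCTGATGTGGACCTGGAAAAG"
fragment = "TG" + coding + "CA"
print(frames_encoding(fragment, "AWAYFSDVDLEK"))
```

If the same fragment had been sequenced from the other strand, the peptide would be found in one of the three frames of the reverse complement instead.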

The sequences of peptides from the protein permit 
the proper open reading frame to be assigned, but if 
bases have been omitted by mistake during the sequenc- 
ing of the nucleic acid, these omissions can produce a 
frameshift. A frameshift is the inadvertent shift into 
another reading frame as the sequence of nucleotides is 
being divided into triplets. It is caused by the omission of 
3n + 1 or 3n + 2 bases in the sequence. One common mistake 
leading to a frameshift is the omission of several 
bases in the sequence that results when a compression 
goes unrecognized. It is not unusual for an initial DNA 
sequence to contain errors, often quite a few, but they 
are usually recognized and corrected by sequencing both 
the complementary strands or during further examinations 
of the protein it encodes. 
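
The effect of omitted bases on the reading frame can be checked with a few lines of Python. The sketch assumes a hypothetical coding sequence (a back-translation of the peptide AWAYFSDVDLEK with arbitrarily chosen codons), and the helper names omit and shifts_frame are illustrative only.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {a + b + c: AMINO[16 * i + 4 * j + k]
          for i, a in enumerate(BASES)
          for j, b in enumerate(BASES)
          for k, c in enumerate(BASES)}


def translate(seq):
    """Translate successive complete triplets, starting at the first base."""
    return "".join(CODONS[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def omit(dna, start, count):
    """Return the sequence as it would be read if `count` bases,
    beginning at index `start`, had been missed."""
    return dna[:start] + dna[start + count:]


def shifts_frame(count):
    """An omission of 3n + 1 or 3n + 2 bases shifts the reading frame;
    an omission of 3n bases does not."""
    return count % 3 != 0


coding = "GCCTGGGCTTACTTCTCTGATGTGGACCTGGAAAAG"   # hypothetical
print(translate(coding))               # the intended peptide
print(translate(omit(coding, 6, 1)))   # one base missed: frame shifted
print(translate(omit(coding, 6, 3)))   # three bases missed: one residue lost
```

With one base omitted, every amino acid after the omission is changed; with three bases omitted, only one amino acid disappears and the remainder of the sequence is read in the original frame.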

Figure 3-15: Nucleic acid sequence and deduced amino acid sequence for the α polypeptide of murine nicotinic acetylcholine receptor. 
The nucleotides are presented in the 5’ to 3’ direction for the coding strand. Both sequences are numbered starting with the first amino acid 
in the mature protein. The first eight amino acids in the presented sequence are removed posttranslationally. The initiation codon for trans- 
lation was not on the cloned piece of complementary DNA. The asterisk marks the codon at which translation is terminated. The restriction 
sites that produced the restriction map (Figure 3-12) are identified in the nucleic acid sequence. The PstI restriction sites situated at the two 
ends of the insert that were used to remove complementary DNA from the plasmid (Figure 3-12) are lost during the insertion of the restric- 
tion fragments of the complementary DNA into the M13 bacteriophage, but the sequence shown begins just after the initial PstI site and ends 
just before the final PstI site. Reprinted with permission from ref 72. Copyright 1985 Oxford University Press. 

When amino acid sequences of proteins were 
determined directly, the proteins chosen for sequencing 
were usually enzymes, and the sequences obtained were 
unremarkable. When the explosion of amino acid 
sequences from sequencing DNA commenced, so many 
more proteins were being sequenced that peculiar ones 
were discovered. The most obvious peculiarities were 
enrichments in a particular amino acid, such as a 
cell wall protein (465 aa) containing 60% glycine, a 
127 aa segment of chicken vitellogenin containing 
75% serine, a 655 aa segment of spider dragline silk 
containing 47% glycine and 28% alanine, a 246 aa segment 
of murine MP-2 containing 46% proline and 
17% glutamine, a 71 aa segment of histidine-proline-rich 
glycoprotein containing 50% histidine, a 30 aa 
segment of the Abdominal-B domain of the bithorax 
complex of Drosophila melanogaster containing 22 glutamines, 
or a stretch of up to 37 successive glutamines 
in normal human HD protein. A protein induced 
by abscisic acid in maize has a segment 66 aa long containing 
only arginine, tyrosine, and glycine in which the 
sequence GGYGG repeats 7 times, and the repeating 
unit of rat filaggrin (406 aa) contains 15% glutamine 
but no asparagine, 14% arginine but no lysine, 2% 
isoleucine but no leucine, no cysteine or methionine, 
and no tryptophan or phenylalanine. In many of 
these enriched proteins, strings of 5-20 aa in length 
containing only one amino acid are common. Most of these 
proteins form flexible polymeric solids with physical 
properties appropriate to their function, and their peculiar 
sequences provide an intuition of the way in which 
they accomplish their roles. 
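
Enrichments and homopolymeric strings of the kind just described are easily detected by computation once a sequence is in hand. The following Python sketch shows one way of doing so; the function names are assumptions, and the test sequence is a hypothetical repeat of the motif GGYGG rather than the actual sequence of the maize protein.

```python
from collections import Counter
from itertools import groupby


def composition(protein):
    """Return the percentage of each amino acid present in the sequence."""
    counts = Counter(protein)
    return {aa: 100.0 * n / len(protein) for aa, n in counts.items()}


def longest_run(protein):
    """Return (amino acid, length) for the longest string of one
    repeated amino acid; the first such string wins a tie."""
    runs = ((aa, len(list(group))) for aa, group in groupby(protein))
    return max(runs, key=lambda run: run[1])


segment = "GGYGG" * 2            # hypothetical test sequence
print(composition(segment))
print(longest_run(segment))
```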

The large majority of proteins, however, have 
sequences that in isolation say nothing about their struc- 
ture or function, but the large number of sequences that 
have become available as a result of sequencing DNA 
permit proteins to be grouped into families. By the struc- 
ture and function of its relatives, the structure and func- 
tion of an unknown protein can often be revealed. The 
most obvious instance of such grouping of amino acid 
sequences is when a protein isolated from one organism 
is recognized to be the same protein, albeit with a some- 
what different sequence, as one that is found in other 
species. The most dramatic examples of such comparisons 
of amino acid sequences are those in which a protein 
responsible for one function is isolated and found to 
be identical to a protein isolated from the same species 
on the basis of its responsibility for another function 
(refs 113 and 114). 

The amino acid sequences of all the proteins in 
many prokaryotic species are now available. Complete 
genomic sequences have been determined for a 
number of those that are widely used in experimenta- 
tion. These genomic sequences permit the amino acid 
sequence of a prokaryotic protein to be obtained with- 
out the need for sequencing its DNA. If enough amino 
acid sequence has been obtained from a protein to con- 
struct a probe to screen a library, one has enough amino 
acid sequence to search by computer the complete 
genome for the species from which the protein was iso- 
lated, identify the gene encoding that protein, and 
thereby obtain its complete amino acid sequence. 
Unfortunately, the existence of introns in the genes 
encoding eukaryotic proteins usually makes it impossi- 
ble to obtain an accurate amino acid sequence of an 
unknown protein from the genomic sequence of the 
eukaryote from which it is derived. If, however, a set of 
already known amino acid sequences of proteins closely 
related to the protein of interest can be assembled, these 
sequences can often be used to define the boundaries 
of the introns in the gene encoding the protein of 
unknown sequence. If the boundaries of the introns 
can be defined, the sequence of the protein can be read 
from the genomic DNA. 


Once complementary DNA, if the protein is 
eukaryotic, or genomic DNA, if the protein is prokary- 
otic, encoding the complete, uninterrupted amino acid 
sequence of a protein has been cloned, it is possible to 
use that DNA to direct the production of the protein. 
This strategy provides a convenient and abundant 
source of the protein. It is also possible to cut out any 
piece of that DNA with the proper site-specific deoxyri- 
bonucleases and direct the production of the fragment 
of the protein encoded by the resulting fragment of the 
DNA. This strategy provides precisely designed pieces of 
the protein. 

An expression system is any process by which for- 
eign DNA encoding a protein of interest or a portion of 
that protein has been incorporated into a population of 
living cells and those cells have been induced to transcribe 
that foreign DNA into messenger RNA and translate 
that messenger RNA into usable quantities of the 
protein that it encodes. The cells expressing the 
protein are usually not of the same species or even 
kingdom as the species from which the protein was first 
purified. Escherichia coli, a bacterium, is the organism most 
widely used to express proteins, often those from 
animals. For example, complementary DNA encoding 
the 5-aminolevulinate synthase from Mus musculus has 
been expressed in cells of E. coli, and when these bacter- 
ial cells were harvested, 50% of the protein in them was 
murine 5-aminolevulinate synthase. Each liter of culture 
medium yielded bacterial cells from which 5 mg of the 
pure enzyme could be isolated. Because the expression 
system is usually unrelated to the organism from 
which the DNA to be expressed was originally derived, 
that expression system is usually unable to splice introns 
out of the messenger RNA encoded by genomic DNA. 
Consequently, if a eukaryotic protein or a portion of a 
eukaryotic protein is to be expressed, its complementary 
DNA is used. 

An expression vector is a molecule of DNA into 
which the DNA encoding the protein to be expressed is 
inserted and which compels the cells of the expression 
system to express the protein. When proteins are 
expressed in E. coli, the expression vectors are usually 
plasmids. There are many plasmids used as expression 
vectors, but each of them usually contains a gene con- 
veying resistance to an antibiotic so that only cells carry- 
ing the plasmid will grow. The insertion is performed at 
a restriction site that occurs only once in the sequence of 
the DNA for the plasmid. The insertion is performed by 
cleaving that site with the appropriate site-specific 
deoxyribonuclease, adding the fragment of DNA encod- 
ing the protein, and ligating the pieces of DNA. The 
expression vector has been designed so that this restric- 
tion site for insertion is immediately adjacent to 
sequences of DNA that enforce the transcription and 
translation of the inserted DNA. To guarantee high levels 
of transcription, there is a strong promoter, for example, 
a T3 promoter, a T7 promoter, a lacZ promoter, an alkaline 
phosphatase promoter, a tacII promoter, or a trc promoter. 
These promoters are segments of DNA that serve 
as unusually active sites for the initiation of the synthesis 
of messenger RNA by DNA-directed RNA polymerase. 
The DNA preceding the point of insertion on the plasmid 
must also have sequences necessary for the active trans- 
lation of the messenger RNA into protein. 

The DNA inserted into the restriction site on the 
expression vector is often complementary DNA or 
genomic DNA that has just been used for sequencing, 
and that DNA is cut out of the bacteriophage or plasmid 
in which it was screened and amplified. Occasionally, 
the inserted complementary DNA is from an organism 
the codon usage of which is so different from that of 
E. coli that poor expression occurs because of this mis- 
match. One solution to this problem is to synthesize the 
complementary DNA with compatible codons. The 
insertion into the expression vector is accomplished 
most effectively if the fragment has sticky ends that are 
compatible with the restriction site on the expression 
vector. One way this is accomplished is to use primers 
for the polymerase chain reaction that contain the 
sequences of DNA necessary to anneal to complemen- 
tary sequences of the DNA at the beginning and end of 
the coding sequence but in addition contain sequences 
of DNA for the appropriate endonucleolytic cleavage 
sites and even sequences necessary for translation. 
The final DNA produced in the polymerase 
chain reaction will incorporate these additional 
sequences even though they did not exist in the initial 
DNA used as the template. If the complementary DNA 
encodes a segment of amino acid sequence that is nor- 
mally removed from the native protein by a posttransla- 
tional process absent from E. coli, the portion of the 
DNA encoding that segment sometimes has to be 
removed before a fully functional protein can be 
expressed. 
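
The way in which primers carry restriction sites into the product of the polymerase chain reaction can be sketched in Python. The sketch is an illustration under several assumptions: the coding sequence is hypothetical, the recognition sequences used are those of BamHI (GGATCC) and EcoRI (GAATTC), the length of the annealing region is arbitrary, and the few extra bases often placed outside the restriction site on a real primer are omitted. The function names are not taken from any library.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the complementary strand, written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def design_primers(coding, site_5, site_3, anneal=18):
    """Return (forward, reverse) primers, each written 5' to 3'.
    Each primer is a restriction site followed by a region that
    anneals to one end of the coding sequence."""
    forward = site_5 + coding[:anneal]
    reverse = site_3 + reverse_complement(coding[-anneal:])
    return forward, reverse


def pcr_product(coding, forward, reverse, anneal=18):
    """Return the top strand of the amplified DNA: the forward primer,
    the interior of the template, and the complement of the reverse primer."""
    return forward + coding[anneal:-anneal] + reverse_complement(reverse)


coding = "GCCTGGGCTTACTTCTCTGATGTGGACCTGGAAAAG"   # hypothetical
forward, reverse = design_primers(coding, "GGATCC", "GAATTC", anneal=12)
print(forward)
print(reverse)
print(pcr_product(coding, forward, reverse, anneal=12))
```

The product carries a restriction site at each end even though neither site was present in the template, so digestion with the two deoxyribonucleases yields sticky ends suitable for insertion into the expression vector.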

A piece of DNA encoding another amino acid 
sequence is often inserted ahead of the DNA encoding 
the protein of interest. For example, a portion of DNA 
encoding a strong promoter as well as a short segment of 
the protein that promoter usually controls, such as a segment 
of β-galactosidase or the λcII protein, can be placed 
in front of the DNA to be expressed to guarantee that it is 
produced efficiently. In this instance, a stop codon fol- 
lowed by a start codon can be inserted between the two 
coding regions so that the fragment of DNA promoting 
transcription is not translated attached to the protein 
being expressed. It has been found in many instances, 
however, that fusion proteins, proteins in which the 
protein of interest is coupled during translation to 
another complete protein such as glutathione transferase, 
β-galactosidase, or ubiquitin, are expressed in 
much higher yield than the unfused, intact protein of 
interest. Often this is because the fusion protein 
resists the endopeptidases of E. coli that 
would otherwise degrade the protein of interest. To isolate 
the protein of interest without the associated fusion 
protein, an amino acid sequence is often introduced 
between the two proteins that is a target for an endopep- 
tidase of stringent specificity, such as activated factor Xa 
or renin, so that the unwanted portion can be removed 
by cleavage with that endopeptidase. 

A fusion protein can also be one between the pro- 
tein of interest and a portion of a protein such as an 
enterotoxin that contains a signal for secretion from 
E. coli. In this case, the protein produced ends up in the 
medium rather than in the cells. In one instance, how- 
ever, expression of a protein that is normally excreted 
from E. coli was toxic to the cells at the levels produced, 
and the sequences signalling excretion had to be 
removed to keep the protein inside the cells. 

One problem with expression of a foreign protein 
in E. coli is its precipitation to form large inclusion 
bodies. In this precipitated form, the protein being 
expressed is inactive and indistinguishable from any 
other precipitated protein. It is often possible, however, 
to dissolve these precipitates in a solution of a salting-in 
solute such as urea or guanidinium chloride and rena- 
ture functionally active, fully soluble protein from this 
solution. 

Proteins can be expressed in cells other than those 
of E. coli. Expression plasmids containing promoters 
active in Saccharomyces cerevisiae that can be incorporated 
into cells of this species of yeast are available 
for expressing proteins. One of the difficulties of 
expressing animal proteins in bacteria or fungi is that 
these cells are unable to perform normal posttransla- 
tional modifications. An animal protein that is normally 
modified posttranslationally is usually expressed in 
animal cells capable of such modifications. One such 
animal system that provides high yields of protein is 
cells of the insect Spodoptera frugiperda. These insect 
cells, grown in culture, can be infected with virions con- 
taining an expression vector constructed from viral 
DNA of the nuclear polyhedrosis virus Autographa 
californica just as a culture of E. coli can be infected 
with bacteriophage λ. If the DNA encoding the protein 
of interest is inserted at a point in the viral genome 
under the control of the promoter for the viral coat pro- 
tein, high yields of the expressed protein are produced. 
Even higher yields can be produced if larvae (caterpillars) 
of Trichoplusia ni are infected with such a virus. 
These insect expression systems produce proteins with 
many of the normal posttranslational modifications of 
animals. 

To ensure that posttranslational modifications of 
mammalian proteins that are foreign to insects are cor- 
rectly made or to express a mammalian protein in the 
biological context of a mammalian cell, proteins are 
often expressed in cultured mammalian cells by use of 
an expression vector carrying a promoter from an animal 
virus, such as cytomegalovirus or simian virus. Such 
expression vectors can be inserted into the genomic DNA 
of an animal cell such as Chinese hamster ovary cells or 
murine L cells by transfection. 

When a protein is expressed in any of these expres- 
sion systems, the final product of the expression is usu- 
ally a pellet of cells that is then homogenized, producing 
a complex mixture of proteins. Even if the expression 
has been so successful that the protein that has been 
expressed accounts for the majority of the protein in this 
mixture, it still must be purified. This purification is 
usually performed by the standard procedures because 
they are simple to implement, but it is possible to design 
the expressed protein to ease its purification. For exam- 
ple, the protein can be expressed with a string of six his- 
tidines attached at its carboxy terminus or amino 
terminus. An affinity adsorbent to which Ni²⁺ has been 
attached through a covalently bound iminodiacetic 
acid binds such histidine tails with high specificity, 
and the expressed protein can be eluted, often in pure 
form, with a gradient of imidazole. It is also possible 
to purify expressed proteins specifically if they have 
been designed to contain a short amino acid sequence 
on one of their termini recognized by a specific 
immunoglobulin immobilized on a solid phase. Fusion 
proteins between the protein of interest and glu- 
tathione transferase can be purified by using an affinity 
adsorbent on which glutathione has been covalently 
attached and eluting with glutathione. All of these 
strategies require that a short sequence of amino acids 
or even another protein be fused with the protein of 
interest, but if a short sequence recognized by a strin- 
gent endopeptidase is incorporated between the two, 
the protein of interest can be released in its unmodified 
form by digestion. 
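
The design of such a tagged protein can be sketched at the level of its amino acid sequence. The Python fragment below is a hypothetical construct: an initiating methionine, six histidines, and the sequence IEGR, after which activated factor Xa cleaves, placed ahead of the protein of interest. The layout, the names, and the test peptide are assumptions made for illustration and do not describe any particular published vector.

```python
HIS_TAG = "HHHHHH"          # string of six histidines
FACTOR_XA_SITE = "IEGR"     # factor Xa cleaves after the arginine


def fusion(protein):
    """Return the sequence of the tagged protein as it would be expressed."""
    return "M" + HIS_TAG + FACTOR_XA_SITE + protein


def cleave(fused):
    """Split the fused sequence after the first factor Xa site.
    Returns (tag with site, released protein); if no site is present,
    the sequence is returned intact with an empty second element."""
    head, site, tail = fused.partition(FACTOR_XA_SITE)
    if not site:
        return fused, ""
    return head + site, tail


tagged = fusion("AWAYFSDVDLEK")     # hypothetical protein of interest
print(tagged)
print(cleave(tagged))
```

Because cleavage occurs on the carboxy-terminal side of the recognition sequence, placing the tag at the amino terminus leaves no extra amino acids on the released protein.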

One advantage of expressing a protein in a system 
in which it is produced as a major fraction of the cellular 
protein or in which it has been tagged for affinity adsorption is 
that its purification often requires fewer steps than 
purification from its natural source. Because the steps of 
a purification are often accompanied by slow degrada- 
tion of the protein, the fewer the steps, the more homo- 
geneous will be the final purified protein. Crystals are 
more readily obtained from a protein the purification of 
which has been simple and rapid. For this reason, if they 
are available in high yield, expressed proteins are usually 
used in crystallographic studies in preference to the 
same proteins purified from natural sources. Often, how- 
ever, expressing a protein in cells, even in E. coli, pro- 
vides far less of the purified protein than can be obtained 
by starting with 10 kg of liver, heart, blood, or skeletal 
muscle. In such instances, if all that is desired is the pure 
protein, using an expression system is inefficient and 
costly. If, however, one experimental goal is to mutate 
specific amino acids in the sequence of the protein, an 
expression system is unavoidable. 

Site-directed mutation converts one particular 
amino acid in the sequence of a polypeptide into 
another of the 20 amino acids. It is also possible to delete 
amino acids from the sequence of a polypeptide or insert 
extra amino acids at a particular location with this tech- 
nique. The method requires that the complementary 
DNA or genomic DNA for the protein of interest has been 
cloned and that the encoded protein can be expressed in 
quantities sufficient for the contemplated experiments. 
The site-directed mutation is incorporated into the DNA, 
and the mutated DNA is used to direct the production of 
the modified polypeptide in which one particular amino 
acid has been deliberately changed. For example, a col- 
lection of 13 mutated versions of the lysozyme from T4 
bacteriophage, in which Threonine 157 had been 
changed to 13 of the other 19 amino acids, was produced 
by site-directed mutation. Each of these 13 different proteins 
was obtained as a pure crystalline product in quantities 
sufficient for crystallographic analysis. 

A site-directed mutation can be introduced into a 
particular segment of DNA by annealing a short piece of 
synthetic DNA, the mutagenic oligonucleotide, to one 
of the two strands of the unmutated DNA to form a 
short section of double-helical DNA in which one or 
more of the nucleotide bases are mismatched. The 
mutagenic oligonucleotide is designed so that the 
desired mismatches occur in the middle of the duplex 
formed by the annealing and there are sufficiently long 
regions of complementary nucleotide sequence on each 
flank to guarantee that a stable and specific duplex is 
formed. The original way this was accomplished is the 
following. 

A restriction fragment of the DNA encoding the 
protein of interest and containing the site to be mutated 
is inserted into the genome of an M13 bacteriophage, a 
bacteriophage that carries its genome as single-stranded 
DNA. Infection of a suspension of E. coli with the altered 
bacteriophage produces virus particles containing the 
enlarged genome on a closed, single-stranded circle of 
DNA. Closed, single-stranded circles containing the 
strand of the inserted DNA complementary to the mutagenic 
oligonucleotide are selected for hybridization. 
The mutagenic oligonucleotide is complementary to 
sequences on this single-stranded DNA except at the 
central, mismatched positions, chosen to produce the 
desired change in a particular codon. For example, the 
deoxyribonucleotide sequence -CTCTACTGCGGGTTTG- 
occurs in DNA encoding the sequence of tyrosyl-tRNA 
synthetase from Bacillus stearothermophilus. It 
encodes the amino acid sequence -LYCGF-, which contains 
amino acids 33-37 in the sequence of the intact 
protein. The mutagenic oligonucleotide -CAAACCCGCCGTAGAG- 
was chemically synthesized. It is complementary 
to the coding sequence of the unmutated 
complementary DNA except at its tenth residue, which is 
a C instead of the complementary A. When it was 
annealed to a single-stranded, circular DNA containing 
DNA with the unmutated sequence, it formed a short 
self-complementary segment of double-stranded DNA in 
which its C was mismatched with the T of the unmutated 
sequence. It was this mismatch that eventually produced 
the mutated DNA with the sequence -CTCTACGGCGGGTTTG-, 
encoding the mutated protein sequence, -LYGGF-. 
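
The relationships among the sequences in this example can be verified by computation. The Python sketch below uses only the two sequences quoted above for tyrosyl-tRNA synthetase; the function names are illustrative assumptions, and the sixteenth nucleotide, which does not complete a codon, is ignored during translation.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {a + b + c: AMINO[16 * i + 4 * j + k]
          for i, a in enumerate(BASES)
          for j, b in enumerate(BASES)
          for k, c in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the complementary strand, written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate successive complete triplets, starting at the first base."""
    return "".join(CODONS[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def mismatches(oligo, coding):
    """Compare the oligonucleotide with the perfect complement of the
    coding sequence. Returns (position, base in oligonucleotide,
    base in perfect complement), with positions numbered from 1."""
    perfect = reverse_complement(coding)
    return [(i + 1, o, p)
            for i, (o, p) in enumerate(zip(oligo, perfect)) if o != p]


unmutated = "CTCTACTGCGGGTTTG"      # coding sequence quoted in the text
oligo = "CAAACCCGCCGTAGAG"          # mutagenic oligonucleotide from the text

print(translate(unmutated))                   # amino acids 33-37
print(mismatches(oligo, unmutated))           # the single designed mismatch
print(reverse_complement(oligo))              # coding strand after mutation
print(translate(reverse_complement(oligo)))   # mutated protein sequence
```

The only mismatch is at the tenth residue of the oligonucleotide, and the strand complementary to the oligonucleotide encodes glycine in place of cysteine.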

The short mutagenic oligonucleotide sits upon the 
single-stranded, circular M13 DNA as a primer offering a 
free 3’-hydroxyl group. This hydroxyl is used to initiate 
the synthesis of DNA by DNA-directed DNA polymerase. 
The enzyme synthesizes a single strand of 
DNA upon the circular template until it comes around 
the circle to the 5’-end of the mutagenic oligonucleotide 
where it stops. The newly synthesized, single-stranded 
circle is then closed with DNA ligase to produce a closed, 
double-stranded circle of DNA, completely complemen- 
tary except at the designed mismatch. This double- 
stranded circular DNA is then replicated in a suspension 
of E. coli. Half of the resulting viral DNA should contain 
the mutated sequence of the segment of the inserted 
DNA because it is the progeny of the single strand into 
which the mutagenic oligonucleotide was incorporated 
originally. 

Plaques produced by the viruses are screened to 
locate ones producing the mutated DNA, double-stranded 
DNA is produced from one of these mutants 
and amplified, and the desired restriction fragment con- 
taining the mutation is isolated and reintroduced into 
the original DNA to create full-length DNA incorporating 
the mutation. The mutant protein expressed from this 
full-length, mutant DNA should contain the designated 
substitution. For example, in the case of the mutated 
tyrosyl-tRNA synthetase, it was shown by direct sequenc- 
ing of the purified protein that it had a glycine rather than 
a cysteine at position 35. That the modification has 
occurred, however, is usually verified by sequencing the 
mutated DNA rather than the protein itself. 

Several improvements in the original method for 
site-directed mutation just described have been made. 
The most important is the adaptation of the procedure so 
that double-stranded plasmids, rather than single-stranded 
M13 DNA, can be mutated directly. Another 
improvement has been the development of strategies 
permitting the removal of the parental unmutated 
strands of DNA that served as the template for the mutation 
so that all of the newly synthesized DNA carries the 
mutated sequence, increasing the percentage of the 
product that bears the mutation. A related method that 
also selects for DNA bearing the mutation is to use two 
primers, one that mutates the position of interest and the 
other that mutates a unique restriction site on the plasmid 
outside of the DNA inserted into it. In this way only 
the DNA containing the desired mutation, which also has 
the mutated restriction site, is immune to cleavage at the 
restriction site. Finally, the PCR method has been 
applied to produce mutated DNA. Because of its 
importance, many different procedures are now avail- 
able for site-directed mutation, and each investigator 
believes that the one she is using is the best. 


Site-directed mutations can also be produced by 
insertion of cassettes of synthetic double-stranded 
DNA into a particular complementary DNA. In this 
method, preexisting or purposely designed restriction 
sites for site-specific deoxyribonucleases that flank the 
region to be mutated are chosen. These restriction sites 
are designed or chosen so that the piece of double- 
stranded DNA produced by the site-specific deoxyri- 
bonucleases is short and has single-stranded, sticky 
ends, such as those produced by PstI (CTGCA↓G). A 
double-stranded segment of DNA is synthesized so that 
it has the appropriate sticky ends and incorporates 
complementary nucleotide sequences that encode the 
desired mutation. This is the cassette, which is then 
inserted into the hole in the original complementary 
DNA produced by the site-specific deoxyribonucleases. 
The advantage of the cassette is that the mutation is 
produced directly by insertion of synthetic double- 
stranded DNA. The disadvantage is that two comple- 
mentary pieces of synthetic single-stranded DNA have 
to be synthesized. Nevertheless, mutation with cas- 
settes has particular advantages when sets of mutants 
are prepared in which all of the possible 19 substitutions 
need to be made at a particular location. A similar 
but much more ambitious strategy is to synthesize 
fragments of DNA that when ligated together constitute 
the entire coding sequence for a protein. In this way a 
mutation can be introduced anywhere by synthesizing 
the corresponding fragment that has the altered 
sequence at the position to be mutated and ligating it 
with the remaining unmutated fragments. 
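
The preparation of a full set of 19 cassettes for one position can be sketched as follows. The Python fragment is an illustration only: the segment is the first fifteen nucleotides of the sequence from tyrosyl-tRNA synthetase quoted earlier, the codon chosen for each amino acid is arbitrary (the first one encountered in the table), and the sticky ends that a real cassette would need are omitted. The function names are assumptions.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {a + b + c: AMINO[16 * i + 4 * j + k]
          for i, a in enumerate(BASES)
          for j, b in enumerate(BASES)
          for k, c in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# One codon for each amino acid; an arbitrary choice made for illustration.
CHOSEN_CODON = {}
for codon, amino_acid in CODONS.items():
    if amino_acid != "*":
        CHOSEN_CODON.setdefault(amino_acid, codon)


def reverse_complement(dna):
    """Return the complementary strand, written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate successive complete triplets, starting at the first base."""
    return "".join(CODONS[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def substitution_cassettes(segment, codon_index):
    """Return {amino acid: (top strand, bottom strand)} for each of the
    19 substitutions at the codon with the given index (counted from 0).
    Both strands are written 5' to 3'."""
    start = 3 * codon_index
    original = CODONS[segment[start:start + 3]]
    cassettes = {}
    for amino_acid, codon in CHOSEN_CODON.items():
        if amino_acid == original:
            continue                      # skip the unmutated amino acid
        top = segment[:start] + codon + segment[start + 3:]
        cassettes[amino_acid] = (top, reverse_complement(top))
    return cassettes


cassettes = substitution_cassettes("CTCTACTGCGGGTTT", 2)
print(len(cassettes))
print(cassettes["G"])
```

Each entry lists the two complementary single strands that would have to be synthesized, which makes plain why the method requires twice as many syntheses as there are mutants.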

One of the supposed drawbacks of site-directed 
mutation is that only the 19 other natural α amino acids 
are available for substitution at the mutated site. It is 
rather easy to synthesize an α amino acid. A large 
number are available commercially and if one that has 
been drawn on a piece of paper is not available com- 
mercially, it can usually be synthesized. It is now possi- 
ble to replace an amino acid at any position in a 
polypeptide with any one of these unnatural amino 
acids. To do this, advantage is taken of the fact that 
there are three stop codons for translation: UAA (ochre), 
UAG (amber), and UGA. A rare tRNA, the amber sup- 
pressor tRNA, reads the codon UAG and normally 
inserts phenylalanine at that position. The triplet encod- 
ing the chosen amino acid in the coding sequence of the 
protein is mutated by usual site-directed mutation to 
TAG, and an amber suppressor tRNA (tRNA_CUA) to which 
the unnatural amino acid to be inserted has been syn- 
thetically attached is used to effect the desired substitu- 
tion in a cell-free system for transcription and 
translation. The requirements for chemically synthesizing 
the derivative of the suppressor tRNA and the 
low yields of protein from the cell-free translation 
system have limited the application of these procedures, 
but in at least one instance protein sufficient for crystallographic 
studies has been prepared. 

Problem 3-7: Draw the complete structures of 2’-deoxyadenosine 
5’-triphosphate (dATP), 2’-deoxyguanosine 
5’-triphosphate (dGTP), thymidine 5’-triphosphate 
(dTTP), 2’-deoxycytidine 5’-triphosphate (dCTP), and the 
single-stranded deoxyribonucleic acid AGTC. 


Problem 3-8: Write out, in the three-letter abbrevia- 
tions for the amino acids, the amino acid sequences of 
the amino terminus and the internal tryptic peptide of 
extensin from V. carteri that guided the synthesis of the 
two primers used to make the probe by the polymerase 
chain reaction. Look up the genetic code. Below these 
two amino acid sequences, write out the nucleic acid 
sequences of the 5’- and 3’-ends of the sense strand of the 
segment of 410 bp amplified by the polymerase chain 
reaction aligned by the genetic code with the respective 
sequences of the amino acids. Below these two 
sequences of nucleic acid, write out their complemen- 
tary sequences. Below each of the appropriate positions 
in these two blocks of aligned sequences, write out all of 
the redundant codons that the probe was designed to 
include. How many polynucleotides of different 
sequence resulted from each of the two syntheses? 


Problem 3-9: A fragment of single-stranded RNA, 488 
nucleotides long, was obtained from one of the riboso- 
mal RNAs of rat liver, 28S rRNA, by treatment with 
α-sarcin. It was treated with alkaline phosphatase to 
remove any phosphate from its 5’-end and then with 
[γ-³²P]MgATP and T4 polynucleotide 5’-hydroxyl-kinase 
to attach a radioactive phosphate to its 5’-end. The 
sample was then split into five separate portions. They 
were treated with the following reagents, respectively: 


(I) NaOH 


(II) ribonuclease T1, which cleaves on the 3’-side 
of G 


(III) ribonuclease U2, which cleaves on the 3’-side 
of A 


(IV) ribonuclease PhyM, which cleaves on the 3’-sides 
of A and U 


(V) ribonuclease BC, which cleaves on the 3’-sides of 
U and C 


The alkaline hydrolysis (I) and the enzymatic digestions 
(II-V) were carefully controlled so that only a small 
amount of cleavage occurred at each sensitive position. 
The five mixtures were then placed in adjacent lanes on 
a polyacrylamide gel and submitted to electrophoresis 
followed by autoradiography. A tracing of that autoradi- 
ogram is presented below. An autoradiogram only regis- 
ters radioactive fragments. 
Each lane is labeled with the appropriate roman 
numeral. The most rapidly migrating bands were 
mononucleotides. 


(A) Starting with the nucleotide on the 5’-end, write 
the sequence of the α-sarcin fragment covered by 
the gel. Indicate clearly 5’ → 3’ polarity. 


(B) Look carefully at the gel and then give a reason for 
including the digest in lane I. 


Problem 3-10: A piece of double-stranded DNA about
4360 base pairs in length was produced by the site-specific
deoxyribonuclease SalI. When this was digested
with the site-specific deoxyribonucleases DdeI and
PvuII, the fragments described in the diagram below
were obtained. The numbers are the approximate
lengths of the fragments.


[Diagram: the SalI fragment and the fragments obtained from it with DdeI and PvuII; the fragment lengths that remain legible are 820, 70, and 1170.]

Construct a restriction map.


Problem 3-11: A piece of double-stranded DNA about 
5300 base pairs in length has been produced by the 
action of the site-specific deoxyribonuclease EcoRI. 
When this fragment was digested with the site-specific 
deoxyribonucleases HindIII, KpnI, and BamHI, the following
results were obtained. The numbers are the
approximate lengths of the fragments.
[Diagram: fragments obtained from the EcoRI fragment with HindIII, KpnI, and BamHI; the only entry that remains legible is HindIII, 1100.]

Construct a restriction map.


Posttranslational Modification 


With the exception of the evanescent Nα-formyl group on
its amino terminus and perhaps the 21st primary amino 
acid, selenocysteine,’ the infant polypeptide as it 
emerges from the peptidyltransferase site on the ribo- 
some is a polymer containing only the 20 natural amino 
acids. Each amino acid is coupled to its neighbors by the 
amides of the peptide backbone, and the amino acids are 
arranged in the sequence encoded by the particular mes- 
senger RNA. It is this covalent structure and only this 
covalent structure that can be read by the investigator 
from the sequence of the messenger RNA or genomic 
DNA. The covalent structures of many proteins, however, 
do not remain in this untouched state but are biologically 
modified. A posttranslational modification is any 
change in the covalent structure of a polypeptide that 
occurs after its emergence from the ribosome. 
Although a thiopeptide bond 
has been observed at Glycine 445 of coenzyme-B sulfoethylthiotransferase,’***
most posttranslational modifications of the polypeptide backbone result from
endopeptidolytic cleavage or covalent rearrangements. 
Modifications of the original covalent structure of 
the polypeptide are performed naturally by cellular 
endopeptidases. Such normal editing of the amino acid 
sequence of the protein must be distinguished from arti- 
factual degradation by endopeptidases that can occur, 
for example, during the purification of a protein. In the 
course of a normal, natural modification, the polypep- 
tide of a particular protein is cleaved internally, either as 
a mechanism for controlling its enzymatic activity or for 
architectural purposes. An example of the former is the 
activation of endopeptidases in the pancreatic secretions
or the serum by internal cleavages by endopeptidases.” 
An example of the latter is the trimming of folded proin- 
sulin to produce insulin. As in the production of insulin 
from proinsulin, a number of other hormones are pro- 
duced by endopeptidic cleavage at -Lys-Lys- or -Arg- 
Lys- positions in the sequence of longer precursors.'”° 
For example, corticotrophin, β-lipotropin, γ-lipotropin,
β-endorphin, α-melanocyte-stimulating hormone, and
γ-melanocyte-stimulating hormone are all cut from the
same precursor 265 aa in length.'?”'°® Following the initial
endopeptidolytic, posttranslational cleavage, the
new amino terminus and carboxy terminus can be fur- 
ther digested by exopeptidases.'!! 
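As an illustration only, the search for the dibasic positions mentioned above can be sketched in a few lines of Python. The sketch assumes the one-letter code for amino acids (K for lysine, R for arginine). The function names and the short precursor are hypothetical and were invented for this example, and the presence of a dibasic pair in a sequence does not by itself show that the precursor is cleaved there.

```python
import re

# Dibasic pairs named in the text: -Lys-Lys- (KK) and -Arg-Lys- (RK).
# The lookahead lets overlapping pairs (for example in "KKK") all be reported.
DIBASIC_SITE = re.compile(r"(?=(KK|RK))")


def dibasic_sites(sequence):
    """Return (position, pair) for every KK or RK; positions are 1-based."""
    return [(m.start() + 1, m.group(1)) for m in DIBASIC_SITE.finditer(sequence)]


def fragments_between_sites(sequence):
    """Split the sequence at each dibasic pair and discard the pair itself.

    Discarding the pair is a simplification standing in for the later
    trimming of the new termini by exopeptidases.
    """
    return [piece for piece in re.split(r"KK|RK", sequence) if piece]


# Hypothetical precursor, invented for illustration only.
precursor = "MAGSYRKEHFWGKKLPTQE"

if __name__ == "__main__":
    print(dibasic_sites(precursor))
    print(fragments_between_sites(precursor))
```

Each fragment returned in this way is only a candidate for a mature peptide; which candidates are actually produced has to be established experimentally.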

Almost all of the proteins of animals are posttrans- 
lationally shortened by the removal of one or more of the 
amino acids from their amino terminus, but some pro- 
teins have particular segments removed from their 
amino termini as they are passed from one compartment 
in the cell to another compartment. These amino-termi- 
nal signal sequences’ address the proteins to the 
proper locations, and their removal is presumably 
involved in keeping them there. These successive 
removals of portions of the amino-terminal sequence 
have led to the terms pre-proprotein and proprotein. 

There is a set of posttranslational modifications 
involving cysteines, serines, threonines, asparagines, 
and aspartates that result in rearrangements in the 
covalent structure of the polyamide backbone of a pro- 
tein or self cleavage of its backbone. These five amino 
acids promote these modifications because they place 
either a nucleophile or an electrophile four atoms away 
from either an electrophilic acyl carbon or a nucleophilic 
amide nitrogen, respectively. Thus, the chemistry 
involved is the chemistry of five-membered heterocycles. 
Almost all of these posttranslational modifications are 
catalyzed intramolecularly by the protein itself. 

Cysteines, serines, and threonines have their 
nucleophilic oxygens or sulfurs four atoms away from the 
acyl carbon of their amino-terminal neighbor. One 
example of a consequence of this spacing is the posttranslational
modification that produces thiazolines
and oxazolines in microcin.'®'*' Another is the self-catalyzed
posttranslational modification that cleaves the
polypeptides of human S-adenosylmethionine decar- 
boxylase!® between Glutamine 67 and Serine 68’, histi- 
dine decarboxylase from Lactobacillus’? between 
Serine 81 and Serine 82, and aspartate 1-decarboxylase 
from E coli! between Glycine 24 and Serine 25, in each 
case producing a pyruvated amino terminus. 
The first step in this reaction is a five-membered tetravalent
intermediate as in the first step of Equation 3-8, but
the amine leaves the intermediate rather than water to 
produce the intermediate ester, which has been 
observed crystallographically.’”"'® This step is an example
of an N → O acyl migration. The next step in this
reaction utilizes the superior ability of carboxylate as a 
leaving group to effect the dehydration ultimately pro- 
ducing the pyruvyl group. The oxygen of the original 
serine ends up in the carboxylate of the new carboxy 
terminus produced in the reaction.'® An α-ketobutyryl
group'®” is found as an acyl substituent at the amino 
terminus of one of the two polypeptides composing 
threonine dehydratase, and it presumably arises by a 
similar mechanism (Equation 3-9) from a threonine in 
the protein rather than a serine. 

An N → O acyl migration also occurs in the self-catalyzed’
cleavage of the peptide bond on the
amino-terminal side of Threonine 206 in human
N4-(β-N-acetylglucosaminyl)-L-asparaginase.' In this
case, the ester produced by the migration (Equation 3-9),
rather than providing a leaving group, hydrolyzes to
produce the break in the polypeptide between amino
acids 205 and 206. In hedgehog protein from D
melanogaster, the thioester resulting from an N → S
migration of the polypeptide at Cysteine 258 is transesterified
onto the hydroxyl group of cholesterol, which
takes the place of the water that would hydrolyze the
ester.” In the process, the polypeptide is cleaved
between Glycine 257 and Cysteine 258, and the
cholesterol ends up as a posttranslational modification
esterified to the new carboxy terminus.

An asparagine or aspartic acid places an elec- 
trophile four atoms away from the amide nitrogen of its 
carboxy-terminal neighbor. This can lead to the produc- 
tion of an aspartyl imide, an isoaspartyl peptide bond, or 
an aspartate where there was an asparagine.!”1"” 


[Equation 3-10: an asparaginyl or aspartyl peptide cyclizes to the aspartyl imide, which opens to give either the isoaspartyl peptide bond or an aspartate in a normal peptide bond.]

(3-10)


It is the production of aspartyl imides between 
asparagines and adjacent glycines (R = H), promoted by 
alkaline pH and elevated temperature, that is thought to 
be the first step in the chemical cleavage of a polypeptide 
by hydroxylamine.” The preference, in this case, for 
asparaginylglycyl peptide bonds is thought to be due to a 
steric effect on the initial cyclization that is minimized 
when the amido nitrogen that attacks the acyl carbon of 
the side chain of the asparagine is that of a glycine. It is 
thought that the aspartyl imide is then cleaved by the 
nucleophilic hydroxylamine to produce the chemical 
cleavage used experimentally to produce large fragments 
of a polypeptide.*® 

The formation of aspartyl imides, aspartates, and 
isoaspartyl peptide bonds also seems to occur sponta- 
neously but slowly at many of the asparagines, as well as 
at aspartic acids,’ in most proteins in their natural envi- 
ronment to produce a low level of isoaspartyl peptide 
bonds,’ which are more stable than normal aspartyl or 
asparaginyl peptide bonds.'” Both an aspartyl imide and
an isoaspartyl peptide bond have been observed crystal- 
lographically at the position of Aspartate 101 in hen 
lysozyme, which precedes Glycine 102,'”* and an isoas- 
partyl peptide bond has been observed crystallographi- 
cally at the position of Asparagine 67 in bovine 
pancreatic ribonuclease, which precedes Glycine 68." 
Because the hydrolysis of an aspartyl imide can lead to 
the replacement of an asparagine with an aspartate still 
in a normal peptide bond, this reaction may be responsi- 
ble for the deamidation observed at particular sites in 
some proteins.'’%17”” Because the aspartyl imide racemizes
more rapidly at its α carbon than does either of the
amides,” this process also introduces D-aspartates into 
the polypeptide. 

Both the unnatural isoaspartyl peptide bonds and 
the D-aspartates are recognized by a repair enzyme that 
methylates their free carboxylates. This methylation 
reinitiates the formation of the aspartyl imide, which can 
spontaneously racemize and hydrolyze to produce 
L-aspartate in a normal peptide bond, thus repairing the 
problem.'”*'*! Only a fraction of the imide racemizes 
before it hydrolyzes, and when it hydrolyzes the isoas- 
partyl peptide bond is the favored product, but if only the 
D-aspartates and the isoaspartates are methylated and if 
they are recycled often enough, significant repair can be 
accomplished. 7719 
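The argument that repeated recycling can accomplish significant repair even when each pass is inefficient can be made concrete with a minimal numerical sketch. It assumes, purely for illustration, that every damaged site is methylated on each pass, that a fixed fraction of the resulting imides hydrolyzes to a normal L-aspartyl peptide bond, and that repaired sites are not damaged again during the period considered. The function name and the yield of 0.25 are hypothetical and are not taken from the text.

```python
def damaged_fraction_after(cycles, repair_yield):
    """Fraction of originally damaged sites still damaged after `cycles` passes.

    repair_yield is the assumed fraction of imides that end up as a normal
    L-aspartyl peptide bond in one pass; the remainder returns to the pool of
    isoaspartyl or D-aspartyl sites and is methylated again on the next pass.
    """
    if cycles < 0:
        raise ValueError("cycles must not be negative")
    if not 0.0 <= repair_yield <= 1.0:
        raise ValueError("repair_yield must lie between 0 and 1")
    return (1.0 - repair_yield) ** cycles


if __name__ == "__main__":
    # Hypothetical yield per pass, chosen only to show the trend.
    for n in (1, 5, 10, 20):
        print(n, round(damaged_fraction_after(n, 0.25), 4))
```

With the assumed yield of 0.25 per pass, about 6% of the sites would remain damaged after ten passes, which is the sense in which a repair that favors the wrong product on each pass can still be effective if it is repeated often enough.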

The ultimate exploitation of this class of posttrans- 
lational modifications involving five-membered rings is 
self-catalyzed rearrangements of an amino acid 
sequence. There is a group of proteins, such as vacuolar 
adenosinetriphosphatase,'®*'?* certain RecA proteins,'® 
and certain DNA polymerases” that contain an internal 
sequence called an intein that is 400-550 aa in length. 
The intein is spliced out of these proteins coincident 
with the formation of a peptide bond connecting the car- 
boxy terminus of the amino-terminal segment of the pro- 
tein that precedes the intein to the amino terminus of the 
carboxy-terminal segment following it. 


+H3N-...-(Cys,Ser,Thr)inteinHisAsn(Thr,Ser,Cys)-...-COO−

    →   +H3N-...-(Thr,Ser,Cys)-...-COO−   +   (Cys,Ser,Thr)inteinHis(aspartyl imide)

(3-11)
The intein can be one continuous folded polypeptide
connecting the amino-terminal segment preceding
it to the carboxy-terminal segment following it, or it can 
be the carboxy-terminal segment of one folded polypep- 
tide that is bound noncovalently to the amino-terminal 
segment of a second folded polypeptide.'”'#® In the 
latter instance, the intein, after it has been spliced out, is 
two folded polypeptides bound to each other but the 
other product is still one continuous, spliced polypeptide 
formed from the amino-terminal segment of the first 
folded polypeptide and the carboxy-terminal segment of 
the second. The intein always begins with a serine, thre- 
onine, or cysteine and ends with a histidinyl asparagine, 
and the carboxy-terminal segment always begins with a 
serine, threonine, or cysteine. 7 
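The rule for the boundaries of an intein stated above can be written as a short check. This is a sketch under stated assumptions: sequences are in the one-letter code, positions are 1-based and inclusive, the function names are hypothetical, and the toy precursor is invented and far shorter than any real intein. The sketch also treats splicing as simple deletion of the intein and does not represent the imide left at the carboxy terminus of the released intein.

```python
# Residues whose side chains supply the nucleophile: Cys, Ser, Thr.
NUCLEOPHILES = set("CST")


def obeys_intein_rule(precursor, start, end):
    """Check a proposed intein occupying positions start..end (1-based, inclusive).

    The rule from the text: the intein begins with Ser, Thr, or Cys, ends with
    His-Asn, and is followed immediately by Ser, Thr, or Cys.
    """
    intein = precursor[start - 1:end]
    following = precursor[end:end + 1]  # empty string if nothing follows
    return (
        len(intein) >= 3
        and intein[0] in NUCLEOPHILES
        and intein.endswith("HN")
        and following in NUCLEOPHILES
    )


def splice(precursor, start, end):
    """Return (spliced protein, excised intein) for the proposed boundaries."""
    return precursor[:start - 1] + precursor[end:], precursor[start - 1:end]


# Toy precursor invented for illustration: MKLG + CAAAAHN + SVDE.
toy = "MKLGCAAAAHNSVDE"

if __name__ == "__main__":
    print(obeys_intein_rule(toy, 5, 11))
    print(splice(toy, 5, 11))
```

A sequence that passes this check is only consistent with the rule; it is not thereby shown to contain an intein.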

An even more extensive set of similar self-catalyzed 
posttranslational rearrangements occurs in con- 
canavalin A from Canavalia ensiformis. After the initial 
polypeptide is produced by the ribosome, the α-amido
group of Serine 30 couples to the α-acyl group of
Asparagine 281 in place of the α-amido group of
Glutamate 282, releasing the amino-terminal 29 amino 
acids preceding Serine 30 and the carboxy-terminal nine 
amino acids following Asparagine 281 as two short pep- 
tides, and the polypeptide is cleaved to the carboxy-ter- 
minal sides of Asparagines 148 and 163, releasing the 
intervening 15 amino acids as another short peptide.'”
The final intact product of the splicing begins at Alanine 
164 of the precursor and ends at Asparagine 148. 
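The circular permutation that results can be followed by bookkeeping on the residue numbers alone. The sketch below uses only the positions quoted in the text (Serine 30, Asparagine 148, Alanine 164, Asparagine 281, and the nine residues after Asparagine 281); the names in the code are hypothetical, and no amino acid sequence is assumed.

```python
# Asparagine 281 plus the nine residues that follow it in the precursor.
PRECURSOR_LENGTH = 281 + 9


def mature_order():
    """Residue numbers of the precursor in the order found in the mature chain."""
    first_part = list(range(164, 282))   # Alanine 164 through Asparagine 281
    second_part = list(range(30, 149))   # Serine 30 through Asparagine 148
    return first_part + second_part


if __name__ == "__main__":
    order = mature_order()
    print("first residue:", order[0], "last residue:", order[-1])
    print("length of mature chain:", len(order), "of", PRECURSOR_LENGTH)
    # The new peptide bond joins Asparagine 281 to Serine 30.
    junction = order.index(281)
    print("junction:", order[junction], "->", order[junction + 1])
```

On this bookkeeping the mature polypeptide is 118 + 119 = 237 amino acids long, and the only peptide bond in it that was not made by the ribosome is the one between Asparagine 281 and Serine 30.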

The two spontaneous cleavages to the carboxy- 
terminal sides of Asparagines 148 and 163 in concanava- 
lin A are thought to result from an attack of the amide 
nitrogen of the asparagine on its own acyl carbon 
in a reaction analogous to that involving the acyl oxygen
of aspartate under acidic conditions (Figure 3-5). In the 
case of concanavalin A, this cleavage would be catalyzed 
by other amino acids in the protein and the product 
would be the imide. 

The first step in intein splicing is an N → O or N → S
acyl migration (Equation 3-9) of the amino-terminal segment
of the protein occurring at the serine, threonine, or
cysteine on the amino-terminal side of the intein 
(Equation 3-11).!9!! The amino-terminal segment is 
then passed by a transesterification to the oxygen or 
sulfur of the serine, threonine, or cysteine on the 
carboxy-terminal side of the upstream splice site 
(Equation 3-11) to form a branched intermediate in 
which the intein and the carboxy-terminal segment are 
still joined together and the amino-terminal segment is 
esterified to the serine, threonine, or cysteine.’ In the 
next step of the reaction, the peptide bond to the car- 
boxy-terminal side of the asparagine is cleaved (Equation 
3-12) to produce the free intein with an unsubstituted 
aspartyl imide at its carboxy terminus.’ The peptide 
bond between the amino-terminal segment and the 
carboxy-terminal segment is then formed by the respective
O → N or S → N acyl migration. The amino-terminal
and carboxy-terminal splice sites sit next to each other in 
the folded protein to permit all of these rearrangements 
to occur in close proximity.!%%195196 

In the rearrangement of concanavalin A, Glutamate 
282 in the asparaginylglutamate is replaced by Serine 30 
to produce an asparaginylserine. The first step in this 
reaction is probably the cleavage of the peptide bond of 
the asparaginylglutamate at positions 281 and 282 to 
produce the aspartyl imide at the resulting carboxy 
terminus (Equation 3-12). The following steps in the 
reaction would then be, by analogy to those of intein 
splicing, N → O migration at Serine 30, attack of the
α-amino group of Serine 30 on the aspartyl imide of
Asparagine 281, and hydrolysis of the ester between the
α-carboxyl group of Serine 29 and the hydroxyl group on
the side chain of Serine 30. 

The posttranslational modifications of the back- 
bone of the initially synthesized polypeptide that are 
produced by endopeptidolytic cleavage, self-catalyzed 
cleavage, the formation of aspartyl imides or isoaspartyl 
peptide bonds, or intein splicing have usually been iden- 
tified by electrophoresis of complexes between the 
polypeptide and dodecyl sulfate, a procedure that regis- 
ters the lengths of constituent polypeptides; by electro- 
spray mass spectrometry, a procedure that registers 
decreases in mass caused by loss of portions of the 
polypeptide; by amino-terminal sequencing, a proce- 
dure that registers newly formed amino termini; by 
digestion with carboxypeptidases, a procedure that iden- 
tifies new carboxy-terminal sequences; and by digestion 
with exopeptidases, which digest the normal peptide 
bonds but not imides or isopeptide bonds. These analy- 
ses rely heavily on the complete amino acid sequence of 
the unmodified polypeptide that is the immediate prod- 
uct of protein synthesis. This sequence is learned from 
sequencing the nucleic acid encoding it. For example, 
even though the complete amino acid sequence of 
mature concanavalin A was known from direct sequencing,!””
the extensive rearrangements producing the final
protein went unrecognized until the complementary
DNA encoding it had been sequenced.!*°'

The amino terminus of a polypeptide can be 
N-methylated, ™?® N-2-pyruvylated,”” or N-acylated, 
either intramolecularly, as in pyroglutamate (Figure 
3-16), or externally, as when it is N-formylated,?” 
N-acetylated,”” or N-glucuronylated.””® Enzymes are 
available that hydrolyze pyroglutamyl groups" or
remove acetyl groups.” In a murein lipoprotein from
bacterial outer membrane*” and ubiquinol oxidase
(cytochrome bo3) from E coli,” each of the respective
amino-terminal cysteines is N-acylated by a fatty acid at
its α-amino group and its sulfur forms a thioether with
carbon 3 of a 1,2-diacylglycerol. n-Tetradecanoyl amides
of amino termini (Figure 3-16)*”° were first found on protein
kinases. The existence of these fatty acylated amino
termini was established by isolating an amino-terminal
peptide, CH3(CH2)12CO-NH-Gly-Asn-Ala, from cAMP-dependent
protein kinase and confirming its structure
by chemical degradation and by mass spectrometry with 
fast-atom bombardment.” By similar procedures it was 
shown that recoverin was acylated at its amino terminus 
with a mixture of n-dodecanoic acid, cis-n-tetradec-5-enoic
acid, and cis,cis-n-tetradeca-5,8-dienoic acid in
addition to n-tetradecanoic acid. Such chemical
demonstrations of a modification at the amino terminus
of a polypeptide should be distinguished from an unsupported
conjecture that the amino terminus is blocked
when the Edman degradation fails.

Figure 3-16: Posttranslational modifications that occur at the termini of a polypeptide.

The carboxy terminus of a polypeptide can also be 
modified, for example, as the primary amide (Figure 
3-16), the tyrosyl amide,” or the methyl ester.7 In at
least one instance,” the primary amide at a carboxy 
terminus is produced from a carboxy-terminal glycine 
that is first monooxygenated and then decomposes with 
the loss of glyoxylate to leave behind its former amino 
group as the carboxy-terminal amide. 


(3-13) 


During the posttranslational modification of sev- 
eral polypeptides with the carboxy-terminal sequence 
CXYZ (where X, Y, and Z each represent one of many pos- 
sible amino acids),*!° the last three amino acids are 
removed,”!° and the cysteine at the new carboxy termi- 
nus is doubly modified (Figure 3-16) by isoprenylation 
and methylation. A farnesyl group?’ or a geranylgeranyl 
group” 19 is added to the sulfur of the cysteine in an 
allylthioether, and the new carboxy terminus is methy- 
lated to form the methyl ester.” It is thought that 
these modifications make the carboxy terminus suffi- 
ciently hydrophobic to bind to biological mem- 
branes.”'*”* There are also proteins in which the 
polypeptide synthesized from the messenger RNA has 
the carboxy-terminal sequence Cys-Cys or Cys-X-Cys, 
and each of the cysteines in these carboxy-terminal 
sequences is then geranylgeranylated.”* The geranyl- 
geranylated proteins with the carboxy-terminal 
sequence Cys-X-Cys are then methylated on their termi- 
nal carboxylates,” but those with the carboxy-termi- 
nal sequence Cys-Cys are not.””° 
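The three carboxy-terminal patterns just described can be summarized as a small decision table. The sketch below assumes the one-letter code, and the function name, the returned fields, and the test sequences are hypothetical. The order in which the patterns are tested is a choice made for this example; the text does not say how a carboxy terminus that fits more than one pattern is handled, and matching a pattern does not establish that a protein is modified.

```python
def classify_carboxy_terminus(sequence):
    """Classify a carboxy terminus by the patterns described in the text.

    Returns a dictionary summarizing the expected processing, or None when
    no pattern matches. Patterns are tested in the order Cys-Cys,
    Cys-X-Cys, CXYZ (an assumption of this sketch).
    """
    if sequence.endswith("CC"):
        return {"motif": "Cys-Cys", "residues_removed": 0,
                "prenylated_cysteines": 2, "methyl_ester": False}
    if len(sequence) >= 3 and sequence[-3] == "C" and sequence[-1] == "C":
        return {"motif": "Cys-X-Cys", "residues_removed": 0,
                "prenylated_cysteines": 2, "methyl_ester": True}
    if len(sequence) >= 4 and sequence[-4] == "C":
        # The last three amino acids are removed; the cysteine at the new
        # carboxy terminus is isoprenylated and then methylated.
        return {"motif": "CXYZ", "residues_removed": 3,
                "prenylated_cysteines": 1, "methyl_ester": True}
    return None


if __name__ == "__main__":
    # Invented carboxy-terminal sequences, for illustration only.
    for tail in ("DESGPGCVLS", "PQASGGCC", "TNQGGCSC", "ALKDEEFG"):
        print(tail, classify_carboxy_terminus(tail))
```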

An extensive posttranslational modification of the 
carboxy terminus occurs in certain proteins that are bound 
tightly to the extracellular surface of protozoal and animal 
cells.“ ** It has the effect of covalently connecting the 
carboxy terminus of the protein to a phosphatidylinositol 
dissolved within the bilayer of the plasma membrane. The 
carboxy terminus is linked directly through an amide to 
an ethanolamine, which is linked through a phosphate 
diester to the mannose of an oligosaccharide, which, in 
turn, is linked by a glycosidic linkage to phosphatidylinositol,
a phospholipid (Figure 3-17).”*°** Because this
posttranslational modification causes the protein to
adhere to membranes, it is called a glycosylphosphatidylinositol
(GPI) anchor.

Figure 3-17: Structure of the linkage between phosphatidylinositol and the carboxy terminus of a polypeptide in a phosphatidylinositol-linked
protein.”"** The carboxy-terminal amino acid sequence shown is that for the linkage at the end of the variant surface glycoprotein
MITat.1.4 from Trypanosoma brucei.” The phosphatidylinositol shown is in the ditetradecanoyl form, but saturated and unsaturated fatty
acids from 12 to 22 carbons in length can be esterified at either position in place of either or both of the tetradecanoyl groups. A variant in
which a tetradecanoyl group is also attached to the inositol has been reported,” as well as one in which a ceramide replaces the
diacylglycerol.””° The phosphatidylinositol is coupled to an unacetylated D-glucosamine, which is coupled in turn through a trisaccharide of
D-mannoses to ethanolamine phosphate, the primary amine of which is attached in amide linkage to the carboxy terminus of the protein. A
variant of the more common structure displayed here has the phosphoethanolamine attached through the 3-position of the middle mannose
rather than the 6-position of the end mannose.” Within the tetrasaccharide, the position marked (Gal)n is either a hydrogen or an
oligosaccharide of one or more galactosyl groups; the position marked Man is either a hydrogen, an α1-mannosyl group, or a mannosyl
disaccharide; the position marked (Gal X) is either a hydrogen, a β1-galactosyl group, or a β1-N-acetylgalactosaminyl group; and the position marked E
is either a hydrogen or an (O-ethanolamino)phosphoryl group.

In addition to the polypeptide itself and its amino 
and carboxy termini, posttranslational modification of 
the side chain of an amino acid can occur. When it does, 
the derivative remains an L-α-amino acid residue
because its carboxyl group and its a-amino group are 
protected by the amides of the backbone. There are 
many posttranslationally modified amino acids that have 
been identified in naturally occurring polypeptides 
(Table 3-1, Figure 3-18). 

The length of Table 3-1 gives the erroneous 
impression that posttranslational modifications are 
common. Aside from glycosylation and the phosphory- 
lation of serines, threonines, and tyrosines, the inci- 
dence of any of these modifications is quite limited, 
often being confined to only one protein or one small 
family of proteins. For example, two of the earliest 
recognized posttranslational modifications were the 
5-hydroxylysine and 4-hydroxyproline (Figure 3-18) that, 
with few exceptions, are formed in the posttranslational 
monooxygenation of only prolines and lysines that are 
found in segments of amino acid sequence in which 
every third amino acid is a glycine.“ Such sequences
occur in the various collagens and proteins related to the
collagens. The modifications producing covalently 
bound coenzymes occur only in proteins using these 
coenzymes to assist in catalysis of particular reactions. 
Some of the other posttranslational modifications, for 
example, the quinones of 2,5-dihydroxytyrosine (Figure 
3-18), 6,7-dioxo-4-(2-tryptophanyl)tryptophan 
(Figure 3-18), and dehydroalanine,” occur only in the 
active sites of particular enzymes and are designed for 
specific functions. 4-Carboxyglutamate (Figure 
3-18)°°*53385.411 is found only in a few of the proteins
that bind calcium strongly or that are involved in cal- 
cium metabolism.’!” Thyroxin,?°126 O-(3,5-diiodo- 
4-hydroxyphenyl)-3,5-diiodotyrosine (Figure 3-18), is 
found only in the protein thyroglobulin,” wherein it is 
formed at two positions by the intramolecular conden- 
sation of two pairs of 3,5-diiodotyrosines.’ The sole 
function of this large protein (2769 aa) is to produce the 
thyroxin, which is then liberated from the protein by its 
complete digestion. Diphthamide, 2-[3-carboxamido-3-(trimethylammonio)propyl]histidine
(Figure 3-18), is
found only in one of the elongation factors (elongation 
factor 2) involved in eukaryotic translation.’°”® The 
attachment of ADP-ribosyl groups to diphthamide, arginine,
and asparagine side chains in one or the other of a
small group of proteins is catalyzed by bacterial toxins,
and only proteins in individuals infected with these bac- 
teria are modified in this way. There also seem to be 
enzymes in normal cells, however, that are capable of 
ADP-ribosylating a small number of proteins as part of 
their normal operation. 417 
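To return to the first example in the paragraph above, the restriction of 4-hydroxyproline and 5-hydroxylysine to segments in which every third amino acid is a glycine can be expressed as a simple test on a sequence. The sketch assumes the one-letter code; the function names and both sequences are hypothetical, and the positions it reports are only candidates, because the text does not claim that every proline or lysine in such a segment is monooxygenated.

```python
def glycine_frame(segment):
    """Return the frame (0, 1, or 2) in which every third residue is glycine.

    Returns None if no frame qualifies. At least two glycines in the frame
    are required so that very short segments do not qualify trivially.
    """
    for frame in range(3):
        positions = range(frame, len(segment), 3)
        if len(positions) >= 2 and all(segment[i] == "G" for i in positions):
            return frame
    return None


def candidate_sites(segment):
    """List (position, residue) for prolines and lysines, 1-based positions.

    Sites are reported only when the segment has glycine at every third
    position.
    """
    if glycine_frame(segment) is None:
        return []
    return [(i + 1, aa) for i, aa in enumerate(segment) if aa in "PK"]


if __name__ == "__main__":
    # Invented sequences: one with the repeat, one without.
    print(candidate_sites("GPPGPKGAPGPP"))
    print(candidate_sites("MKLPAVKE"))
```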

Mass spectrometry is often used to identify these 
posttranslational modifications on the side chains of 
amino acids. Electrospray mass spectrometry of a puri- 
fied, intact protein is often the first indication that it con- 
tains a posttranslational modification. Because the 
unmodified amino acid sequence of a protein as it is pro- 
duced by the ribosome is often known even before it has 
been purified but usually soon after, any difference 
between the molecular mass observed by electrospray 
mass spectrometry and the mass calculated from the 
unmodified amino acid sequence indicates a posttrans- 
lational modification. Such results were the first indica- 
tions that the protein Ner of bacteriophage Mu was 
modified at its amino terminus with a pyruvate” and 
that bovine recoverin was modified at its amino terminus 
by one of several different fatty acids.’'' Electrospray 
mass spectrometry of peptides purified from a digest of 
rat profilaggrin identified nine phosphopeptides by the 
fact that their molecular masses were 80 Da greater than 
those predicted from their amino acid sequences.“ 
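The comparison of an observed mass with the mass calculated from the unmodified sequence can be sketched as a lookup of the difference in a short table of mass increments. The increments below are the monoisotopic masses of the groups added (for example, HPO3 for a phosphate, which accounts for the 80 Da noted above); the table is deliberately incomplete, and the function name, the tolerance, and the masses in the example are assumptions made for illustration.

```python
# Monoisotopic mass added to the polypeptide by each modification, in Da.
MASS_SHIFTS = {
    "phosphorylation": 79.96633,      # adds HPO3
    "acetylation": 42.01057,          # adds C2H2O
    "methylation": 14.01565,          # adds CH2
    "monooxygenation": 15.99491,      # adds O
    "tetradecanoylation": 210.19837,  # adds C14H26O
}


def candidate_modifications(observed_mass, calculated_mass, tolerance=0.5):
    """Name the single modifications whose mass shift matches the difference.

    Only one modification per molecule is considered; combinations and
    losses of mass (for example from cleavage) are outside this sketch.
    """
    difference = observed_mass - calculated_mass
    return [
        name
        for name, shift in MASS_SHIFTS.items()
        if abs(difference - shift) <= tolerance
    ]


if __name__ == "__main__":
    # Hypothetical peptide: calculated 1500.70 Da, observed 1580.67 Da.
    print(candidate_modifications(1580.67, 1500.70))
```

A match of this kind identifies only the mass of the group that was added; the identity of the modification and its location in the sequence still have to be confirmed by fragmentation or chemical analysis.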

Normal direct probe, high-resolution mass spec- 
trometry and mass spectrometry with electron ioniza- 
tion have been used to provide molecular ions and 
fragment ions of posttranslational modifications such as 
polyisoprenoids”'?””° or 5-mercaptouracil*” that can be 
removed chemically from the amino acid to which they 
are attached. Electrospray mass spectrometry in the neg- 
ative ion mode has been used in a similar way to identify 
the ceramide released from the GPI anchor of the 
arabinogalactan proteoglycan from Pyrus communis.” 
Fast-atom bombardment feeding a conventional mass 
spectrometer has been used to vaporize a bispeptide 
containing the semicarbazide derivative of 6,7-dioxo- 
4-(2-tryptophanyl)tryptophan and obtain a high-resolu- 
tion mass spectrum with a molecular ion of 940.3262 Da 
which was of sufficient precision to calculate a molecular 
formula for the modification.°*° 

Fast-atom bombardment or matrix-assisted-laser- 
desorption ionization feeding a tandem mass spectrom- 
eter can be used to vaporize posttranslationally modified 
peptides purified from a digest of a protein, sort the 
molecular ions in the first mass spectrometer, fragment 
those ions, and then separate the fragments in the 
second mass spectrometer. The resulting pattern of frag- 
ments is often sufficient to identify the posttranslational 
modification. Peptides containing an α-hydroxyglycine,**
an 8α-(N°-histidyl)flavin mononucleotide,”
and an N-acetyl-O-phosphothreonine”” were analyzed 
in this way. Matrix-assisted-laser-desorption ionization 
feeding a time-of-flight mass spectrometer in either the 
positive-ion mode” 5° or negative-ion mode" has 
been used to identify posttranslational modifications in
peptides that have been purified from digests of pro- 
teins. The negative-ion mode is used for phosphopep- 
tides.“ 

Electrospray can also be used to produce ions to 
feed a tandem mass spectrometer. ®™?™ Although some 
difficulty arises with the multiply charged ions emitted 
by the electrospray, they can usually be sorted out suc- 
cessfully in the first mass spectrometer because peptides 
are short enough that only a few ions are produced from 
each of them.!!! For example, a peptide containing covalently
bound flavin from fructosyl-amino acid oxidase of
Aspergillus was vaporized by electrospray ionization, the 
ionic molecule of m/z 659 Da was selected in the first 
quadrupole mass spectrometer, it was fragmented by 
collision-induced dissociation, and the fragment ions 
produced a mass spectrum in the second quadrupole 
mass spectrometer of the tandem. The pattern of frag- 
ments demonstrated that the flavin was covalently 
attached to Cysteine 342 of the protein.’ Such a system 
can also be used to identify the locations in the sequence 
of a protein at which it is phosphorylated.** 
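The difficulty with multiply charged ions mentioned above comes down to simple arithmetic, which can be sketched as follows. The sketch assumes positive ions formed only by the addition of protons and uses a neutral mass of 2000 Da that was chosen for illustration; the function names are hypothetical, and nothing here should be read as the charge state of the ion of m/z 659 in the example from the text.

```python
PROTON_MASS = 1.007276  # Da


def mass_to_charge(neutral_mass, charge):
    """m/z of a molecule of the given neutral mass carrying `charge` protons."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def neutral_mass(mz, charge):
    """Neutral mass recovered from an observed m/z and an assigned charge."""
    return charge * (mz - PROTON_MASS)


def charge_from_adjacent_peaks(mz_lower_charge, mz_higher_charge):
    """Charge of the ion at mz_lower_charge.

    The second argument is the m/z of the same molecule carrying one more
    proton, so it is the smaller of the two values.
    """
    return round(
        (mz_higher_charge - PROTON_MASS) / (mz_lower_charge - mz_higher_charge)
    )


if __name__ == "__main__":
    # Hypothetical peptide of neutral mass 2000 Da observed at two charge states.
    peak_2 = mass_to_charge(2000.0, 2)
    peak_3 = mass_to_charge(2000.0, 3)
    z = charge_from_adjacent_peaks(peak_2, peak_3)
    print(round(peak_2, 3), round(peak_3, 3), z, round(neutral_mass(peak_2, z), 3))
```

Because a peptide gives only a few such peaks, two adjacent ones are usually enough to assign the charge and recover the neutral mass before an ion is selected for fragmentation.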

If the posttranslational modification cannot be 
identified by its mass or its pattern of fragmentation, it is 
usually possible to hydrolyze the polypeptide and liber- 
ate the modified amino acid. Usually the hydrolysis is 
performed enzymatically to avoid destruction of the 
modified amino acid that might occur in strong acid or 
strong base. Enough of the peculiar amino acid is 
purified to perform a proof of its structure by chemical 
analysis. 

One way in which two or more of the amino acid 
side chains in a polypeptide can be modified coinciden- 
tally is during the formation of a covalent cross-link 
between them or among them. The cross-link can be 
intramolecular, connecting two or more amino acid side 
chains in the same polypeptide, or intermolecular, con- 
necting two or more amino acid side chains in different 
polypeptides. There is no formal distinction between 
these two outcomes because the linkage is invariably 
made after the polypeptides have folded into their native 
structure and, subsequently, formed specific intermole- 
cular complexes among themselves. This folding and 
intermolecular assembly is what brings the two or more 
amino acid side chains that will be cross-linked into 
atomic contact with each other. Therefore, it is irrelevant 
whether the amino acid side chains started out on the 
same polypeptide or different polypeptides or whether 
they are at positions within the amino acid sequence of 
the same polypeptide that are close to or distant from 
each other. The only deciding factor is that they are 
immediately adjacent to each other in the final structure 
of the mature protein. 

A simple example of a covalent cross-link is an 
amide between a lysine side chain and a glutamate side 
chain. Such a cross-link is formed from a glutamine 
side chain and a lysine side chain, both within a 
Table 3-1: Posttranslational Modifications of the Side Chains of Amino Acids in Proteins?”


type of modification 


derivative of side chain 


phosphorylation 


sulfation 
carboxylation 


aromatic substitution 


methylation 


alkylation 
acylation 
amidation 


monooxygenation 


oxidation 

free radical 
ADP-ribosylation 
nucleotidylation 
hydrolysis 


dehydration 
glycosylation 


cross-links between side 
chains of two amino acids 


covalently bound coenzymes 


O-phosphoserine,”*’*** O-phosphothreonine,”**”” O-phosphotyrosine, 2 N-phospholysine,?” 
N-phosphohistidine,“** N-phosphoarginine,””’ S-phosphocysteine,”'° O-phosphoaspartate, "20 
O-phosphoglutamate”” 


O-sulfotyrosine?" 


4-carboxyglutamate,””**? 3-carboxyaspartate”” 


3,5-diiodotyrosine, >’ 3-iodotyrosine, > > 3,5-dibromotyrosine, "79 3-bromotyrosine, >> 


3-bromo-5-chlorotyrosine,?°°?% 3-chlorotyrosine,?® 3,5-dichlorotyrosine,?® O-(3,5-diiodo-4-hydroxyphenyl)-3,5-diiodotyrosine
(thyroxine, T4),”° ?® O-(3-iodo-4-hydroxyphenyl)-3,5-diiodotyrosine
(triiodothyronine, T3),„64266 2-[3-carboxamido-3-(trimethylammonio)propyl]histidine
(diphthamide),2726 2-(1-mannosyl)tryptophan,””"”! o-bromophenylalanine?”


N-methyllysine,”*?” N,N-dimethyliysine,?’*?”® N,N,N-trimethyliysine,?’*?7°?7’ N®-methylarginine,?’®?” 
N°,N®-dimethylarginine,?7%*®° N®,N°-dimethylarginine,?3'#? N®-methylarginine,® 
N’-methylhistidine,?®? N'-methylhistidine, II O-methyl-p-aspartate,!” O-methylglutamate, 
O-methylisoaspartate,”“° S-methylmethionine,”” S-methylcysteine,* N-methylasparagine, 
N-methylglutamine,” 2-(S)-methylglutamine,'**”*! 5-(S)-methylarginine!**"" 


285 
288,289 


N-(4-amino-2-hydroxybutyl)lysine (hypusine),”*””™ S-farnesylcysteine,?'”?® S-geranylgeranylcysteine?'”° 
N-acetyllysine,”?’”*8 O-palmitoylthreonine,”” S-palmitoylcysteine,’*% S-stearoylcysteine’”® 


304 305 


y-poly(a-glutamyl)glutamate,”” y-poly(glycyl) glutamate 


5-hydroxylysine, °° N,N,N-trimethyl-5-hydroxylysine,” N,N,N-trimethy -O-phospho-5-hydroxylysine, 
4-hydroxyproline, °" 3-hydroxyproline,”"! 2,5-dihydroxytyrosine,°'” 6-hydroxyaspartate, 
B-hydroxyasparagine,°'® B-hydroxytryptophan,’'’°'? m-hydroxyphenylalanine,”” 3-hydroxytyrosine 
(3,4-dihydroxyphenylalanine, DOPA), 3 a@-hydroxyglycine*™* 


6-deamino-6-oxolysine (allysine) °°” 6-deamino-5-hydroxy-6-oxolysine,°" cysteinesulfenic acid,’* 


cyteinesulfinic acid,”° B-dethio-B-oxocysteine,”’ methioninesulfone”® 


LI 30 


tyrosyl free radica 

N®-(ADP-ribosyl)arginine,**" “3 N-(ADP-ribosyl)asparagine,*** S-(ADP-ribosyl)cysteine,* 
poly(ADP-ribosyl) glutamate, 8227 1 -[N-(ADP-ribosyl)]-2-[3-carboxamido- 
3-(trimethylammonio) propyl|histidine**? 


glycyl free radica 


O-(5’-adenylyl)tyrosine,“’ O-(5’-uridylyl)tyrosine,*"' O-[5’-(5-mercapto)uridylyl]tyrosine*™” 


176,345,346 


citrulline from arginine,’" ornithine from arginine,*“ aspartate from asparagine, glutamate from 
glutamine*”® 


347-349 


dehydroalanine from serine, o, B-dehydrotyrosine****>! 


O-poly(mannosyl)serine,*** O-poly(mannosyl)threonine,®” O-oligo[(a1,2)galactosyl]serine,® 
O-[3-O-(B-glucosyl) -o-fucosyl]threonine,** O-[2-O-(a-glucosyl)-B- galactosyl] -5-hydroxylysine,°” 
O-(B-xylosyl)serine,****’ O-[4-O-(ß-galactosyl)-B-xylosyliserine, °”’ S-digalactosylcysteine,°®® 
S-triglucosylcysteine,**? O-(glucosylarabinosyl)hydroxyproline,?® O-(N-acetylglucosaminy])serine, 
O-(N-acetylglucosaminyl)threonine,*” O-poly(arabinofuranosyl)}hydroxyproline,’® 
O-{3-[p-xylosyl(«1,3)-D-xylosyl]-D-glucosyl}serine,°° O-poly(glucosyDtyrosine?‘° 


361,362 


lysine in amide linkage with aspartate,*® lysine in amide linkage with glutamate,**”*® cysteine in thioester 


with glutamate,” 2-(S-cysteinylhistidine,*”’*” 3-(S-cysteinyl)tyrosine,*”*°" 3-(1- 
histidyl)tyrosine,*”°°” 

3-(3-tyrosyl)tyrosine®’’*”* O-(3-tyrosyl)tyrosine,*’”*” 3,5-di(3-tyrosyl)tyrosine,?”® 
3,3’-methylenebis(tyrosine),*” 3-(3-tyrosyl)-O-(3-tyrosyl)tyrosine,*® 3-(O-tyrosyl)- 
5-(3-tyrosyl)tyrosine,*™ 6,7-dioxo-4-(2-tryptophany]) tryptophan (tryptophan tryptophylquinone), 
5-hydroxy-2-(N-Iysyl)tyrosine,°® cystine 


385,386 


heme in thioether linkages to two cysteines,” S-phycoerythrobilinylcysteine,””° 
bis(S-cysteinyl)phycoerythrobilin,”” 8a-(S-cysteinyl)-8a-hydroxyflavin adenine dinucleotide,** 
8a-(S-cysteinyl)flavin adenine dinucleotide,*” 8a-(N°-histidinyl)flavin adenine dinucleotide,” 
6-(S-cysteinyl) flavin mononucleotide," 8a-(N*-histidinyl)flavin mononucleotide,*”° 
8a-(O-tyrosyl)flavin adenine dinucleotide,” N-biotinyliysine,”” N-lipoyllysine,*” 
N-(phosphopyridoxyl)lysine,*°!“” N-retinyllysine,““*““ O-(4’-phosphopantetheinyl)serine*” heme in 
thioether linkage to two cysteines and ether linkage to tyrosine at a meso position, HP 
2’-[5”-(phosphoseryl)ribosyl]-3’-dephosphocoenzyme AT 
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Figure 3-18: Posttranslational modifications of amino acid side chains in the interior of a polypeptide. 
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folded polypeptide, by the enzyme protein-glutamine 
y-glutamyltransferase.‘” Thioethers between a cysteine 
side chain and the aromatic ring of either a tyrosine side 
chain (3-3)°” or a histidine side chain (3-4)? 


3-3 


are cross-links that have been identified in galactose oxi- 
dase,”* and in hemocyanin,”® and monophenol 
monooxygenase,’ "??? respectively. 

A large number of cross-links form in both collagen 
and elastin as the direct result of the posttranslational 
oxidative deamination of lysine side chains in these pro- 
teins to the corresponding aliphatic 6-deamino-6-oxoly- 
syl aldehydes in a reaction performed by the enzyme 
protein-lysine 6-oxidase. These aldehydes are formed
in the vicinity of each other as well as in the vicinity of 
other lysines and of 5-hydroxylysines and 5-hydroxy- 
6-deamino-6-oxolysyl aldehydes, all also derived from 
lysine side chains. A dazzling array of aliphatic carbonyl 
chemistry is initiated by the formation of the reactive 
aldehydes in this environment, including aldol condensations,
imine formations, dehydrations of β-hydroxyaldehydes,
dehydrations of aliphatic alcohols, Michael
additions, and oxidations (Figure 3-19). Only four of
the more than 25 cross-links that result from this uncontrolled
flurry of reactions are displayed in the figure.
The purpose of these cross-links is to strengthen fibers of 
collagen. Cross-links between tyrosine side chains, such 
as 3-(3-tyrosyl)tyrosine (Figure 3-18) and the other 
examples listed in Table 3-1, also serve to strengthen the 
biological fibers, films, and coatings in which they are 
found. 

One of the most common posttranslational cross- 
links in a protein is the disulfide that forms when two cys- 
teine side chains are oxidatively coupled to form one 
cystine side chain (Figure 3-20). While cysteine is unsta- 
ble under the conditions necessary to hydrolyze proteins 
in strong acid, cystine is stable, and its appearance on the 
standard ion-exchange chromatogram between alanine 
and valine (Figure 1-3) establishes the presence of cys- 
tine side chains in a protein. 

The interior of most cells has a high concentration 
of a small, free thiol, such as glutathione 


[Structure of glutathione]


that reduces back to cysteine any cystine that forms in 
cytoplasmic proteins in a reaction known as disulfide 
interchange (Figure 3-20). The net effect of disulfide 
interchange is to set the cystine side chains in a protein 
in equilibrium with the disulfide of the small thiol such 
that 


\[
K_{\mathrm{eq}} = \frac{[\text{RS-SR}]\,[\text{prot(SH)}_2]}{[\text{RSH}]^2\,[\text{prot(S-S)}]} \tag{3-14}
\]


where RSH is the small thiol, RS-SR its disulfide, 
[prot(SH),] is the molar concentration of a particular 
reduced pair of adjacent cysteine side chains in the 
folded polypeptide, and [prot(S-S)] is the concentration 
of cystine between these same two side chains, also in 
the folded polypeptide. The reaction as written is first- 
order in reduced protein because the oxidation of the 
reduced protein is intramolecular. Equation 3-14 can be 
rearranged: 


\[
\frac{K_{\mathrm{eq}}\,[\text{RSH}]^2}{[\text{RS-SR}]} = \frac{[\text{prot(SH)}_2]}{[\text{prot(S-S)}]} \tag{3-15}
\]


The point made by this equation is that the greater the
ratio [RSH]²/[RS-SR], the less cystine will be found in
proteins.
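
As a rough numerical illustration of this relationship, the following Python sketch computes the fraction of a protein that holds a particular cystine when it is in equilibrium with a small thiol. The function name and every numerical value are hypothetical, chosen only to show the trend; they are not measurements from the text.

```python
def fraction_as_cystine(k_eq, rsh, rssr):
    """Fraction of the protein holding the disulfide at equilibrium.

    k_eq : equilibrium constant of the disulfide interchange (units of 1/M)
    rsh  : molar concentration of the small thiol, [RSH]
    rssr : molar concentration of its disulfide, [RS-SR]
    """
    # [prot(SH)2] / [prot(S-S)] = K_eq [RSH]^2 / [RS-SR]
    reduced_over_oxidized = k_eq * rsh ** 2 / rssr
    return 1.0 / (1.0 + reduced_over_oxidized)


# Hypothetical values: K_eq = 1 per molar, thiol fixed at 5 mM,
# concentration of the small disulfide lowered in steps.
for rssr in (5e-3, 5e-5, 5e-7):
    ratio = (5e-3) ** 2 / rssr
    fraction = fraction_as_cystine(1.0, 5e-3, rssr)
    print(f"[RSH]^2/[RS-SR] = {ratio:.1e} M -> fraction as cystine = {fraction:.3f}")
```

As the ratio [RSH]²/[RS-SR] rises, the computed fraction of protein present as cystine falls, which is the qualitative point of the rearranged equation.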

In the cytoplasm, [RS-SR] is kept at a low level 
enzymatically. For example, when RSH is glutathione, 
GSH, the enzyme glutathione reductase accomplishes 
this: 


\[
\mathrm{H^+ + GSSG + NADPH} \;\overset{\text{glutathione reductase}}{\rightleftharpoons}\; \mathrm{2\,GSH + NADP^+} \tag{3-16}
\]


This reaction drives the equilibrium of Equation 3-14 in 
the direction of the reduced protein by coupling it to the 
level of reduction of NADPH. The result of all these facts 
is that proteins confined to the cytoplasm usually do
not contain cystine, while proteins removed from the 
cytoplasm often do contain cystine. 

A protein is usually prepared for sequencing by
reducing any cystine side chains it may contain with a
small thiol such as 2-mercaptoethanol (Figure 3-20,
R = HOCH2CH2-) and then alkylating all of its cysteines
with a reagent such as iodoacetic acid

[Equation 3-17: reaction scheme for the alkylation of a cysteine side chain by iodoacetic acid to form S-carboxymethylcysteine]

iodoacetamide, N-ethylmaleimide, 4-vinylpyridine, or
2-vinylpyridine to prevent their reoxidation. This creates
the stable amino acid side chains S-(carboxymethyl)cysteine
(Equation 3-17), S-(carboxamidomethyl)cysteine,
S-(N-ethyl-2-succinimidyl)cysteine,
S-[2-(4-pyridyl)ethyl]cysteine, or S-[2-(2-pyridyl)ethyl]cysteine,
respectively, in place of the unstable cysteine.
S-[2-(2-Pyridyl)ethyl]cysteine absorbs at 254 nm, and its
absorption can be used to identify peptides containing
cysteine.

Figure 3-19: Examples of the formation of four of the more than 25 cross-links initiated by the formation of 6-deamino-6-oxolysine in collagen by protein-lysine 6-oxidase. Shown is an aldol condensation to produce the first cross-link. The β-hydroxyaldehyde can then form an imine with a lysine that dehydrates to an α,β-unsaturated imine, a product that cross-links three amino acids. The enol of another aldehyde can add to the α,β-unsaturated imine, and the initial enamine can condense to an imine with the carbonyl of the aldehyde. This forms a dihydropyridine that cross-links four amino acid residues. The pyridinium cation formed upon oxidation of the dihydropyridine is a desmosine linking the four amino acid residues. Upper right corner: 6-Deamino-5-hydroxy-6-oxolysine formed by the consecutive action of procollagen-lysine 5-dioxygenase and protein-lysine 6-oxidase produces an α-hydroxyaldehyde, which is susceptible to an even more complicated set of modifications.

Figure 3-20: Reduction of a cystine side chain by disulfide interchange. The cystine connecting two segments of polypeptide is exposed to an external thiolate (RS⁻) that displaces the cysteinyl anion by nucleophilic substitution and is in turn removed by another thiolate.

In experimental situations where cystines in a pro- 
tein must be reduced, a problem arises. If the protein 
remains folded, its cystine side chains are generally more 
stable, relative to the adjacent cysteine side chains that 
would be produced upon its reduction, than is the disul- 
fide of a small reducing agent such as 2-mercap- 
toethanol, relative to its thiol form. This results from the 
fact that the reaction in the protein is intramolecular in 
the direction of oxidation, but the reaction of the 2-mer- 
captoethanol is intermolecular in the direction of oxida- 
tion (Equation 3-14). The problem is solved either by 
using an intramolecular dithiol such as 2,3-dihydroxy-1,4-dithiobutane
(dithiothreitol, DTT)

[Equation 3-18: RSSR + dithiothreitol (dithiolate form) ⇌ cyclic disulfide of dithiothreitol + 2 RS⁻]


which produces a stable disulfide as a product, or by 
unfolding the protein with urea or guanidinium chloride, 
which causes the reaction of the cysteine side chains in 
the protein to be formally intermolecular in the direction 
of oxidation, or by simultaneously applying both of these 
strategies. 

There are a few examples in which a protein con- 
tains a cystine connecting two cysteines that are adjacent 
to each other in the amino acid sequence. The
eight-membered ring that results, however, is unstable
for steric reasons. Cystines connecting two cysteines
from different folded polypeptides are also occasionally
encountered, but in the majority of the
extracytoplasmic proteins that contain cystine side 
chains as a posttranslational modification, the two 
cysteine side chains that are connected to each other are 
in the same folded polypeptide but far from each other in 
the amino acid sequence. For example, in ribonuclease,
a protein formed from a polypeptide of 124 aa, the
cystines are formed from Cysteine 40 and Cysteine 95,
Cysteine 58 and Cysteine 110, Cysteine 26 and
Cysteine 84, and Cysteine 65 and Cysteine 72. When the
polypeptide folds to form the native structure of ribonu- 
clease, these pairs of cysteine side chains, which were 
distant from each other in the unfolded polypeptide, are 
juxtaposed. 

It is the juxtaposition that not only determines 
what cystines form but also brings the two sulfurs close 
enough to each other that they can react at an appreciable
rate. They are then oxidized to cystine by molecular
oxygen

\[
\mathrm{R_1CH_2SH + HSCH_2R_2 + \tfrac{1}{2}\,O_2} \rightleftharpoons \mathrm{R_1CH_2SSCH_2R_2 + H_2O} \tag{3-19}
\]


or by disulfide interchange (Figure 3-20) with cystines in
protein disulfide-isomerase. Protein disulfide-isomerase
contains a cystine so unstable that the
equilibrium constant for disulfide interchange
between its disulfide (taking the place of RS-SR in 
Equation 3-14) and the disulfide in a normal extracyto- 
plasmic protein [prot (S-S) in Equation 3-14] is about 
10°. Unlike synthetic 2,3-dihydroxy-1,4-dithiobutane or 
the natural protein thioredoxin, both of which can cleave 
disulfides in native proteins, protein disulfide-isomerase 
forms them. 

The identification of the two cysteine side chains 
that are connected in a particular cystine in a native pro- 
tein requires that a peptide containing only those two 
cysteines still joined as the cystine be isolated from a 
digest of the protein. Before the protein is unfolded
or digested, however, it must be treated with an alkylat- 
ing agent such as N-ethylmaleimide under conditions 
capable of capping off all the free sulfhydryls in the 
preparation, which if left unalkylated would catalyze 
disulfide interchange (Figure 3-20) and thereby scramble
the disulfides. Ideally, the peptides with intact cystine
side chains used to identify the cysteines involved
should be two short peptides held together by the cystine 
itself. For example, one of the peptides from ribonucle- 
ase isolated from a digest of the protein performed with 
pepsin, trypsin, and chymotrypsin was composed of the 
two smaller peptides NGQTNCYH and NVACK, cova- 
lently coupled by a cystine between the two cysteine side 
chains. From this result it could be concluded that, in
native ribonuclease, Cysteine 65 is coupled as a cystine 
with Cysteine 72. 

The three digestions used in the experiments just 
described served the purpose of producing bispeptides 
containing cystine that were as small as possible. This 
precaution avoids the confusion of having several large 
peptides interlaced by multiple disulfides into one
large, intractable peptide. This problem, however, is 
sometimes unavoidable, as in the case of thrombomod- 
ulin, in which three cystines occur within a short 
sequence of 14 amino acids and from which individual 
peptides containing each of them could not be obtained. 
In this case, the linkages were assigned by following
the rates at which the individual cysteines appeared as
the cystines were slowly cleaved with the nucleophile
tris(2-carboxyethyl)phosphine.

Because oxidation states of cysteine (Figure 2-8) 
other than cystine as well as covalent modifications of 
cysteine revert to yield free cysteine upon addition of a
thiol such as 2,3-dihydroxy-1,4-dithiobutane, the indirect
assignment of a cystine based solely upon the
appearance of free cysteine after the addition of a thiol
cannot be trusted.

Procedures have been developed to assist in the 
analysis of peptides containing cystine. Sensitive 
methods have been described for continuously moni- 
toring chromatograms of digests performed with 
endopeptidases to detect peptides containing intact 
cystine side chains either electrochemically after they 
have been reductively cleaved on the surface of an electrode
or colorimetrically after they have been nucleophilically
cleaved with tributylphosphine. The
bis(phenylthiohydantoin) of cystine (Figure 3-1) displays
a unique relative mobility on the high-pressure
liquid chromatograms used to identify the products
from the steps of automated Edman degradation.
Peptides containing cystine that have been purified
from a digest of a protein can also be positively identified
by mass spectrometry. The advantage in this
instance is that the gaseous molecular ion of the bispeptide,
necessarily containing the intact cystine, is
observed directly. The presence of a cystine within a
peptide can be established by mass spectrometry
because the mass of the peptide gradually increases by
2 Da as a result of photoreduction during successive
shots from the laser during its vaporization.
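
The arithmetic behind the mass-spectrometric identification can be sketched in Python. The example below uses the two ribonuclease peptides mentioned earlier in this section and standard monoisotopic residue masses; the function names are hypothetical, and the sketch ignores charge state and isotopic distribution, so it illustrates the bookkeeping rather than predicting an observed spectrum.

```python
# Monoisotopic residue masses (Da) for the amino acids used below.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "V": 99.06841, "T": 101.04768,
    "C": 103.00919, "N": 114.04293, "Q": 128.05858, "K": 128.09496,
    "H": 137.05891, "Y": 163.06333,
}
WATER = 18.01056     # H2O accounting for the free termini of a peptide
HYDROGEN = 1.00783   # one hydrogen atom


def peptide_mass(sequence):
    """Monoisotopic mass of a linear peptide with its cysteines as thiols."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def bispeptide_mass(seq_a, seq_b):
    """Mass of two peptides joined by one cystine (two hydrogens are lost)."""
    return peptide_mass(seq_a) + peptide_mass(seq_b) - 2 * HYDROGEN


oxidized = bispeptide_mass("NGQTNCYH", "NVACK")
reduced_total = peptide_mass("NGQTNCYH") + peptide_mass("NVACK")
print(f"bispeptide joined by cystine:  {oxidized:.2f} Da")
print(f"sum of the two reduced peptides: {reduced_total:.2f} Da")
print(f"difference: {reduced_total - oxidized:.2f} Da")
```

The same difference of about 2 Da applies when a cystine within a single peptide is reduced, which is the shift described for photoreduction.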

Problem 3-12: Assume that a protein containing an
intein has a cysteine at the amino terminus of the intein
and a serine at the amino terminus of the carboxy-terminal
segment. Write the mechanism for intein splicing
involving the initial N → S migration, an S → O migration,
cleavage of the peptide bond between the intein
and the carboxy-terminal segment to produce the
unsubstituted aspartyl imide, and the final O → N migration
to produce the new peptide bond.


Problem 3-13: Draw the structure of a polypeptide with 
an amino-terminal cysteine residue the α-amine of
which is acylated with palmitate and the sulfur of which 
forms a thioether with C3 of a 1,2-dipalmitoyl-3-deoxy- 
glycerol. 


Problem 3-14: A remarkable feature of the enzyme glutamate-ammonia
ligase from E. coli is that its catalytic
properties depend on the conditions under
which the E. coli from which it is purified were grown.
The enzyme purified from E. coli grown on NH4Cl and
glucose (Type I) is less sensitive to inhibition by AMP
than is the enzyme purified from E. coli grown on glutamate
and glycerol (Type II). The Type II enzyme can be
converted into Type I enzyme if it is treated with snake
venom phosphodiesterase.

When Type II enzyme was digested with snake
venom phosphodiesterase and subsequently precipitated
out of solution with trichloroacetic acid, the supernatant
solution contained material having a maximum
absorbance at 260 nm.

Hydrolysis of the Type I and Type II enzymes was 
performed by a series of endopeptidases. Both enzymes 
were split into the same number of peptides, but one 
decapeptide from the Type II enzyme differed in 
chromatographic behavior from the similar decapeptide 
from the Type I enzyme. The single different peptide 
isolated from the Type II enzyme had the following com- 
position after acid hydrolysis: 


Hroarho 


2 
2 
3 
1 
1 
1 
adenine 1 
D-ribose 1 
phosphate 1 

From an acid-base titration, the following pKa
values were measured for the decapeptide isolated from
Type II enzyme before and after treatment with snake
venom phosphodiesterase.


pKa
number of groups before after 
1 3.4 3.4 
1 3.7 3.7 
4 4-4.5 4-4.5 
1 7.8 7.8 
1 6.0 
1 10.0 


How does the covalent structure of the Type II 
enzyme differ from that of the Type I enzyme? Explain 
each result described above on the basis of the proposed 
structure. 


Problem 3-15: Write mechanisms for the formation of 
the following posttranslational modification found in 
histidine ammonia-lyase

[Structure of the posttranslational modification in histidine ammonia-lyase]

and the following posttranslational modification producing
the chromophore in red fluorescent protein from
Discosoma:

[Structure of the chromophore of red fluorescent protein]


Write resonance structures for the chromophore. 


Problem 3-16: Write a step-by-step series of reactions 
that show how the cross-link dehydromerohistidine 
would form in collagen from three lysine residues and a 
histidine residue. 


Problem 3-17: The sequence of tick anticoagulant 
protein is YNRLCIKPRDWIDECDSNEGGERAYFRNGKG- 
GCDSFWICPEDHTGADYYSSYRDCFNACI. 

A peptide was purified from a tryptic digest of the 
protein that produced the following results on Edman 
degradation.


cycle 1 2 3 4 5 6 
phenylthiohydantoins DG G,WC*,FI D ES F 
cycle 7 8 9 10 11


phenylthiohydantoins DAW ILS CN EP EG 


*Bis (phenylthiohydantoin) of cystine. 


How are the cysteines linked to form the two cystines in the 
peptide? What unexpected cleavage did trypsin produce? 


Oligosaccharides of Glycoproteins 


Living organisms are formed from three types of covalent
polymers: proteins, nucleic acids, and polysaccharides.
Polysaccharides used biologically for structural purposes
or for the storage of carbohydrate occur as long, often
branched, uniform polymers of monosaccharides
(sugars). Examples of polysaccharides are agarose
(Figure 1-7), cellulose (Figure 1-2), starch, hyaluronic
acid, chitin, and glycogen. Although the reducing ends of
these polysaccharides are sometimes attached covalently
to particular proteins, this fact is secondary to their
biological roles. Oligosaccharides are shorter, more 
heterogeneous oligomers of monosaccharides. Oligo- 
saccharides are frequently attached to recently synthe- 
sized polypeptides as posttranslational modifications of 
serines, threonines, or asparagines. Such posttransla- 
tional modifications produce glycoproteins. 

A glycoprotein is any protein to which one or more
oligosaccharides are covalently attached. To define the
complete covalent structure of a glycoprotein, not only
the amino acid sequence of the protein but also the
points of attachment and the sequences of the monosaccharides
in the oligosaccharides must be established.
Some of the rarely occurring oligosaccharides and their
sites of attachment have been listed along with the other
posttranslational modifications in Table 3-1. The more
commonly encountered oligosaccharides in glycoproteins
from animals and plants, however, are branched
oligomers attached through N-acetylglucosamine to
asparagine side chains or through N-acetylgalactosamine
to serine and threonine side chains (Figures
3-21, 3-22, and 3-23).

Figure 3-21: Covalent structure of one of the oligosaccharides attached through asparagine to human immunoglobulin D. This is an example of a high-mannose oligosaccharide.

Figure 3-22: Covalent structure of one of the oligosaccharides attached through asparagine to human immunoglobulin D. This is an example of a complex N-linked oligosaccharide.

The monomers from which these oligosaccharides 
are constructed are monosaccharides. The eleven major 
monosaccharides that are found in the oligosaccharides 
of glycoproteins are D-mannose, D-galactose, D-glucose, 
N-acetyl-D-glucosamine, N-acetyl-D-galactosamine, the 
sialic acids, D-glucuronic acid, L-fucose, L-rhamnose, 
D-xylose, and L-arabinose (Figure 3-24). Several of the 
monosaccharides in glycoproteins can be O-sulfated,
and mannoses and N-acetylglucosamines can
be O-(2-aminoethyl)phosphonylated.

More than 40 different sialic acids are known. They
are derivatives of either D-neuraminic acid
(Figure 3-24) or the closely related
D-5-deamino-5(S)-hydroxyneuraminic acid
(2-keto-3-deoxy-D-glycero-D-galacto-nononic acid;
3-deoxy-D-glycero-D-galacto-nonulosonic acid).
These two anionic monosaccharides
are modified variously by N-acetylation,
N-glycolylation, O-lactylation, O-sulfation, O-methylation,
O-phosphorylation, and O-acetylation.

The covalent bonds that link the monosaccharides 
together are those of acetals and occasionally ketals. 
Glycosidic linkages are the bonds in these acetals and 
ketals formed between the only carbonyl carbon on each 
monosaccharide, enclosed within a pyranose ring or a 
furanose ring as a hemiacetal, and one of the hydroxyl 
groups of the preceding monosaccharide in the 
oligomer. A glycosidic linkage is formed between the 
oxocarbenium cation of the pyranose or furanose and a 
lone pair of electrons on a nitrogen or an oxygen. An 
example would be 


[Equation 3-20: reaction scheme illustrating the formation of a glycosidic linkage]

(3-20)


Branching of the oligosaccharide (Figures 3-21 to 3-23) 
occurs whenever two or more of the hydroxyl groups on 
one of the monosaccharides participate in glycosidic 
linkages. 

Each of the oligosaccharides on a glycoprotein can 
be thought of as beginning at the monosaccharide that is 
attached to the polypeptide (the reducing end). The 
point of attachment is either an O-glycosidic linkage, 
formed between the carbonyl carbon of this initial 
monosaccharide and the hydroxyl group of a serine or 
threonine side chain, or an N-glycosidic linkage, formed 
between the carbonyl carbon of this initial monosaccha- 
ride and the amide nitrogen of an asparagine side chain. 
The first monosaccharide in an oligosaccharide attached 
to a serine or a threonine is usually N-acetylgalac- 
tosamine; the first sugar in an oligosaccharide attached 
to asparagine is almost always N-acetylglucosamine. 
Peripheral to this initial monomer, the oligomer will be 
found to branch at several points and end at each of 
several unsubstituted monosaccharides that occupy the 
last positions on the branches (the nonreducing ends). 
There are usually 2-8 monosaccharides counting from
the initial monosaccharide to the end of a branch and
1-4 branches in a typical oligosaccharide on a glycoprotein.
Often a branch has only one monosaccharide.

Figure 3-23: Covalent structure of one of the oligosaccharides attached through serine and threonine to human colonic mucin. This is an example of an O-linked oligosaccharide.

Figure 3-24: Eleven most common monosaccharides composing the oligosaccharides of glycoproteins.

Writing the sequences of the oligosaccharides on 
glycoproteins is complicated by the requirements that 
each monosaccharide must be noted, the particular 
hydroxyl group in the preceding monosaccharide to
which it is attached through its carbonyl carbon must be 
noted, and the anomeric state of its carbonyl carbon 
must be noted. It is usually assumed that unless other- 
wise stated the monosaccharides in the oligosaccharide 
are of the D stereochemistry, where, for example, 
L-fucose, L-rhamnose, and L-arabinose would be excep- 
tions to be noted, and in the pyranose form, where, for 
example, galactofuranose and arabinofuranose would be 
exceptions to be noted. To confuse matters further, the 
sequences of oligosaccharides are usually written from 
right to left beginning at the right with the monosaccha- 
ride attached directly to the protein. The anomeric state 
of the carbonyl carbon and the hydroxyl group to which 
it is attached are noted to the right of the name of each 
monosaccharide. For example, GlcNAc(β1,4) states that
an N-acetylglucosamine is attached through its carbonyl
carbon, carbon 1, to the 4-hydroxyl group of its immediate
predecessor to the right in the written sequence by a
β-anomeric acetal. In addition, any modifications of the
monosaccharides, such as O-acetylation, O-sulfation, 
O-phosphorylation, de-N-acetylation, or O-methylation, 
must be noted. The actual structure of the oligosaccha- 
ride in Figure 3-23 can be compared to its written 
sequence (Table 3-2, entry 7). 
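
The written convention can be made concrete with a small parser. The Python sketch below reads an unbranched sequence and reports, for each monosaccharide, its anomeric state, the carbon through which it is attached, and the hydroxyl group it occupies on the residue to its right. The function name and the returned field names are hypothetical, and the sketch assumes that the anomer is written as a Greek letter inside the parentheses, as in GlcNAc(β1,4); branches and modifications are not handled.

```python
import re

# One residue: a name, optionally followed by (anomer carbon,hydroxyl).
RESIDUE = re.compile(r"([A-Za-z]+)(?:\(([αβ])(\d),(\d)\))?")


def parse_linear(sequence):
    """Parse an unbranched sequence such as 'Gal(β1,4)GlcNAc(β1,3)GalNAc'.

    Returns a list of dicts in the order written, ending on the right
    with the monosaccharide attached directly to the protein.
    """
    residues = []
    position = 0
    while position < len(sequence):
        match = RESIDUE.match(sequence, position)
        if match is None:
            raise ValueError(f"cannot parse {sequence!r} at position {position}")
        name, anomer, carbon, hydroxyl = match.groups()
        residues.append({
            "name": name,
            "anomer": anomer,  # 'α', 'β', or None for the last residue
            "from_carbon": int(carbon) if carbon else None,
            "to_hydroxyl": int(hydroxyl) if hydroxyl else None,
        })
        position = match.end()
    return residues


for residue in parse_linear("Gal(β1,4)GlcNAc(β1,3)GalNAc"):
    if residue["anomer"] is None:
        print(f"{residue['name']}: attached directly to the protein")
    else:
        print(f"{residue['name']}: {residue['anomer']}-anomer, carbon "
              f"{residue['from_carbon']} to the {residue['to_hydroxyl']}-hydroxyl "
              f"of the residue to its right")
```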

There is yet a further peculiarity of oligosaccha- 
rides, that of microheterogeneity. Because serious bio- 
logical problems do not seem to arise when unfinished 
oligosaccharides are produced, in contrast to the devas- 
tation that would occur if unfinished proteins and 
nucleic acids were produced, natural selection has not 
enforced uniformity on the synthesis of oligosaccha- 
rides. It may even be the case that the existing lack of uni- 
formity has advantages for which natural selection has 
selected. Although a few finished glycoproteins are 
homogeneous, in most instances the synthesis of the 
oligosaccharides is an apparently haphazard stochastic 
process, and each oligosaccharide that ends up attached 
at a particular site in a glycoprotein is usually unfinished. 
Each, however, is unfinished in a different way. Each is 
missing a different set of monosaccharides. As a result, 10 
or 15 different oligosaccharides may be found on the 
amino acid side chain at the same position in the amino 
acid sequence of different molecules of the same protein. 

Once they have been separately identified and indi- 
vidually sequenced, each of these oligosaccharides can 
be recognized as a different, incomplete realization of 
only one complete sequence. This prototypical sequence 
is often longer than any one of the sequences of the 
actual oligosaccharides, but each of the sequences of the 
actual oligosaccharides is a piece of the prototype and 
every sugar in the prototype is represented in one of the 
actual oligosaccharides. Whether the prototype includ- 
ing all of the actual sequences is the most complete 
sequence that could have been produced or is itself only 
an incomplete realization of a longer sequence can never 
be decided unequivocally. As an example, 13 different
oligosaccharides were isolated from human colonic
mucin. All of the other 12, of which six are presented in
Table 3-2, were incomplete realizations of the largest
(Table 3-2, entry 7). It is possible, however, that this
largest one may be an incomplete realization of an even
larger oligosaccharide that escaped detection.

Table 3-2: Oligosaccharides Isolated from Human Colonic Mucin (ref 470)^a

(1) Sia(α2,6)GalNAc^b
(2) GlcNAc(β1,3)GalNAc
(3) GlcNAc(β1,3)[Sia(α2,6)]GalNAc
(4) Gal(β1,4)GlcNAc(β1,3)GalNAc
(5) Gal(β1,4)GlcNAc(β1,3)[Sia(α2,6)]GalNAc
(6) GlcNAc(β1,3)Gal(β1,4)GlcNAc(β1,3)GalNAc
(7) Sia(α2,6)Gal(β1,3)GlcNAc(β1,3)[Sia(α2,6)Gal(β1,3)GlcNAc(β1,6)]Gal(β1,4)GlcNAc(β1,3)[Sia(α2,6)]GalNAc

^a Only 7 of the 13 oligosaccharides isolated are tabulated. Square brackets enclose a branch attached to the monosaccharide that follows the bracket.
^b For abbreviations see Figure 3-27.
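
The idea that each isolated oligosaccharide is a piece of one prototype can be expressed as a subset test. In the Python sketch below, an oligosaccharide is a nested tuple (label, children) rooted at the monosaccharide attached to the protein, and it counts as an incomplete realization if every one of its residues occupies a position that also exists in the prototype. The data structure, the function names, and the counterexample are hypothetical, and the branching assumed for entry 7 is one reading of Table 3-2 rather than an independently verified structure.

```python
def paths(tree, prefix=()):
    """Return the set of root-to-residue paths in a (label, children) tree."""
    label, children = tree
    here = prefix + (label,)
    found = {here}
    for child in children:
        found |= paths(child, here)
    return found


def is_incomplete_realization(candidate, prototype):
    """True if every residue of candidate sits at a position found in prototype."""
    return paths(candidate) <= paths(prototype)


sia = ("Sia(α2,6)", [])
arm_3 = ("GlcNAc(β1,3)", [("Gal(β1,3)", [sia])])
arm_6 = ("GlcNAc(β1,6)", [("Gal(β1,3)", [sia])])

# Assumed reading of entry 7: GalNAc carries Sia and the GlcNAc-Gal core,
# and the core galactose carries the two sialylated arms.
entry_7 = ("GalNAc", [sia, ("GlcNAc(β1,3)", [("Gal(β1,4)", [arm_3, arm_6])])])

entry_1 = ("GalNAc", [sia])
entry_5 = ("GalNAc", [sia, ("GlcNAc(β1,3)", [("Gal(β1,4)", [])])])
entry_6 = ("GalNAc", [("GlcNAc(β1,3)", [("Gal(β1,4)", [("GlcNAc(β1,3)", [])])])])
not_a_piece = ("GalNAc", [("Gal(β1,3)", [])])  # hypothetical counterexample

for name, tree in [("entry 1", entry_1), ("entry 5", entry_5),
                   ("entry 6", entry_6), ("counterexample", not_a_piece)]:
    print(name, is_incomplete_realization(tree, entry_7))
```

A test of this kind can show that a set of sequences is consistent with one prototype, but, as noted above, it cannot show that the prototype is itself complete.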

Another view of microheterogeneity, in opposition 
to the view that it is haphazard and purposeless, is that it 
has a role in producing many different glycoforms of the 
same protein. This would increase the functional range 
of these proteins and would be advantageous in particu- 
lar situations. For example, the microheterogeneity 
observed in the set of oligosaccharides isolated from 
human colonic mucin (Table 3-2) may permit the 
oligosaccharides on this glycoprotein to ensnare many 
different species of bacteria, each of which binds specif- 
ically to only one or a few oligosaccharide sequences. It 
is probably the case that microheterogeneity is relevant 
in some instances and irrelevant in others. For example, 
the length and amount of branching in the oligosaccharides
on erythropoietin determine its biological activity,
but the presence or absence of oligosaccharide on
channel-forming intrinsic protein has no effect on its
function. Unlike most proteins, most oligosaccharides
do not assume a fixed conformation, so the involvement
of microheterogeneity in their biological specificity
would be based mainly on differences in sequence. As
the oligosaccharides, however, become more crowded
or more branched, local steric effects become more
numerous, and their confinement of the conformation of
the oligosaccharide may contribute to differences in bio- 
logical function. 

From an examination of the sequences of the
oligosaccharides attached to glycoproteins from animals
and plants, several generalizations can be drawn. The
most common of these oligosaccharides can be divided
into three classes (Table 3-3). The high-mannose
oligosaccharides begin with two N-acetylglucosamines
linked (β1,4) to each other, the first attached to an
asparagine in the protein. These oligosaccharides
contain 5-9 mannoses (Figure 3-21). The complex N-linked
oligosaccharides, because they are biosynthetically
derived from the high-mannose oligosaccharides, also
begin with two N-acetylglucosamines linked (β1,4) to
each other, followed by three branched mannose
residues. Beyond this structural core, variable amounts
of N-acetylglucosamine, galactose, fucose, various sialic
acids, and occasionally N-acetylgalactosamine are
attached (Figure 3-22). The O-linked oligosaccharides
begin with an N-acetylgalactosamine linked to a serine
or threonine on the protein and contain variable
amounts of N-acetylglucosamine, galactose,
N-acetylgalactosamine, fucose, and various sialic acids
(Figure 3-23).

High-mannose oligosaccharides occur in all
eukaryotes, but those in fungi have differences in linkage
and branching patterns from those in plants and animals
and often contain significantly more mannose.
Most, if not all, of the high-mannose oligosaccharides
from the proteins of plants and animals are incomplete
realizations of one complete, basic structure (Table 3-3,
entry 1). This uniformity results from the fact that this
unit is transferred in its entirety to the targeted
asparagine side chain on the glycoprotein. Then, in a
specific sequence of steps, catalyzed by three
exomannosidases, it is shortened until all of the mannoses
in (α1,2) linkage have been removed. When only five
mannoses remain, the oligosaccharide is then elongated in a
highly specific sequence of steps by specific glycosyl- 
transferases to produce complex N-linked oligosaccha- 


Table 3-3: Representatives of the Three Main Classes of Oligosaccharides on Animal Glycoproteins


(1) High-Mannose Oligosaccharide

Man(α1,2)Man(α1,6)

Man(α1,2)Man(α1,3)Man(α1,6)Man(β1,4)GlcNAc(β1,4)GlcNAcβAsnᵃ

Glu(α1,2)Glu(α1,3)Glu(α1,3)Man(α1,2)Man(α1,3)


(2) Complex N-Linked Oligosaccharide

Sia(α2,6)Gal(β1,4)GlcNAc(β1,6)

Sia(α2,6)Gal(β1,4)GlcNAc(β1,2)Man(α1,6)

GlcNAc(β1,4)Man(β1,4)GlcNAc(β1,4)GlcNAcβAsnᵇ

Sia(α2,6)Gal(β1,4)GlcNAc(β1,2)Man(α1,3)

Sia(α2,6)Gal(β1,4)GlcNAc(β1,4)


(3) O-Linked Oligosaccharide

Gal(β1,4)GlcNAc(β1,6)

Fuc(α1,4)

Gal(β1,3)GlcNAc(β1,3)Gal(β1,3)GalNAcSer/Thrᶜ

Gal(β1,4)GlcNAc(β1,6)


ᵃFrom Chinese hamster ovary cells. ᵇFrom human plasma α1-acid glycoprotein. ᶜFrom blood group A active glycoprotein in human ovarian cyst fluid.


rides. After the last step of this elongation, other man- 
nosidases remove two more mannoses to leave the three 
found in the mature complex N-linked oligosaccharide. 

At the end of this process, most of these complex
N-linked oligosaccharides are also incomplete
realizations of one basic structure (Table 3-3, entry 2),
but minor differences in the positions on the peripheral
N-acetylglucosamines and galactoses at which the
linkages are made have been noted, as well as
substitution of the peripheral galactoses with
N-acetylgalactosamines. Short, repeating units of
N-acetylglucosaminyl(β1,3)galactose have also been observed
inserted between the peripheral galactoses and
N-acetylglucosamines of some complex N-linked
oligosaccharides. Fucoses are found attached to many of the
complex N-linked oligosaccharides from animals in
(α1,6) or (α1,3) linkage to one or the other of the
N-acetylglucosamines in their cores or in (α1,3) or
(α1,4) linkage to N-acetylglucosamines in their
peripheries. Xyloses are found attached to a few of the
complex N-linked oligosaccharides from animals but many of
the complex N-linked oligosaccharides from plants, in
(β1,2) linkage to the central mannose in the core.

It seems that, within the same protein, the
oligosaccharides on certain asparagine side chains will
remain as high-mannose oligosaccharides exclusively, while
the oligosaccharides on other asparagine side chains are
completely converted to complex N-linked oligosaccharides.
Occasionally, however, a hybrid N-glycan is
encountered, in which one of the branches in one of
these oligosaccharides is of the complex structure while
the other remains of the high-mannose structure,
presumably because the processing on the latter branch was
specifically blocked.

The O-linked oligosaccharides display less
uniformity than the N-linked. This may result from the fact
that they are built up one sugar at a time rather than as
intact units. The O-linked oligosaccharides drawn in
Figure 3-23 and presented in Table 3-3 include some of
the common structural features of this class. By far the
most common monosaccharide forming the linkage to
the serine or threonine is N-acetylgalactosamine, but
oligosaccharides O-linked through other monosaccharides
have been reported. The branches are constructed from
the basic repeating unit, Gal(β1,3 or
β1,4)GlcNAc(β1,3 or β1,4 or β1,6). Branching usually
occurs at a galactose or at the initial
N-acetylgalactosamine, rarely if ever at an
N-acetylglucosamine. The basic repeating unit of each
branch can begin with either an N-acetylglucosamine
(Figure 3-23) or a galactose (Table 3-3). Fucose is found
in (α1,4) and (α1,3) linkages to penultimate
N-acetylglucosamines in addition to (α1,2) linkage to
peripheral galactoses. The branches either end with a
galactose of the repeating unit or are capped by an
N-acetylgalactosamine in (α1,3) or (α1,4) linkage. Sialic
acids are found in (α2,6) or (α2,3) linkage to galactoses
or the initial N-acetylgalactosamine. Many variations on
these patterns are observed. Often O-linked
oligosaccharides are quite short. An example would be
NeuNAc(α2,3)Gal(β1,3)[NeuNAc(α2,6)]GalNAc. All these
regularities seem to result from the fact that the sugars
are added one at a time from the initial
N-acetylgalactosamine outward by a limited set of
glycosyltransferases. These enzymes are specific
for particular sugars and attach them only to particular
hydroxyl groups on particular sugars within the growing
oligosaccharide.

Two of the most heavily glycosylated glycoproteins 
found in animals are the mucins and the proteoglycans. 
These two types of glycoproteins can contain up to 80% 
or 90% carbohydrate by mass, respectively. 

The mucins are the glycoproteins that constitute 
mucus and also coat the surfaces of many types of cells. 


Table 3-4: Polysaccharides of Proteoglycans


The human intestinal mucin MUC2 is a polypeptide 5159
aa long. Between Cysteine 1375 and Cysteine 1762 and
between Cysteine 1858 and Isoleucine 4299, there are
two regions of amino acid sequence that are rich in
threonine (58% of the amino acids) and proline (24%) and
are thought to contain the majority if not all of the
sites for the O-linked glycosylation (Table 3-2), which
occurs mainly on threonine. The larger of these two
regions is made up almost exclusively of 101 consecutive
repeats of the sequence -ITTTTTVIPTPTPTGTQTPTTTPI- with
only a few substitutions over the entire length of 2323 aa.
There are about 1100 N-acetylgalactosylthreonyl linkages
in the entire protein, and if all of these are confined
to the two regions rich in threonine, about 85% of
the threonines in these regions carry oligosaccharides.
From an examination of the repeating sequence and the
fact that each oligosaccharide contains an average of
four monosaccharides, one can gain an appreciation
of how closely packed these oligosaccharides must be.
Other mucins also have similar regions rich in threonine
and serine, usually found in repeating sequences
that are also heavily glycosylated.

Proteoglycans are proteins to which particular 
types of regular polysaccharides are attached. 
Proteoglycans are secreted as extracellular matrix and 


type                 original repeating unitᵃ        postsynthetic modification

chondroitin sulfate  (β1,4)GlcA(β1,3)GalNAc(β1,4)    GalNAc-6-SO₃⁻
                                                     GalNAc-4-SO₃⁻

dermatan sulfate     (β1,4)GlcA(β1,3)GalNAc(β1,4)    GalNAc-4-SO₃⁻
                                                     GlcA → IdoAᵇ
                                                     IdoA-2-SO₃⁻

heparin              (α1,4)GlcA(β1,4)GlcNAc(α1,4)    deacetylationᶜ
                                                     N-sulfationᶜ
                                                     GlcNSO₃⁻-6-SO₃⁻
                                                     GlcNAc-6-SO₃⁻
                                                     glucosamine-6-SO₃⁻
                                                     GlcA-2-SO₃⁻
                                                     GlcNAc-3-SO₃⁻

heparan sulfate      (α1,4)GlcA(β1,4)GlcNAc(α1,4)    deacetylationᶜ
                                                     N-sulfationᶜ
                                                     GlcA → IdoAᵈ
                                                     IdoA-2-SO₃⁻
                                                     GlcNSO₃⁻-6-SO₃⁻
                                                     GlcNAc-6-SO₃⁻
                                                     glucosamine-6-SO₃⁻

keratan sulfate      (β1,3)Gal(β1,3)GlcNAc(β1,3)     GlcNAc-6-SO₃⁻
                                                     Gal-6-SO₃⁻


ᵃThis is the disaccharide monomer constituting the newly synthesized polymer before postsynthetic modification.
ᵇEpimerization of glucuronic acid at carbon 5 to form iduronic acid, the presence of which distinguishes dermatan sulfate
from chondroitin sulfate. ᶜDeacetylation and N-sulfation (-NSO₃⁻) are both incomplete so that N-acetylglucosamine,
glucosamine, and N-sulfoglucosamine coexist in the same proteoglycan. ᵈEpimerization of glucuronic acid at carbon 5 to form
iduronic acid, the presence of which distinguishes heparan sulfate from heparin.


are the main constituents in such structures as cartilage, 
vascular wall, and tendon. Although they often carry the 
usual N-linked and O-linked oligosaccharides, by defini- 
tion, they also carry at least one of a class of long poly- 
saccharides formed from repeating disaccharides (Table 
3-4). Each proteoglycan is defined by the repeating dis- 
accharide that forms the polysaccharide that is attached 
to it. These polysaccharides of repeating units are 
heterogeneous because of a collection of postsynthetic 
modifications (Table 3-4) that are only partially accom- 
plished, often concentrated within randomly spaced 
blocks of consecutive monosaccharides along the length 
of the polymer. One constant feature of the covalent
structure of a proteoglycan is that each of its defining
polysaccharides, except for keratan sulfate, is O-linked
to the protein through the oligosaccharide
-GlcA(β1,3)Gal(β1,3)Gal(β1,4)xylosylserine.

As with those of the mucins, the polypeptides of
proteoglycans can be quite long. The polypeptide of one
of the human chondroitin sulfate proteoglycans is 2293
aa in length and has, on average, 12,000 monosaccharides
in its covalently attached oligosaccharides and
polysaccharides. Unlike the mucins, which contain
short oligosaccharides densely packed together because
they are on long strings of adjacent threonines, the
proteoglycans contain long polysaccharides that can be
attached to serines at isolated -Gly-Ser- or -Ser-Gly-
sites scattered randomly over the sequence at intervals of
about 50 aa. In at least one of the proteoglycans,
however, there is the sequence -YS(GS)>,L-, to the ser- 
ines of which heparin and chondroitin sulfate are 
attached.

The sequence of the monosaccharides in an
oligosaccharide or polysaccharide on a glycoprotein is
established by chemical analysis. The starting material in
this analysis is a purified preparation of the glycoprotein
itself. Often, to facilitate the analysis, the
oligosaccharides on the glycoprotein have been made
radioactive by growing cells producing it in the presence
of one or two radioactive monosaccharides, for example,
[³H]mannose and [¹⁴C]glucosamine. Oligosaccharides
attached to a glycoprotein are isolated by digesting the
protein with endopeptidases, purifying the resulting
glycopeptides, and releasing the oligosaccharides from
these glycopeptides by chemical or enzymatic cleavage. The
glycopeptides produced by the digestion are usually
separated on a chromatographic system, such as
chromatography by reverse-phase adsorption, to separate
them on the basis of only their amino acid sequence. In
this way all of the oligosaccharides attached to a
particular amino acid side chain in the sequence of the
glycoprotein are isolated together. From an examination of
the amino acid sequences of a large number of such
glycopeptides, it has been concluded that N-linked
oligosaccharides from plants and animals are always
attached to asparagines that have either a serine or a
threonine two amino acids further on in the amino acid
sequence (Asn-X-Ser/Thr). The serines and threonines to
which O-linked oligosaccharides are attached, however, are
evidently not designated by any pattern in the surrounding
sequence of amino acids but tend to be clustered in
regions of the polypeptide rich in serines, threonines,
and prolines. The ultimate example of this would be the
mucin MUC2.
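
The Asn-X-Ser/Thr pattern lends itself to a simple search of an amino acid sequence. The following is only a minimal sketch in Python: the function name, the example peptide, and the optional `exclude_proline` switch are all hypothetical and are introduced here for illustration. The text above states only the Asn-X-Ser/Thr pattern; the restriction that X not be proline is a commonly quoted refinement that is assumed here, not taken from the text. A match identifies a candidate site, not a site that is known to be glycosylated.

```python
import re

def find_sequons(sequence, exclude_proline=False):
    """Return the 1-based positions of asparagines in Asn-X-Ser/Thr motifs.

    sequence: one-letter amino acid sequence (N = Asn, S = Ser, T = Thr).
    exclude_proline: assumption beyond the text; if True, X may not be Pro.
    """
    middle = "[^P]" if exclude_proline else "."
    # The lookahead keeps each match zero-width, so overlapping motifs
    # (for example N-N-T-S) are all reported.
    pattern = re.compile(rf"(?=N{middle}[ST])")
    return [m.start() + 1 for m in pattern.finditer(sequence)]

peptide = "MNGTANPSQNNTSW"  # hypothetical sequence, for illustration only
print(find_sequons(peptide))                        # [2, 6, 10, 11]
print(find_sequons(peptide, exclude_proline=True))  # [2, 10, 11]
```

No comparable search can be written for O-linked sites, because, as noted above, no pattern in the surrounding sequence designates them.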

The chemical or enzymatic cleavage used to release
the several microscopically heterogeneous
oligosaccharides from a particular glycopeptide depends
upon the glycosidic linkage. For N-linked
oligosaccharides, endoglycosidases specific for cleavage
within the common segment GlcNAc(β1,4)GlcNAcAsn are
usually used. For example, mannosyl-glycoprotein
endo-β-N-acetylglucosaminidase (endoglycosidase H)
catalyzes hydrolysis of the glycosidic linkage between two
N-acetylglucosamines and releases the oligosaccharide
missing its initial monosaccharide, while
peptide-N⁴-(N-acetyl-β-glucosaminyl)asparagine amidase
(peptide:N-glycosidase F) cleaves the N-glycosidic linkage
between an N-linked oligosaccharide and the asparagine on a
glycoprotein or glycopeptide. Oligosaccharides in
N-glycosidic linkage to asparagine can also be released
from the glycopeptide by hydrazinolysis and reacetylated
with acetic anhydride. Regardless of the method by which it
is released, the aldehyde at C1 of the initial
N-acetylglucosamine in the oligosaccharide is usually
reduced to the primary alcohol with Na[³H]BH₄
(Figure 3-25). This reduction eliminates the aldehyde,
simplifies the subsequent chemistry, and makes the
oligosaccharide radioactive, if it is not so already.
Oligosaccharides in O-glycosidic linkage to serine and
threonine are usually released from the glycopeptides by
treatment with base, which promotes β-elimination
(Figure 3-25). The treatment with base is performed in the
presence of Na[³H]BH₄ to prevent, by reduction of the
aldehyde at C1, the destruction of the oligosaccharide
from its reducing end and to make the released
oligosaccharide radioactive.

It is at this point that the technical consequences of
microheterogeneity are experienced. Instead of one pure
oligosaccharide released in a quantity equimolar to the
amount of glycopeptide, a mixture of many
oligosaccharides is produced, each present in a
correspondingly small quantity. This mixture is first
separated into neutral and anionic oligosaccharides
chromatographically. Chromatographic systems that separate
the oligosaccharides by molecular exclusion or by anion
exchange are then used to perform further separations.
Chromatography by molecular exclusion provides
an indication of the size of each oligosaccharide in the
set. After the sialic acids have been removed from the
anionic oligosaccharides by hydrolysis in mild acid and
separately analyzed, the composition of each
oligosaccharide is determined. This can be done by
methanolysis under acidic conditions to cleave the acetals
and coincidentally form methyl glycosides (Figure 3-26)
that are identified by their mobilities on gas
chromatography. It is also possible to analyze the
composition of the oligosaccharide directly by submitting
it to hydrolysis in acid and separating the resulting
monosaccharides and deacetylated amino sugars on
chromatography by anion exchange (Figure 3-27), the
effluent from which is monitored electrochemically.
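
The arithmetic behind such a compositional analysis can be sketched briefly. The Python fragment below is an illustration only: the function names are hypothetical, and `amount_from_peak` assumes a single-point external calibration in which the detector response is proportional to the amount of monosaccharide injected, which is an assumption made here and not a statement from the text. The amounts used in the example are those reported in the caption to Figure 3-27 for 300 pmol of glycopeptide.

```python
def amount_from_peak(sample_area, standard_area, standard_pmol):
    """Amount (pmol) in the sample peak, assuming response is proportional to amount."""
    return standard_pmol * sample_area / standard_area

def molar_ratios(amounts_nmol, glycopeptide_nmol):
    """Moles of each monosaccharide recovered per mole of glycopeptide hydrolyzed."""
    return {sugar: round(n / glycopeptide_nmol, 2)
            for sugar, n in amounts_nmol.items()}

# Amounts reported in the caption to Figure 3-27 (nmol), from 0.300 nmol of glycopeptide.
reported = {"glucosamine": 1.1, "galactose": 0.79, "mannose": 0.75}
print(molar_ratios(reported, glycopeptide_nmol=0.300))
```

The ratios obtained in this way need not be whole numbers, because they reflect the recovery of each monosaccharide through hydrolysis and chromatography as well as the composition itself.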

The sequence of each of the purified oligosaccha- 
rides is determined by indirection. A series of chemical 
and enzymatic reactions is performed on the oligosac- 
charide and the outcome of each of these reactions is 
assessed either directly or by determining the change in 
composition of the oligosaccharide that occurs. The 
results of these various reactions are gathered until in 
their entirety they are consistent with only one of the 
many possible structures for the oligosaccharide. This 
one structure is then considered to be the actual struc- 
ture. The reactions used in this process are periodate oxi- 
dation, Smith degradation, treatment with glycosidases, 
and methylation. The results from these chemical analy- 
ses are often supplemented with nuclear magnetic reso- 
nance spectroscopy and mass spectrometry. 

When sodium metaperiodate (NaIO₄) is dissolved
in water at acidic pH (pH 3-6), it forms a mixture of acidic
hydrates referred to as periodic acid (HIO₄). Periodic
acid cleaves polyalcohols such as monosaccharides at
the carbon-carbon bonds between vicinal diols and
produces two carbonyls from the two hydroxyl groups
(Figure 3-28). Both of the hydroxyl groups in the vicinal
diol must be free for periodic acid to cleave the
carbon-carbon bond between them. The disappearance
of a monosaccharide during treatment with periodic acid
demonstrates that, in the intact oligosaccharide, the
sugar that disappeared had at least two adjacent
hydroxyl groups unbonded in glycosidic linkages. It is
also possible to identify the actual products of the
periodate cleavage by mass spectrometry. Oxidation by
periodic acid can be performed sequentially by the Smith
degradation (Figure 3-28). This series of reactions takes
advantage of the lability to acid of a glycosidic linkage at
carbon 1 of a sugar that has been cleaved by periodic acid
and the resulting aldehydes of which have been reduced
with sodium borohydride. In theory this reaction should
be able to cleave sugars sequentially from the ends of the
branches inward, but in practice only one cycle is usually
successful because the selectivity for acyclic acetals is not
great.
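
The rule that both hydroxyl groups of a vicinal diol must be free can be turned into a small predictive test. The sketch below is a simplification under stated assumptions: it applies only to a hexopyranose residue whose carbon 1 is in a glycosidic linkage, in which the ring hydroxyl groups are on carbons 2, 3, and 4 and the hydroxyl on carbon 6 has no vicinal partner because carbon 5 carries the ring oxygen. It does not treat the acyclic alditol at a reduced end or the side chain of a sialic acid. The function name and its arguments are hypothetical.

```python
def periodate_cleaves(substituted, has_c2_hydroxyl=True):
    """True if a hexopyranose residue retains two adjacent free hydroxyl groups.

    substituted: carbons whose hydroxyl groups carry another sugar, e.g. [3, 6].
    has_c2_hydroxyl: False for N-acetylhexosamines, in which carbon 2 bears
    an acetamido group instead of a hydroxyl group.
    """
    blocked = set(substituted)
    free = {3, 4} - blocked
    if has_c2_hydroxyl:
        free |= {2} - blocked
    # Only the pairs C2-C3 and C3-C4 can form a vicinal diol in the ring.
    return any({a, a + 1} <= free for a in (2, 3))

print(periodate_cleaves([]))                          # terminal mannose
print(periodate_cleaves([3, 6]))                      # mannose at a branch point
print(periodate_cleaves([4], has_c2_hydroxyl=False))  # 4-linked N-acetylglucosamine
```

Under these assumptions the terminal mannose is predicted to be destroyed, while the 3,6-disubstituted mannose and the 4-linked N-acetylglucosamine are predicted to survive.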

A more informative sequence of cleavages can often
be performed with exoglycosidases. These are enzymes
that remove particular sugars from the ends of branches.
They are highly specific for the sugar removed, the
anomeric state of the glycosidic linkage, and sometimes
the location of the hydroxyl group from which the bond
has been formed. Examples of such exoglycosidases
would be β-galactosidase, α-L-fucosidase,
β-N-acetylglucosaminidase, and exo-α-sialidase. An example
of the specificity for the hydroxyl group would be the
exo-α-2,3-sialidase from Newcastle disease virus. The
release of a monosaccharide after exposure of the
oligosaccharide to an exoglycosidase is evidence that that
monosaccharide was at the end of a branch and attached to
it by a glycosidic linkage of the designated anomeric
stereochemistry. The digestions are usually performed
sequentially. After each of the monosaccharides at the
ends of the branches has been catalogued, each of the
shortened products of the first round of digestions is then
submitted to a round of digestion to identify the
penultimate monosaccharides on each branch and so on until
the last sugar is released. The products of each round of
digestion are monitored either chromatographically or by
mass spectrometry. Several specific endoglycosidases,
which cleave an oligosaccharide internally at particular
Figure 3-26: Acidic methanolysis of oligosaccharides. Protonation
of the exocyclic acetal oxygen produces a leaving group, the
departure of which gives the planar oxacarbenium cation. Addition of
methanol to either face of the oxacarbenium cation produces a
mixture of the α- and β-anomers of the methyl glycoside.


bonds with high specificity, are also available. Examples
of these enzymes are endo-1,4-β-galactosidase and
endo-α-sialidase. Many of the complementary DNAs isolated
from the original sources of these glycosidases have
been transferred to expression vectors, and the proteins 
are expressed in high yield by transfected bacteria. One 
advantage to these expression systems is that these 
enzymes purified from recombinant bacteria are uncon- 
taminated by other glycosidases. 
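
The logic of sequential digestion with exoglycosidases can be illustrated with a small model. The following Python sketch is hypothetical throughout: the `Residue` class, the `digest` function, and the two-branched oligosaccharide are invented for illustration, and the model considers only the identity of the sugar and the anomeric configuration of its linkage, ignoring the specificity that some enzymes show for the position of the linkage. A residue is released only if nothing is attached beyond it, which is why a galactosidase releases nothing until the sialic acids have been removed.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)
class Residue:
    sugar: str    # for example "Gal"
    anomer: str   # "a" or "b": configuration of the linkage to the parent
    children: list = field(default_factory=list)  # residues attached to this one

def digest(parent, sugar, anomer):
    """Remove every exposed residue of the given sugar and anomer; return the number released."""
    released = 0
    for child in list(parent.children):
        # Trim the branch beyond this child first, so that a run of
        # identical sugars is removed completely in one digestion.
        released += digest(child, sugar, anomer)
        if not child.children and (child.sugar, child.anomer) == (sugar, anomer):
            parent.children.remove(child)
            released += 1
    return released

def branch():
    """One hypothetical branch: Sia(a)-Gal(b)-GlcNAc(b)-."""
    return Residue("GlcNAc", "b", [Residue("Gal", "b", [Residue("Sia", "a")])])

core = Residue("Man", "b", [branch(), branch()])
print(digest(core, "Gal", "b"))     # 0: the galactoses are capped by sialic acid
print(digest(core, "Sia", "a"))     # 2
print(digest(core, "Gal", "b"))     # 2: now exposed
print(digest(core, "GlcNAc", "b"))  # 2
```

The order in which the digestions succeed, and the number of residues released in each, is the information from which the sequence of each branch is inferred.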

An oligosaccharide can be chemically methylated 
on all of its free hydroxyl groups. This is done by forming 
the sodium alkoxides of the hydroxyl groups in a solution 
of dimethyl sulfoxide by using the sodium salt of the 
dimethyl sulfoxidate anion as the base. The alkoxides are 
then methylated with methyl iodide. The methylated
oligosaccharide is then hydrolyzed in acid, and the
resulting monosaccharides are reduced with NaBH₄. The
Figure 3-27: Chromatographic analysis of the hydrolysate of a
glycopeptide to quantify its composition of monosaccharides. A
sample (300 pmol) of a purified, homogeneous glycopeptide (with 
the composition GalzGleNAc;Man;) was hydrolyzed for Ah at 
100°C in 4 M trifluoroacetic acid. The acid was removed by evapo- 
ration, and the hydrolysate was submitted to chromatography 
(panel A) on a column (0.46 cm x 25 cm) of a medium for anion 
exchange equilibrated and eluted with 22 mM NaOH. The strong 
base makes the sugars sufficiently anionic to be separated by the 
chromatographic medium. The concentration of monosaccharide 
was monitored continuously with a pulsed amperometric detector 
(PAD). The detector responds to the current resulting from the 
uptake of electrons at a gold electrode (PAD response). The elec- 
trode is poised at +50 mV, which is a sufficiently positive potential 
to oxidize the polyols of the monosaccharides. It is this oxidation at 
the surface of the electrode that produces the current. Standards 
(25 pmol) were run (panel B) under the same conditions and sepa- 
rately identified as the following monosaccharides: 1, fucose; 
2, galactosamine; 3, glucosamine; 4, galactose; 5, glucose; 6, man- 
nose. From the areas of the peaks of the standards it could be cal- 
culated that the original hydrolysate contained 1.1 nmol of 
glucosamine, 0.79 nmol of galactose, and 0.75 nmol of mannose. 
The glucose observed was a contaminant. Reprinted with permis- 
sion from ref 527. Copyright 1988 Academic Press, Inc. 


resulting alditols are acetylated both at the hydroxyl 
groups produced by the hydrolysis of the glycosidic link- 
ages and at the hydroxyl groups produced by the reduc- 
tion of the aldehydes. The various methylated alditol 
acetates that result from this treatment are identified 
chromatographically. In this way, the hydroxyl groups at 
which the various monosaccharides were bonded in the 
glycosidic linkages of the original oligosaccharide can be 
identified because they are acetylated rather than methy- 
lated in the products. For example, upon methylation, 
the oligosaccharide drawn in Figure 3-21 yielded 
1,5-diacetyl-2,3,4,6-tetramethylmannitol,
1,2,5-triacetyl-3,4,6-trimethylmannitol,
2,4-dimethyl-1,3,5,6-tetraacetylmannitol, and smaller
amounts of 1,3,5-triacetyl-2,4,6-trimethylmannitol and
3,6-dimethyl-1,4,5-triacetyl-N-acetylglucosaminitol. The
appearance of each of
these products is consistent with the structure ultimately 
proposed for the intact oligosaccharide. 
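
Reading the linkage positions from the name of a methylated alditol acetate is mechanical enough to be sketched in code. The fragment below is an illustration under one stated assumption: the residue was a pyranose, so that the acetyl groups on carbons 1 and 5 arise from reduction of the aldehyde and from opening of the ring, and every other acetylated position marks a hydroxyl group that was in glycosidic linkage. The function name and the regular expression are hypothetical.

```python
import re

# Locants that precede "acetyl", e.g. "1,3,5,6" in "1,3,5,6-tetraacetyl".
# The "N-acetyl" of an amino sugar is not matched, because it has no numeric locant.
ACETYL = re.compile(r"(\d+(?:,\d+)*)-(?:di|tri|tetra|penta)?acetyl")

def linkage_positions(name, ring_positions=(1, 5)):
    """Return the carbons at which the residue carried other sugars."""
    match = ACETYL.search(name)
    if match is None:
        raise ValueError(f"no acetylated positions found in {name!r}")
    acetylated = {int(p) for p in match.group(1).split(",")}
    return sorted(acetylated - set(ring_positions))

for product in (
    "1,5-diacetyl-2,3,4,6-tetramethylmannitol",
    "1,2,5-triacetyl-3,4,6-trimethylmannitol",
    "2,4-dimethyl-1,3,5,6-tetraacetylmannitol",
    "3,6-dimethyl-1,4,5-triacetyl-N-acetylglucosaminitol",
):
    print(product, linkage_positions(product))
```

An empty list identifies a residue at the end of a branch, a single position identifies a residue within a linear segment, and two positions identify a branch point.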
Figure 3-28: Periodate oxidation and Smith degradation.
Periodate oxidation (HIO₄) cleaves the carbon-carbon bond
between any two unbonded vicinal hydroxyl groups. The products 
are most readily understood by first putting a hydroxyl group at 
each position on a carbon that was involved in the carbon-carbon 
bond and then turning hydrates to aldehydes and orthoacids to 
acids. Following periodate oxidation, the aldehydes are reduced 
with borohydride anion, and the acetals containing the degraded 
monosaccharides are cleaved selectively in weak acid. This frees 
hydroxyl groups that were previously in glycosidic linkages and 
makes certain monosaccharides, which were resistant before, now 
susceptible to periodate oxidation. 


Although it is limited by the amounts (0.5 μmol) of
oligosaccharide needed, nuclear magnetic resonance
spectroscopy has been applied to solving the sequence of
oligosaccharides. The analysis has been most successful
in cases where the oligosaccharide is a member of a class,
such as high-mannose or complex oligosaccharides, the
structures of which are predictable and for which many
well-characterized standards are available to assist in the
assignments of the various resonances.
In at least one instance, however, the structures of a
nested set of oligosaccharides of increasing length, from a
less well-characterized class of oligosaccharides, were
determined entirely by nuclear magnetic resonance
spectroscopy. If a standard oligosaccharide of exactly the
same structure as the oligosaccharide isolated from the
glycoprotein is available, the coincidence of the nuclear
magnetic resonance spectrum of the standard and that of
the unknown is proof of the structure of the unknown.
In the nuclear magnetic resonance spectrum of an
oligosaccharide, the chemical shift for the resonance of
each of the various hydrogens attached to the carbons of
the monosaccharides is characteristic of the
monosaccharide itself, the carbon it is attached to, and
whether or not the hydroxyl group on that carbon is
glycosidically linked. Two-dimensional spectra are used to
assign the set of resonances from hydrogens on the same
monosaccharide.

Mass spectrometry has also been applied to
structural studies of oligosaccharides and glycopeptides.
Oligosaccharides and glycopeptides can be transferred
to the gas phase as ionic molecules by fast-atom
bombardment, electrospray, or matrix-assisted
laser-desorption ionization. A mass spectrometer cannot
distinguish mannose, galactose, and glucose from each
other, nor N-acetylglucosamine from
N-acetylgalactosamine. When used without collision-induced
dissociation, it can provide information only about the
number of hexoses, N-(acetylamino)hexoses, and sialic acids
present in a given glycopeptide because the molecular
mass of an oligosaccharide is the same regardless of how
the monosaccharides are connected and which epimers
are present. Because of the unusual molecular mass of
fucose, oligosaccharides containing different amounts of
fucose can be distinguished by mass spectrometry.
When mass spectrometry is combined with chemical
modifications such as methylation, more information
can be gathered by using mass spectra to analyze the
products of the reactions. One significant advantage of
mass spectrometry is that mixtures of glycopeptides or
oligosaccharides can be analyzed because each
component in the mixture produces a different molecular ion.
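
The reason that a molecular mass fixes only the composition can be made concrete with a short calculation. The sketch below is illustrative and rests on several assumptions: the oligosaccharide is taken to be neutral, unreduced, and underivatized; the sialic acid is taken to be N-acetylneuraminic acid; and the masses are monoisotopic masses of the residues (the monosaccharide less one water). The names `RESIDUE_MASS`, `glycan_mass`, and `compositions_for` are hypothetical.

```python
from itertools import product

# Monoisotopic residue masses in daltons (monosaccharide minus H2O).
RESIDUE_MASS = {
    "Hex": 162.05282,     # mannose, galactose, or glucose: indistinguishable
    "HexNAc": 203.07937,  # N-acetylglucosamine or N-acetylgalactosamine
    "dHex": 146.05791,    # fucose, a deoxyhexose
    "NeuAc": 291.09542,   # N-acetylneuraminic acid (assumed sialic acid)
}
WATER = 18.01056

def glycan_mass(composition):
    """Mass of a free oligosaccharide from its counts of each kind of residue."""
    return WATER + sum(RESIDUE_MASS[r] * n for r, n in composition.items())

def compositions_for(mass, tolerance=0.01, max_count=10):
    """All compositions (up to max_count of each residue) matching an observed mass."""
    names = list(RESIDUE_MASS)
    hits = []
    for counts in product(range(max_count + 1), repeat=len(names)):
        candidate = dict(zip(names, counts))
        if abs(glycan_mass(candidate) - mass) <= tolerance:
            hits.append({r: n for r, n in candidate.items() if n})
    return hits

core = {"Hex": 3, "HexNAc": 2}
print(round(glycan_mass(core), 4))
print(compositions_for(glycan_mass({**core, "dHex": 1})))
```

Every arrangement of three hexoses and two N-acetylhexosamines gives the same mass, whereas the replacement of a hexose by a fucose changes the mass by about 16 Da, which is why the content of fucose can be read from the spectrum.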

When a tandem mass spectrometer is used with an
intermediate step of collision-induced dissociation, the
molecular ion of the oligosaccharide can be selected by
the first mass spectrometer, and the fragment ions can be
registered by the second. In this way a clean sequence
of fragments, each missing an additional monosaccharide,
can be observed. The sequence in which the fragmentation
occurs can provide information about the
sequence of monosaccharides in the oligosaccharide, but
because the monomeric units are usually not arranged
linearly in an oligosaccharide but as branches, the order
in which hexoses and N-acetylhexosamines are lost upon
fragmentation is not sufficient to define the sequence in
which they occur in the oligosaccharide. In the case of the
linear, unbranched oligosaccharides of proteoglycans
(Table 3-4), however, because the masses of glucuronic
acid and iduronic acid differ from those of
N-acetylgalactosamine and N-acetylglucosamine and because
the various postsynthetic modifications change the masses
of the monosaccharides, mass spectrometry provides
significant information about sequence.


Suggested Reading 


Baenziger, J.U., & Fiete, D. (1979) Structure of the complex 
oligosaccharides of fetuin, J. Biol. Chem. 254, 789-795. 


van Kuik, J.A., de Waard, P., Vliegenthart, J.F.G., Klein, A., Carnoy, 
C., Lamblin, G., & Roussel, P. (1991) Isolation and structural 
characterization of novel neutral oligosaccharide-alditols from 
six of which possess the GlcNAc β(1→3)[Gal β(1→4)GlcNAc
β(1→6)]Gal β(1→3)GalNAc-ol common structural element,
Problem 3-18: Complete the following reactions:

[Structures of five sugars, each to be treated with n HIO₄.]
Problem 3-19: A polysaccharide, which is a polymer of
glucose only, is treated in the following way:

polyglucose + NaBH₄ → V
V + CH₃I → W
H₂O + W → X
X + NaBH₄ → Y
Y + (CH₃CO)₂O → Z

Z is a mixture containing the following distribution of
methylated glucitol acetates.

methylated glucitol acetate                  mole percent

1,4,5-triacetyl-2,3,6-trimethylglucitol      81.9
2,3-dimethyl-1,4,5,6-tetraacetylglucitol      9.0
1,5-diacetyl-2,3,4,6-tetramethylglucitol      8.8
4-acetyl-1,2,3,5,6-pentamethylglucitol        0.2


(A) On average, how many monosaccharides does 
each molecule of the polysaccharide contain, how 
many branch points are there, and how many 
nonreducing ends are there? 


(B) Draw structures of the linkages in the main linear 
polymer and the structure of a branch point. 


(C) If the polysaccharide were treated with periodic 
acid, what percentage of the glucose would be 
destroyed? 


Problem 3-20: A glycopeptide has been isolated follow- 
ing exhaustive pronase digestion of phytohemagglutinin 
from lima beans. Determine its structure from the
following information. The compositions of single, intact,
homogeneous glycopeptides or oligosaccharides are 
enclosed within parentheses. 


(A) Composition: (mannose, N-acetylglucosamine,, 
Asp) 


(B) Exhaustive methylation, acid hydrolysis, reduction,
and acetylation

methylated acetylated sugar alcohol            mol (mol of dimethyl-
                                               tetraacetylmannitol)⁻¹

1,5-diacetyl-2,3,4,6-tetramethylmannitol       1.95
1,2,5-triacetyl-3,4,6-trimethylmannitol        1.10
2,4-dimethyl-1,3,5,6-tetraacetylmannitol       1.00
3,6-dimethyl-1,4,5-triacetyl-
  N-acetylglucosaminitol                       1.90


(C) Periodate oxidation followed by mild acid hydrol- 
ysis yields a smaller glycopeptide and no free 
sugar of any kind. The composition of the smaller 
glycopeptide is (mannose, N-acetylglucosamine,, Asp)
(D) Mannosidase treatments of initial glycopeptide
produced

mannosidase                                    mol of mannose released
                                               (mol of glycopeptide)⁻¹

α-mannosidase (Arthrobacter GJM-1)             0.9
(α1,2)-mannosidase (Aspergillus niger)         1.1
α-mannosidase (jack bean)                      3.0
α-mannosidase (jack bean) followed by
  β-mannosidase (A. niger)                     3.8
β-mannosidase (A. niger) alone


(E) Glycopeptide core remaining after digestion with
α-mannosidase (jack bean) and β-mannosidase
(A. niger) was GlcNAc(β1,4)GlcNAc-Asn.


(F) The glycopeptide remaining after α-mannosidase
(Arthrobacter GJM-1) digestion was isolated,
exhaustively methylated, and hydrolyzed, and the
resulting methylhexoses were reduced and acetylated.
1,5,6-Triacetyl-2,3,4-trimethylmannitol,
1,2,5-triacetyl-3,4,6-trimethylmannitol, and
1,5-diacetyl-2,3,4,6-tetramethylmannitol were
obtained in approximately equal amounts.


Draw a structure for this glycopeptide that is consistent 
with all of these observations. 


Problem 3-21: A glycopeptide has been purified from
thyroglobulin³⁴ that had been digested with pronase.
From the following information, determine its complete
structure. Draw the linkage to the amino acid side chain
in the peptide portion. The compositions of single,
intact, homogeneous glycopeptides or oligosaccharides
are enclosed within parentheses.


(A) Composition
(Asp, Gly, Val, mannose₃, acetate₂, glucosamine₂)


(B) Pronase + α-mannosidase
Gly
Val
3 mannose
(Asp, glucosamine₂, acetate₂)


(C) Chick oviduct extract
(mannose₃, glucosamine₂, acetate₂)
(Asp, Gly, Val)


(D) Exhaustive methylation, acid hydrolysis, reduction,
and acetylation (amounts not determined)
3,6-dimethyl-1,4,5-triacetyl-N-acetylglucosaminitol
1,5-diacetyl-2,3,4,6-tetramethylmannitol
2,X-dimethyl-1,Y,5,6-tetraacetylmannitol (X = 3 or 4; Y = 4 or 3)


(E) Periodate oxidation, acid hydrolysis
(Asp, Gly, Val, mannose, glucosamine₂, acetate₂)


(F) Reduction followed by acid hydrolysis of
oligosaccharide from step C
3 mannose
1 glucosamine
1 glucosaminitol
2 acetate


(G) α-Mannosidase treatment of oligosaccharide
from step C
3 mannose
[(di-N-acetylglucosaminyl(β1,4)glucosamine]*
Chapter 4


Crystallographic Molecular Models 


To this point, it has been described how proteins are 
composed of long polymers of amino acids and how 
these polymers are posttranslationally modified by 
processes that alter the backbone of the polypeptide or 
the side chains of the amino acids or that add oligosac- 
charides to the polypeptides. All of the specific covalent 
bonds connecting all of the atoms in each of the post- 
translationally altered polypeptides composing a partic- 
ular protein can be defined by chemical analysis. The 
bond lengths and fixed bond angles of the monomers, 
amino acids and monosaccharides, and of the bonds 
coupling the monomers into polymers, amides and 
acetals, are known precisely. With these values, every 
bond length, the hybridization of every atom, and every 
fixed bond angle in each complete, posttranslationally 
modified polypeptide can be assigned unambiguously. 
From this information a long flexible molecular model of 
a particular posttranslationally modified polypeptide 
can be constructed with high precision. 

The problem with defining the complete structure 
of any polymer, polypeptides included, is the rotational 
degrees of freedom about the large number of exocyclic, 
unconjugated single bonds that are present in the poly- 
mer. In a finished polypeptide there are from hundreds 
to tens of thousands of such single bonds. In a commer- 
cial polymer, such as polystyrene, rotation about its 
many single bonds causes each molecule of the polymer, 
even though it may be covalently identical to other mol- 
ecules of the polymer in the sample, to assume a differ- 
ent three-dimensional structure, and if the polymer is in 
solution, the structure of each molecule usually changes 
constantly and randomly with time. The polypeptides in 
a protein, however, assume only one unchanging struc- 
ture, or a small number of interchanging structures, 
uniquely determined by the amino acid sequences of 
those polypeptides. Each molecule of the same protein 
assumes the same or one of a small number of three- 
dimensional structures. This structure or these few struc- 
tures are exclusively assumed because almost all of the 
exocyclic single bonds composing the backbones of 
the polymer and most of the exocyclic single bonds of the 
side chains of the amino acids are confined to particular 
dihedral angles. It is crystals of proteins that have pro- 
vided both this insight and the opportunity to observe 
molecular models representing these structures. 

The existence of a crystal of any protein permits 
certain conclusions to be drawn about that protein. As in
organic chemistry, it can be concluded that the molecules
in the crystal are all covalently identical or almost
identical to each other. Furthermore, if a crystal exists, 
the covalently identical molecules can be present only in 
a small number of specific three-dimensional conforma- 
tions. In the case of proteins, all of the molecules in the 
crystal usually have the same structure or one of a small 
number of almost identical structures. It is now also 
known that the structure of a molecule of protein in a 
crystal is essentially identical to its only structure or one 
of its few structures when it is free in solution. When the 
crystal is submitted to X-ray crystallography, that unique 
structure can be observed. 


Maps of Electron Density¹


Suppose that one could see X-radiation. If one were to 
pick up a crystal of a purified protein and tumble it in his 
hand under a beam of X-radiation of one wavelength, it 
would glitter as does a jewel in a ray of sunlight. There 
would be, however, a peculiarity to this glitter. A jewel 
glitters because its facets reflect the sunlight as small indi- 
vidual mirrors. This means that if one follows a facet care- 
fully as the jewel turns, one would see that it is always 
reflecting the sunlight and realize that the glittering sen- 
sation only arises because the eye is at rest with respect 
to the moving reflected beam. The glitter from a crystal of 
protein, however, arises because its facets produce 
flashes, and these flashes occur only when a facet is 
aligned in one precise direction relative to the direction 
of the incident beam of X-radiation. The reason for this 
is that the flashes are produced by the summation in 
phase of the reflections from a stack of evenly spaced, par- 
allel mirrors. This summation in phase is diffraction. It is 
only at certain angles that the reflections sum in phase. 
If one played with the crystal of protein long 
enough, it would become clear that there were axes run- 
ning through it. Rotation about any one of these axes 
would produce flashes that were regularly arrayed. This 
regular array of flashes would be reminiscent of the array 
of reflections that emanates from one of the rotating mir- 
rored spheres in a ballroom. One difference, however, 
would be that while each mirror on the sphere continu- 
ously reflects the spotlight when it is on the illuminated 
side, as can be discerned by following the reflected 
beams on the walls, each of the mirrors in the crystal of 
protein reflects only when it passes through certain precise
orientations, as could also be discerned by watching
the patterns of the flashes on the walls. In addition, the 
mirrors in the rotating crystal, referred to as the reflect- 
ing faces, reflect onto the walls behind the crystal as well 
as in front of the crystal because the crystal is not opaque 
to X-radiation, as is the ballroom sphere to light, and 
both sides of each mirror can reflect. 

The easiest way to verify this behavior is photo- 
graphically. A crystal is mounted on the end of a rotating 
shaft the axis of which is coincident with the axis of a 
cylinder of photographic film. The crystal is attached to 
the shaft in an orientation such that one of its principal 
axes is parallel to the axis around which the shaft is rotat- 
ing. The cylinder of film has a slot through which a beam 
of X-radiation perpendicular to the axis of rotation can 
be directed upon the crystal (Figure 4-1A).² After an
appropriate exposure, the film is developed. The image 
observed is that of reflected flashes arrayed on lines of 
latitude (Figure 4-1B). Each line of latitude, referred to as 
a layer line, arises from all of the mirrors that are tilted at 
the same angle with respect to the beam of incident X- 
radiation. Because the spots on the film produced by the 
flashes occur along layer lines, the tilt of the mirrors rel- 
ative to the axis of the crystal must be able to assume only 
certain values. Because the layer lines are made up of dis- 
crete spots, each the result of one flash, each mirror must 
reflect only when the angle between its face and the inci- 
dent beam assumes unique values. 


[Figure 4-1A labels: rotating shaft, cylinder of film, slot in film, source of X-rays]


The profound insight into this curious phenome- 
non was the realization that the remarkable variations in 
the intensities of the flashes (Figure 4-1B) contained 
information and that, from the information they con- 
tained, the atomic structure of the molecules from which 
the crystal was formed could be deduced. With the prom- 
ise that this is the reward, one can now ask, what are 
these mirrors, why do they flash, and why does each one 
flash with a different intensity? 

A crystal of protein is a solution to a warehousing 
problem. It is a solid object formed from a huge number 
of the same protein molecules, neatly stacked as the 
boxes or barrels in a warehouse, with the vacancies 
between the molecules of the protein filled with water. It 
is, for all intents and purposes, an infinite, three-dimen- 
sional array of identical enantiomeric objects. It can be 
shown that there are only 65 ways to arrange enantiomeric
objects to form an infinite array. Each crystal
represents a particular one of these 65 solutions.

Each of these 65 different arrangements can be
divided in its entirety into a stack of boxes, each of which 
is identical in its size, shape, contents, and the arrange- 
ment of its contents to every other one. These boxes are 
always parallelepipeds, and they are referred to as unit 
cells. A unit cell is the smallest parallelepiped of matter 
that, by only simple translational movements along the 
three axes of the crystal, can be stacked to create and fill 
completely the whole crystal. Keep in mind that each of
these parallelepipeds is filled with molecules of protein,
and surrounding water, that are necessarily arranged in
space in certain positions and orientations. It is this
arrangement of molecules that exists, not the unit cells or
the planes about to be discussed.

Figure 4-1: (A) Schematic drawing of a camera used to take an oscillation photograph of a crystal turning about one of its crystallographic
axes. (B) A photograph from such a camera.² The crystal was aligned such that one axis of the unit cell was perpendicular to the beam of
X-radiation, and the crystal was then rotated back and forth around this vertical axis. Each of these oscillations covered the same
excursion of about 20°. The axis of rotation was aligned vertically with respect to the film as it is displayed. The white shadow in the center of the
photograph is of a beam stop used to protect the film from the majority of the X-radiation, which passes through and around the crystal. The
beam was pointed at the circular top of the beam stop. The five layer lines are labeled as if the rotation had occurred around the a axis of the
crystal. The middle layer line (0,k,l) is the equator. Reprinted with permission from ref 2. Copyright 1968 Macmillan.

In any crystal, three sets of planes can be con- 
structed. Within each of these three sets, all of the con- 
stituent planes must be equidistant and parallel to each 
other. Every one of the planes in each of the three sets 
must intersect planes from both of the other two sets, 
and every one of the parallelepipeds that results from 
these intersections must be the same unit cell containing 
the same distribution of matter. The partition of space 
accomplished by three sets of planes so defined is 
accompanied by the creation of a network of lines, each 
of which is the intersection of two of these planes. This 
network of lines is a lattice (Figure 4-2) encaging a set of 
unit cells. 

Unfortunately, each crystal can be divided into
several different lattices, each of which satisfies the
definition. In addition, any translational movement, no
matter how small, of any one of these lattices produces
another equally satisfactory lattice. Usually
crystallographers follow certain conventions in choosing the
fundamental lattice that will be used. These conventions are
designed in part to reveal any underlying symmetry that
may be present within the crystal and in part to simplify
the extensive calculations that are involved in producing
a map of electron density. One lattice is chosen, however,
during a procedure known as indexing, and this choice
defines the fundamental unit cell (Figure 4-3). The three
axes of the fundamental unit cell are conventionally
designated a, b, and c by the right-hand rule (Figure 4-2).
The length of the fundamental unit cell along each axis,
in nanometers, is designated a, b, or c, respectively.

Figure 4-2: A triclinic lattice. This lattice is the most general
because none of the sides of the unit cells (a, b, or c) is the same
length and none of the three angles (α, β, or γ) is 90° or 120°. If
α = β = 90°, the lattice would be monoclinic. If α = β = γ = 90°, the
lattice would be orthorhombic. If a = b and α = β = γ = 90°,
the lattice would be tetragonal. If a = b = c and α = β = γ ≠ 90°, the
lattice would be rhombohedral. If a = b, α = β = 90°, and γ = 120°,
the lattice would be hexagonal. If a = b = c and α = β = γ = 90°, the
lattice would be cubic. The axes are defined by the right-hand rule.
The hand is reprinted with permission from ref 2. Copyright 1968
Macmillan.

Figure 4-3: Fundamental unit cell in a triclinic lattice showing the
relationship between the distribution of matter and the boundaries
of the fundamental unit cell. Even if the fundamental unit cell has
not been chosen to enclose one of the repeating objects, it contains
a total of one complete object.
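
The conditions on the lengths and angles that are listed in the caption to Figure 4-2 can be collected into a short decision procedure. The Python sketch below is only an illustration: the function name, the tolerance, and the order of the tests are assumptions made here, and the sketch applies the conditions of the caption literally rather than the full conventions that crystallographers follow during indexing.

```python
def classify_lattice(a, b, c, alpha, beta, gamma, tol=1e-6):
    """Name the lattice from cell lengths and angles (degrees).

    The rules are the ones stated in the caption to Figure 4-2;
    `tol` is a hypothetical tolerance for comparing measured values.
    """
    def eq(x, y):
        return abs(x - y) <= tol

    all_right_angles = eq(alpha, 90) and eq(beta, 90) and eq(gamma, 90)
    all_sides_equal = eq(a, b) and eq(b, c)

    if all_right_angles:
        if all_sides_equal:
            return "cubic"
        if eq(a, b):
            return "tetragonal"
        return "orthorhombic"
    if eq(a, b) and eq(alpha, 90) and eq(beta, 90) and eq(gamma, 120):
        return "hexagonal"
    if all_sides_equal and eq(alpha, beta) and eq(beta, gamma):
        return "rhombohedral"
    if eq(alpha, 90) and eq(beta, 90):
        return "monoclinic"
    return "triclinic"


if __name__ == "__main__":
    # Illustrative cell dimensions in nanometers; not taken from any real crystal.
    print(classify_lattice(2.7, 3.2, 3.4, 88.5, 108.6, 111.8))
    print(classify_lattice(7.9, 7.9, 3.8, 90, 90, 90))
```

The most symmetric cases are tested first so that a cubic cell, which also satisfies the conditions for the tetragonal and orthorhombic lattices, is given its most specific name.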

There are other ways to divide the space occupied 
by the crystal into different sets of unit cells by using 
other sets of parallel planes. This can be most easily seen 
by starting with a two-dimensional lattice (Figure 4-4).
Each set of parallel lines in the figure is constructed so 
that its members pass through the origins of the funda- 
mental unit cells, and the origin of each fundamental 
unit cell is contained in one of the lines of each set. This 
two-dimensional lattice can then be thought of as one of 
the lattice planes in a three-dimensional monoclinic 
crystal. In this case, the view presented in the figure 
would be down the c axis, and each line would be the 
intersection of a plane perpendicular to the page. Each 
set of planes parallel to each other and perpendicular to 
the two-dimensional lattice would create a new array of 
unit cells (Figure 4-5), and within a particular crystal, 
every one of these arrays would contain the same 
number of unit cells. Most of these arrays do not produce 
lattices because their unit cells are not formed from three 
intersecting sets of parallel planes, but they are arrays of 
genuine unit cells nevertheless. By extension it is clear 
that there is an infinite number of ways to divide a lattice 
into a set of unit cells that are bounded by at least one set 
of parallel planes (Figure 4-5). 

Each of these sets of parallel planes is identified by
giving it an index, (h,k,l). The index is referred to the axes
of the fundamental unit cell. From an examination of
Figure 4-5, it can be seen that the parallel planes
perpendicular to the page that define a given set of unit cells
always intersect the axes of the fundamental unit cell at
intervals that are the quotient of the length of the
fundamental unit cell along that axis (a, b, or c, respectively)
and an integer. As the tilt of the planes relative to that axis
increases, so does the magnitude of this integer,
monotonically and continuously. A given set of parallel planes
(Figure 4-6) is assigned three integers, h, k, and l. The
magnitude of the integer h is the number of segments
into which the planes divide the length of the fundamental
unit cell along its a axis. The magnitude of the
integer k is the number of segments into which the planes
divide the length of the fundamental unit cell along the
b axis; and the magnitude of the integer l, along the c axis.
When the set of planes is parallel to one of the axes of the
fundamental unit cell, as all of the planes in Figure 4-5
are to the c axis, it is assigned 0 for the respective index.

lines with indices (1,1), (2,1), and (3,1) and (-1,1), (-2,1), and (-3,1) are presented.
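
The rule that the planes of index (h,k,l) cut each axis of the fundamental unit cell into h, k, or l equal segments can be stated as a small calculation. The following Python sketch is illustrative only; the function name and the cell lengths are hypothetical, and an index of 0 is reported as `None` because planes parallel to an axis never intersect it.

```python
def axis_intervals(index, cell_lengths):
    """Return the interval between successive planes along a, b, and c.

    index        -- the integers (h, k, l) assigned to the set of planes
    cell_lengths -- the lengths (a, b, c) of the fundamental unit cell, in nm
    A zero index means the planes are parallel to that axis, so no
    interval exists and None is returned for it.
    """
    return tuple(
        None if n == 0 else length / abs(n)
        for n, length in zip(index, cell_lengths)
    )


if __name__ == "__main__":
    # The index (4,2,3) is the one shown in Figure 4-6; the cell lengths
    # (in nm) are invented for the example.
    print(axis_intervals((4, 2, 3), (8.0, 6.0, 9.0)))
    # A set of planes parallel to the c axis, as in Figure 4-5.
    print(axis_intervals((2, 1, 0), (8.0, 6.0, 9.0)))
```

Only the magnitudes of the integers are used here, because the signs record the relative directions in which the planes progress along the axes rather than the size of the segments.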

The signs of the integers assigned to each reflection
are determined by the relative progressions of the planes
along the three axes. If, as one progresses from one plane
to the next along the a axis in a positive direction, the
intersections of the successive planes with the b axis
also progress in a positive direction, as they do in Figure
4-6, then the signs of the integers h and k are the same.
If, however, as one progresses from one plane to the next
along the a axis in a positive direction, the intersections
of the successive planes with the b axis progress in a
negative direction, as they do in Figure 4-5, then the signs of
the integers h and k are opposite each other. The same
holds for the relationship between the signs of the
integers h and l.


Each plane in a set of parallel planes has two faces, 
and either can reflect X-radiation. The two reflections, 
one from each of the two sides of that set of reflecting 
planes, are a Friedel pair. In the indices (h,k,l) assigned
respectively to the two reflections of the Friedel pair, the
signs of the integers h, k, and l are opposite. For example,
the two reflections with indices (3,-2,4) and (-3,2,-4), 
respectively, are from the opposite faces of the same set 
of parallel planes and are a Friedel pair. 
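
The relationship between the two indices of a Friedel pair is simple enough to write as a one-line function. This Python sketch is only an illustration, and the function names are hypothetical.

```python
def friedel_mate(index):
    """Return the index of the reflection from the opposite face
    of the same set of parallel planes: every sign is reversed."""
    return tuple(-n for n in index)


def is_friedel_pair(index_1, index_2):
    """True when the two indices differ only by reversal of all signs."""
    return friedel_mate(index_1) == tuple(index_2)


if __name__ == "__main__":
    # The example given in the text.
    print(friedel_mate((3, -2, 4)))
    print(is_friedel_pair((3, -2, 4), (-3, 2, -4)))
```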

The reflections from the faces of the sets of reflect- 
ing planes are produced by the electrons in the crystal 
and they are emitted by diffraction. 

Electrons scatter X-radiation, and molecules are 
clouds of electrons confined within atomic and molecu- 
lar orbitals. The molecule or molecules of protein and the 
molecules of water distributed through any unit cell in a 
crystal are clouds of electrons, and they will scatter X- 
radiation. Electrons scatter X-radiation by being excited 
to vibrate by the oscillating electric field of the incident 
beam and then radiating X-radiation of the same wave- 
length in all directions. 

Cut a crystal of protein across its entire width with
any plane parallel to the set of planes of a given index
(Figure 4-5). Examine one of the two smooth, flat faces
produced by that random transection of the crystal. That
face contains a particular amount of electron density
from those atoms within each unit cell that were
transected by the plane. Each unit cell defined by the set of
planes parallel to the transection contributes exactly the
same amount of electron density to the face because it is
sliced at exactly the same angle and at exactly the same
level. All of the electron density in the entire face will
scatter X-radiation. The electron density in the face
scatters the X-radiation just as the silver on the smooth, flat
surface of a mirror scatters light, and the face is therefore
a mirror for X-radiation. All of the quanta of X-radiation
reflected by that mirror at a certain angle will be reflected
in phase as is the case with any planar mirror (Figure
4-7). As a result, the scattering elements can be anywhere
in the reflecting face and the regularly arrayed, repeating
pattern of electron density can be translated along axes
parallel to the plane, without affecting either the
amplitude or the phase of the reflection. It is this insensitivity
of reflected electromagnetic radiation to translation that
creates the requirement that a unit cell be only a
translational repeating unit. The amplitude of the reflection
produced by this mirror will be proportional to the
quantity of electron density it contains, which is equal to the
amount of electron density contributed by each unit cell
times the number of unit cells it transects.

Figure 4-5: Sets of unit cells created by sets of parallel
planes. Assume this to be a monoclinic lattice viewed
down the c axis and each line the intersection of a
perpendicular plane (h,k,0) with the page. Each new set of
parallel planes produces a new set of unit cells. The index
of each set is given at the top. Each type of unit cell cuts
the repeating object into different segments, but in each
unit cell in each set of parallel planes, the segments, if put
together, form one complete object.

Figure 4-6: Assignment of an index to a set of planes creating the
reflecting faces. The index h, k, or l relative to a given axis, a, b, or
c, respectively, is the number of segments into which the respective
axis is intersected by the set of planes over the length of the
fundamental unit cell. The index of this set of parallel planes is (4,2,3).

Consider the two planes parallel to the one just
described at a distance the width of one unit cell above it
and at a distance the width of one unit cell below it. Each
of these two planes creates a reflecting face with an
identical orientation and an identical repeating pattern
of electron density to the one just described. The three
reflecting faces considered so far, however, are
undistinguished members of a set of reflecting faces evenly
spaced throughout the entire crystal that each contain an
identical repeating pattern of electron density, that are
each the distance of a unit cell above and below their
neighbors, and that together include identical
transections through all of the unit cells of the same index in the
crystal. Each of the members in this set of faces will
produce a reflection. If the crystal is being rotated in a beam
of X-radiation, when the angle of the incident beam of
X-radiation assumes one of a set of particular values, θ_hkl,
with respect to the set of planes that produced this set of
reflecting faces, the reflections from all of the reflecting
faces in the set will add in phase to produce a burst or
flash of X-radiation by diffraction (Figure 4-1B).

Figure 4-7: Incident electromagnetic radiation at an angle φ to a
plane of reflection emerges from the reflection in phase regardless
of the locations of the points on the plane at which reflection
occurs.

The values of θ_hkl at which this diffracted reflection
occurs are defined by Bragg's law

\[ n\lambda = 2\, d_{hkl} \sin\theta_{hkl} \tag{4-1} \]

where n is any integer, λ is the wavelength of the incident
X-radiation, and d_hkl is the perpendicular distance, or
Bragg spacing, between the reflecting faces, which is the
width of the unit cells between the planes. Only
diffracted reflections are emitted by the crystal because
when the incident angle of the X-radiation on a set of
faces is not equal to one of these values θ_hkl, there are so
many reflections from that set of faces that are out of
phase with each other that all of them cancel completely.
From Equation 4-1 it follows that X-radiation is
diffracted by every set of reflecting faces for which the
spacing between the planes is larger than λ/2. The distance
λ/2 is the diffraction limit. Sets of faces with spacings less
than the diffraction limit do not diffract the X-radiation,
and therefore their reflections cannot be observed.
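
Bragg's law, nλ = 2d sin θ, and the diffraction limit that follows from it can be explored numerically. The Python sketch below is a minimal illustration under stated assumptions: the function name is hypothetical, and the wavelength of 0.154 nm is chosen only as a typical value below the 0.2 nm mentioned later in the text.

```python
import math


def bragg_angles(d_nm, wavelength_nm):
    """List every (n, theta in degrees) that satisfies n*lambda = 2*d*sin(theta).

    d_nm          -- Bragg spacing between the reflecting faces, in nm
    wavelength_nm -- wavelength of the incident X-radiation, in nm
    An empty list means the spacing is below the diffraction limit
    (lambda / 2), so the set of faces produces no observable reflection.
    """
    if d_nm <= 0 or wavelength_nm <= 0:
        raise ValueError("spacing and wavelength must be positive")
    angles = []
    n = 1
    while n * wavelength_nm <= 2.0 * d_nm:
        # min() guards against rounding slightly above 1 at the limit.
        sine = min(1.0, n * wavelength_nm / (2.0 * d_nm))
        angles.append((n, math.degrees(math.asin(sine))))
        n += 1
    return angles


if __name__ == "__main__":
    wavelength = 0.154  # nm; an assumed, illustrative value
    for spacing in (1.0, 0.4, 0.05):
        solutions = bragg_angles(spacing, wavelength)
        print(spacing, [(n, round(theta, 2)) for n, theta in solutions])
```

For the spacing of 0.05 nm the list is empty, because 0.05 nm is smaller than half of the assumed wavelength.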

Now consider a plane transecting the crystal
parallel to one of the reflecting faces just described, so that it
has the same index but at a distance ds above it (Figure
4-8). Consider the reflecting face created by this plane
that faces the same direction as the reflecting faces just
described. This new reflecting face is a member of a
second set of reflecting faces each a distance d_hkl apart
and each a distance ds above a member of the first set.
This second set of reflecting surfaces will diffract the
X-radiation at the same incident angle that the first set
did because its spacing and angular disposition are the
same. But the amplitude of the diffraction from the
second set will be different from the amplitude of the
diffraction from the first set because even though all of
the unit cells are also sliced by the second set, a
reflecting face in the second set transects a different region of
the unit cell and therefore contains a different amount of
electron density from each unit cell than did a reflecting
face in the first set. The phase of the diffraction from the
second set will also be different from the phase of the
diffraction from the first set of reflecting faces because the
second set is displaced a distance ds from the first.

This process of slicing the crystal with sets of 
reflecting faces each displaced from the set before it by a 
distance ds can be repeated until the entire unit cell, and 
hence the entire crystal, has been sliced (Figure 4-8). 
Each one of these different sets of reflecting faces will
diffract at the same angle, θ_hkl, because they all have the
spacing of the planes of the given index. The single
amplitude and single phase of the total diffracted
reflection produced by the complete set of all of these
reflecting faces will be the vector sum of the contributions,
each with its own amplitude and phase, of all of the
component sets. The diffracted reflection from the complete
set will be observed
at the angle θ_hkl to the incident beam of X-radiation, and
its amplitude and phase will necessarily contain infor- 
mation concerning the distribution of electron density 
within the crystal. Each complete set of faces of a given 
index passing through the lattice will produce its own dif- 
fracted reflection. Each of the spots on the film in Figure 
4-1 is the diffracted reflection from the complete set of 
reflecting faces of a particular index. 
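
The way in which the contributions from the successive sets of reflecting faces combine into one amplitude and one phase can be imitated with complex numbers. The Python sketch below is a simplified illustration, not the calculation used in practice: it assumes a one-dimensional stack of equally spaced slices through one unit cell, treats only the first-order reflection, and adopts one sign convention for the phase. The function name and the electron densities are hypothetical.

```python
import cmath
import math


def summed_reflection(densities):
    """Combine the contributions of equally spaced slices of one unit cell.

    densities[j] is the amount of electron density in the slice displaced
    by the fraction j / len(densities) of the Bragg spacing. Each slice
    contributes a vector whose length is its density and whose phase is
    its fractional displacement. Returns (amplitude, phase), with the
    phase expressed as a fraction of a wavelength between 0 and 1.
    """
    count = len(densities)
    total = sum(
        rho * cmath.exp(2j * math.pi * j / count)
        for j, rho in enumerate(densities)
    )
    amplitude = abs(total)
    phase = (cmath.phase(total) / (2.0 * math.pi)) % 1.0
    return amplitude, phase


if __name__ == "__main__":
    # All of the density in the slice one quarter of the way up the cell.
    print(summed_reflection([0, 0, 1, 0, 0, 0, 0, 0]))
    # Density spread evenly: the contributions cancel almost completely.
    print(summed_reflection([1] * 8))
    # An uneven, invented distribution gives an intermediate result.
    print(summed_reflection([5, 3, 1, 0, 0, 2, 4, 6]))
```

A uniform distribution produces an amplitude close to zero, which is one way of seeing that the intensity of a reflection carries information about how unevenly the electron density is distributed along the normal to that set of planes.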

The phase of the reflection from the set of faces hkl
is designated α_hkl. The phase of the reflection is the
distance between a crest of the emitted wave and a point of
reference common to all of the emitted X-radiation. The
phase is expressed in units of wavelength so that, were its
value 1, there would be an integral number of
wavelengths between the point of reference and the crest.
Because the wave is periodic, the phase is expressed as a
dimensionless fraction between 0 and 1. Because the
wavelength of the X-radiation used is usually less than
0.2 nm, the differences in phase among the different
reflections are less than 0.2 nm, and because coherent
sources are unavailable, they are immeasurable.

Figure 4-8: Reflecting faces within three consecutive unit cells of
height d each a distance ds apart from its neighbors. The first 11
reflecting faces in the unit cell in the middle are shown. The top
three faces of that same set of 11 in the bottom unit cell are also
shown as well as the bottom face of the same set in the top unit cell.
Each reflecting face extends over the whole crystal but only its
intersection with the respective unit cell is shown in the figure.
Each reflecting face in the set represented here is parallel to two of
the axes of the unit cell. Each reflecting face in the stack within each
unit cell has a different two-dimensional distribution of electron
density because each plane producing a reflecting face cuts a
different section through the unit cell.

Because the choice of the lattice used to define the 
fundamental unit cell was arbitrary, the choice of the 
boundaries of the fundamental unit cell is arbitrary. 
Because every plane passing through the unit cell paral- 
lel to its boundaries (Figure 4-8) is no better than any 
other, the plane chosen as the upper boundary of the 
fundamental unit cell can be anywhere so long as the 
plane below it chosen for the lower boundary is the one 
at the level where the pattern repeats. Crystals are 
seldom cooperative and the molecules packed within 
them almost never fit as intact entities into a neat box. 
But this is irrelevant because the solution to a crystallo- 
graphic calculation gives the distribution of electrons in 
the fundamental unit cell. The distribution of electrons 
in any number of adjacent fundamental unit cells can be 
constructed by simply stacking fundamental unit cells 
next to each other. If a large enough pile of fundamental 
unit cells is made, a complete molecule will be found 
somewhere in the pile. 

Each of the thousands of diffracted reflections 
emerging from the rotating crystal must be assigned an 
index. For example, each reflection in Figure 4-1B origi- 
nated from one of the complete sets of reflecting faces, 
and the index of that set must be assigned to that reflec- 
tion. The problem of indexing is a game of mirrors. As 
with all games, it is captivating and takes on a life of its 
own. The crystallographer plays the game in reciprocal 
space; and, as such, learning to live in reciprocal space is 
a rite of passage. But it is not necessary to live in recipro- 
cal space unless one is a crystallographer; it is enough to 
know that this can be done with certainty. The concept of 
reciprocal space simplifies this process of assignment. A 
familiarity with reciprocal space also permits an engi- 
neer to design an X-ray camera that can display the 
reflections on a sheet of photographic film in the order of 
their index number. A precession photograph (Figure 
4-9) is the product of such a camera. 

The intensities of all of the reflections the values of
h, k, and l of which fall within chosen limits are measured
and indexed. From the measured intensity of each reflec- 
tion, the amplitude of the structure factor of that reflec- 
tion can be calculated. The structure factor of a 
reflection is a vector. The phase of that vector is the 
phase of the reflection, and the square of the amplitude 
of that vector is a number directly proportional to the 
measured intensity of the reflection. The constant of this 
proportionality can be calculated from a consideration of 
the geometry and the dimensions of the instrument used 
to measure the reflection. The amplitude of the struc- 
ture factor is the quotient of the square root of the meas- 
ured intensity of the reflection and this constant of 
proportionality. In this way, the amplitude of the struc- 
ture factor is the amplitude of the reflection that has 
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Figure 4-9: Precession photograph of the diffraction of X-radia- 
tion by a crystal of egg-white lysozyme from Gallus gallus and its 
isomorphous replacement both in the same triclinic lattice.'” This 
is a section through the full three-dimensional pattern of reflec- 
tions, taken with a Buerger precession camera. Only reflections 
with an index h of 0 were arrayed by the camera in this particular 
section. Two photographs are superimposed slightly out of hori- 
zontal register to show changes in intensities produced by isomor- 
phous introduction of heavy atoms into the crystal. Left spot of 
each pair, native lysozyme; right spot, crystal after HgBr, has dif- 
fused in. This is a photograph of an array of the Okl set of reflections 
mechanically arranged by the camera with the / axis horizontal and 
k axis nearly vertical. The index of each reflection can be assigned 
by inspection from its position in the array. One can consider this 
photograph as an array of all of the reflections in an equatorial layer 
line from a rotation photograph about the a axis, such as the one in 
Figure 4-1B, laid out by the precession camera systematically upon 
the field. The photograph contains all reflections needed to com- 
pute a projection of the structure down the a axis to Bragg spacings 
of 0.4 nm. Reprinted with permission from ref 12. Copyright 1964 
Academic Press. 


been corrected for the geometric and instrumental 
details involved in the measurement. Consequently, it is 
a property only of the unit cell itself and not of the 
method by which the measurement was made. Together 
the amplitudes of the properly indexed structure factors 
produce a data set. A data set is a three-dimensional 
matrix centered on an origin (0,0,0) in which are entered
the amplitudes of all the structure factors calculated
from the intensities of the respective reflections that
have been measured from a given crystal. Each amplitude
is entered at the location in the matrix that has an
index identical to the index of its reflection. The amplitude
entered at position hkl in this matrix is designated
as $F_{hkl}$ and is a positive number.
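
The bookkeeping just described can be sketched in a few lines of Python. The sketch follows the statement above that the amplitude is the square root of the measured intensity divided by a constant of proportionality. It assumes, for simplicity, one constant (the hypothetical parameter `scale`) for every reflection, whereas in practice the geometric and instrumental corrections differ from reflection to reflection. All names and numerical values are illustrative.

```python
import math

def amplitudes_from_intensities(intensities, scale):
    """Convert measured intensities into structure-factor amplitudes.

    intensities: dict mapping an index (h, k, l) to a measured intensity.
    scale: hypothetical constant of proportionality standing in for the
           geometric and instrumental corrections (assumed identical
           for every reflection in this sketch).
    """
    data_set = {}
    for hkl, intensity in intensities.items():
        # Assumption: a measurement that came out negative after
        # background subtraction is treated as zero.
        intensity = max(intensity, 0.0)
        data_set[hkl] = math.sqrt(intensity) / scale
    return data_set

# Hypothetical measurements, each entered under the index of its reflection.
measured = {(0, 1, 2): 400.0, (1, 0, -3): 25.0, (2, 2, 0): 0.0}
data_set = amplitudes_from_intensities(measured, scale=2.0)
```

A dictionary keyed by (h, k, l) plays the role of the three-dimensional matrix centered on (0,0,0), because negative indices are as easily stored as positive ones.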

The rectangular sheet of photographic film that was
once used to record the intensities of the diffracted
reflections (Figure 4-1B) has been replaced by a charge-coupled
device, which is a rectangular plate that can
measure simultaneously the magnitude of the flux of
X-radiation through each pixel on its surface. If the 
density of pixels is constant, the larger the surface area of 
the detector, the greater the number of reflections that 
can be measured simultaneously, and the less will be the 
damage to the crystal caused by the X-radiation.* 

The greater the flux of X-radiation from the source, 
the shorter will be the interval needed to collect a com- 
plete data set. The sources of X-radiation with the great- 
est fluxes are the large synchrotrons located at national 
laboratories around the world.’ The additional advan- 
tage to X-radiation from synchrotrons is that these 
sources produce a broad spectrum of wavelengths so 
that any narrow range of wavelength within this spectrum
can be chosen for the experiment. If the crystal diffracts
effectively, it is now possible to collect hundreds of
thousands of unique reflections* to Bragg spacings of
less than 0.1 nm, which approaches the diffraction
limit of the available wavelengths.

The size and shape of the fundamental unit cell can 
be defined from the angles at which the reflections 
emerge from the crystal and hence the spacings of the 
reflections over the surface of the detector (Figure 4-1B). 
The size and shape of the fundamental unit cell and the 
indexed data set itself are the only directly measurable 
quantities available to the crystallographer, and they are 
ultimately the information used to calculate the distribu- 
tion of electrons in the crystal of a protein. The unob- 
servable phases, the values of which are inescapably 
required for the calculation, must be ascertained indi- 
rectly by comparing several data sets, each obtained 
from an altered form of the original crystal. 

At this point it is possible to explain the pattern of 
reflections in the oscillation photograph in Figure 4-1B.
The central layer line, which intersects at its midpoint the 
axis of the collimated beam of X-radiation in the camera, 
is referred to as the equator. Assume that the axis of the 
crystal chosen for rotation is the a axis and that the axis 
of the beam of X-radiation is perpendicular to the axis 
about which the crystal is rotated. Define the angle v, 
which is the angle between the axis of rotation and a diffracted
reflection (Figure 4-10). The complete sets of
reflecting faces with an index h of 0, referred to as the
(0,k,l) sets of faces, contain only faces parallel to the axis
of rotation. As the rotation of the crystal brings each set
of these (0,k,l) faces into the proper angle $\theta_{hkl}$ with
respect to the beam of X-radiation, diffracted reflection 
occurs. Because each set of these faces is parallel to the 
axis about which the crystal is rotated and because the 
angle of reflection equals the angle of incidence, each 
flash emerges at an angle v=90°. The reflections from 
the (0,k,l) sets of faces are the layer line of reflections on 
the equator. There is, however, no easily explained pat- 
tern in which the reflections with particular values for k 


* Unique reflections count only those not duplicated by the sym- 
metry of the crystal and count only one of the Friedel pairs. 
Figure 4-10: Definition of v, the angle between the axis of rotation
and the diffracted reflection. 


and l are distributed along the equator. Now consider all
the sets of faces with an index h of 1, referred to as the
(1,k,l) sets of faces. Each of the faces in these sets makes
an angle with the axis of rotation such that all values of v 
for these sets are the same and all the reflections lie on 
the first layer line. The same argument can be made for 
the other layer lines. Therefore, in a crystal rotated about 
the a axis, all of the reflections with the same first index 
lie on the same layer line, and each successive layer line 
out from the equator is of successively higher or succes- 
sively lower first index. In each layer line, however, the 
pattern in which the reflections with successive second 
or successive third indices occur is complex. Each of 
these reflections, however, can be assigned a full index 
unambiguously by the crystallographer. 
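
The statement that every reflection with the same first index emerges at the same angle v can be made numerical. The sketch below is not taken from the text. It assumes the standard Laue condition for a crystal rotated about its a axis with the beam perpendicular to that axis, a cos v = hλ, and it uses hypothetical values for the length a and the wavelength.

```python
import math

def layer_line_angle(h, a_nm, wavelength_nm):
    """Angle v (degrees) between the rotation axis a and every reflection
    whose first index is h, from the assumed relation a*cos(v) = h*wavelength."""
    cos_v = h * wavelength_nm / a_nm
    if abs(cos_v) > 1.0:
        raise ValueError("no layer line exists for this value of h")
    return math.degrees(math.acos(cos_v))

# Hypothetical cell edge of 7.74 nm and wavelength of 0.154 nm.
angles = {h: layer_line_angle(h, a_nm=7.74, wavelength_nm=0.154)
          for h in range(-2, 3)}
```

Neither k nor l enters the calculation, which is why each layer line collects all of the reflections that share a first index, and the equator (h = 0) lies at v = 90°.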

Any piece of matter, at a given instant LIT" s), has 
a particular distribution of molecules, and this distribution
of molecules causes the matter to have a distribution
of electron density, $\rho(x,y,z)$. If the matter is a gas or
a liquid, the rapid redistribution of the molecules causes
$\rho(x,y,z)$ to change over its full extent with time. The collection
and measurement of the intensities of the reflections
necessary to perform a determination of molecular
structure usually takes hours, and the distribution of 
electron density of a liquid or a gas averaged over the 
period of the measurement is absolutely uniform. The 
liquid regions of aqueous solvent within a crystal of pro- 
tein, which account for 40-75% of its volume,’ are, as a 
result, featureless. Any portion of the protein the position 
of which fluctuates over dimensions greater than those 
of an atom, for example a flexible segment of polypep- 
tide, is also featureless. To the extent that the matter in a 
crystal is a solid, its molecules remain fixed in space, and 
solids have well-defined distributions of electron den- 
sity. The fixed portions of the molecules of protein in the 
crystal remain in place, except for thermal vibrations,
and the experimentally measurable distribution of electron
density in the regions of the crystal that contain the
fixed portion of the protein is the average over these 
small vibrational displacements. Within these limits, it is 
featured. The distribution of featured electron density in 
a crystal is a periodic function, by definition, and it is this 
periodicity that leads to the reflections. 
It can be shown that for a crystal of protein

$$\rho(x,y,z) = \frac{1}{V}\sum_{h}\sum_{k}\sum_{l} F_{hkl}\,\exp[-2\pi i(hx + ky + lz - \alpha_{hkl})] \tag{4-2}$$

where V is the volume of the fundamental unit cell, $F_{hkl}$
are all of the amplitudes, properly indexed, and $\alpha_{hkl}$ are
all of the properly indexed phases of the structure factors
of the reflections. The coordinates x, y, and z in Equation
4-2 are referred to the major axes a, b, and c, respectively,
of the fundamental unit cell (Figure 4-11). This
usually produces a coordinate system that is not orthogonal.
The lengths a, b, and c of the fundamental unit cell
along the three major axes a, b, and c are expressed in
absolute length (nanometers). The three distances x, y,
and z that are the coordinates of a point within the fundamental
unit cell are measured along the three major
axes (Figure 4-11), but the units in which these three distances
are expressed in Equation 4-2 are relative distances
along these axes, where $x = Xa^{-1}$, $y = Yb^{-1}$, and
$z = Zc^{-1}$, and X, Y, and Z are the absolute distances
(nanometers) along each respective axis. The integers h,
k, and l; the coordinates of a point in the unit cell, x, y,
and z; and the phase of each structure factor, $\alpha_{hkl}$, are all
dimensionless numbers. The $2\pi$ by which they are multiplied
in Equation 4-2 converts these dimensionless
numbers to units of radians. This is the purpose that $2\pi$
serves in all of the remaining equations of this section.
Figure 4-11: Assignment of coordinates x, y, and z to a point
within a fundamental unit cell. The coordinate axes for the distances
x, y, and z are the crystallographic axes a, b, and c, respectively.
The distances in each direction are measured along these
axes regardless of the angles between them. The numbers in parentheses
are the values for x, y, and z, respectively, at the respective
corners of the unit cell.
The imaginary portion of Equation 4-2 is somewhat
disconcerting because the intention of the equation
is to calculate a real electron density. This conundrum is
solved by noting that

$$\exp(i\omega) = \cos\omega + i\sin\omega \tag{4-3}$$

and that, because a complete data set is a complete set of
Friedel pairs, all terms in $i\sin\omega$ cancel in pairs. As a
result

$$\rho(x,y,z) = \frac{1}{V}\sum_{h}\sum_{k}\sum_{l} F_{hkl}\cos[2\pi(hx + ky + lz - \alpha_{hkl})] \tag{4-4}$$


The value of examining Equation 4-2 is that it 
demonstrates that the electron density at any point in the 
fundamental unit cell can be calculated explicitly by 
inserting the amplitudes of the structure factors, prop- 
erly indexed; the coordinates of that point; and the 
respective phases. The way that the calculation of a map
of electron density is performed is to divide the fundamental
unit cell into a large number of points with coordinates
$x_p, y_q, z_r$. Insertion of all available values for $F_{hkl}$,
$\alpha_{hkl}$, h, k, l, and the coordinates of the point $x_p, y_q, z_r$ into
Equation 4-2 produces the value of $\rho(x,y,z)$ at the
point $x_p, y_q, z_r$. All points in three-dimensional space with
values of $\rho(x_p,y_q,z_r)$ within certain narrow ranges are connected
by contoured surfaces, or all points in sections
through the fundamental unit cell with values of
$\rho(x_p,y_q,z_r)$ within certain narrow ranges are connected by
contour lines, and this procedure produces a map of
electron density (Figure 4-12).
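
The calculation described in this paragraph can be sketched in Python with the cosine form of the summation (Equation 4-4). The data set used here is hypothetical: a single point-like atom at fractional coordinates (0.25, 0.5, 0.75), for which every amplitude is taken as 1 and every phase, in dimensionless cycles, as hx + ky + lz evaluated at the atom. The function name, the grid, and the limits on the indices are assumptions made only for illustration.

```python
import numpy as np

def electron_density(reflections, grid_points, volume):
    """Evaluate the cosine summation on a regular grid of fractional coordinates.

    reflections: iterable of (h, k, l, amplitude, phase), with the phase in
                 dimensionless cycles (it is multiplied by 2*pi below).
    grid_points: number of divisions of the unit cell along each axis.
    volume: volume V of the fundamental unit cell.
    """
    frac = np.arange(grid_points) / grid_points      # 0 <= coordinate < 1
    x, y, z = np.meshgrid(frac, frac, frac, indexing="ij")
    rho = np.zeros_like(x)
    for h, k, l, amplitude, phase in reflections:
        rho += amplitude * np.cos(2 * np.pi * (h * x + k * y + l * z - phase))
    return rho / volume

# Hypothetical data set for one point-like atom.
atom = (0.25, 0.5, 0.75)
limit = 3                                            # indices run from -3 to 3
reflections = [
    (h, k, l, 1.0, h * atom[0] + k * atom[1] + l * atom[2])
    for h in range(-limit, limit + 1)
    for k in range(-limit, limit + 1)
    for l in range(-limit, limit + 1)
]
rho = electron_density(reflections, grid_points=8, volume=1.0)
peak = np.unravel_index(np.argmax(rho), rho.shape)   # grid indices of the maximum
```

The maximum of the computed density falls on the grid point that coincides with the atom. Contouring the array `rho` at chosen levels would then give the map itself.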

The summations of Equation 4-2 or Equation 4-4 
are theoretically infinite, but any finite number of struc- 
ture factors will give an approximate solution. In any 
case, the data set can never contain reflections from 
beyond the diffraction limit. The effect of summing over 
only a finite number of structure factors in Equation 4-2 
is to blur $\rho(x,y,z)$. The fewer the structure factors used,
the more blurred $\rho(x,y,z)$ becomes. Usually
10,000-500,000 independent reflections are measured 
and indexed to calculate a map of electron density. The 
minimum Bragg spacing of this data set usually will be 
0.1-0.3 nm, and the individual values of h, k, and l will lie 
between about -40 and 40. 
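
The blurring produced by truncating the summation can be illustrated in one dimension. The sketch below builds the image of a hypothetical point-like atom from reflections with indices from -h_max to h_max, each with unit amplitude, and measures the width of the resulting peak at half of its height. The atom, its position, and the number of samples are assumptions of the example.

```python
import numpy as np

def peak_width(h_max, samples=2000):
    """Full width at half maximum (in fractional units) of the image of a
    point-like atom synthesized from reflections -h_max..h_max."""
    x = np.arange(samples) / samples
    x0 = 0.5                                   # hypothetical position of the atom
    rho = np.zeros(samples)
    for h in range(-h_max, h_max + 1):
        # Unit amplitude and phase h*x0 for every reflection.
        rho += np.cos(2 * np.pi * h * (x - x0))
    above_half = rho >= rho.max() / 2          # only the central peak qualifies
    return above_half.sum() / samples

widths = {h_max: peak_width(h_max) for h_max in (5, 10, 20, 40)}
```

Each doubling of the highest index included roughly halves the width of the peak, which is the sense in which fewer structure factors give a more blurred map.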

The decision about which structure factors to 
include in the data set is based on the ability of the 
instruments to measure reflections with large angles of 
$\theta_{hkl}$, the ability of the crystal to diffract from sets of faces
with small Bragg spacings, and other technical limitations.
Once the decision has been made, the data set is
collected so that it includes the amplitudes of the structure
factors from as many as possible of the reflections
arising from sets of faces the Bragg spacings of which are
greater than a minimum value. The universal choice of
“resolution” as the word used to report this minimum
magnitude of the Bragg spacings between the faces, the 
reflections from which have been included in the data 
set, is unfortunate. This choice suggests that the property 
being noted is the same property as the resolution 
defined in optics, which it is not. The minimum magni- 
tude of the Bragg spacings does place an upper limit on 
the quality of the final map of electron density, but it is 
not the only factor determining its quality. The difficulty 
is not collecting and indexing reflections, which ulti- 
mately determines the minimum Bragg spacing, but 
establishing accurate phases for each reflection.’ 

In practice, at a given minimum value for the Bragg 
spacing, it is the quality of the phases that defines the 
quality of the map of electron density.* It has been 
shown that if all of the correct amplitudes are used but
all of the phases are set at the same arbitrary value, the
map of electron density calculated is meaningless. If,
however, all of the correct phases are used and all of the
amplitudes are set at the same arbitrary value, a fairly
accurate map of electron density can be calculated.† It is
less misleading to refer to the data set as one “to Bragg 
spacings of” rather than as one “at a resolution of”. 
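
A deliberately simple one-dimensional toy shows why the phases matter more than the amplitudes. It is far simpler than a protein: a single hypothetical atom at x = 0.3, with amplitudes given an assumed Gaussian decrease with index. One map is synthesized from the correct amplitudes with every phase set to zero, and the other from the correct phases with every amplitude set to 1.

```python
import numpy as np

def synthesize(amplitudes, phases, samples=200):
    """One-dimensional cosine synthesis over h = 1, 2, ...; the constant
    h = 0 term is omitted because it does not move the peak."""
    x = np.arange(samples) / samples
    rho = np.zeros(samples)
    for h, (amplitude, phase) in enumerate(zip(amplitudes, phases), start=1):
        # The factor of 2 accounts for the Friedel mate with index -h.
        rho += 2 * amplitude * np.cos(2 * np.pi * (h * x - phase))
    return x, rho

x0 = 0.3                                          # hypothetical atom position
h_values = np.arange(1, 21)
true_amplitudes = np.exp(-0.01 * h_values ** 2)   # assumed decrease with index
true_phases = h_values * x0                       # phases in cycles

x, amplitudes_only = synthesize(true_amplitudes, np.zeros(h_values.size))
x, phases_only = synthesize(np.ones(h_values.size), true_phases)
peak_amplitudes_only = float(x[np.argmax(amplitudes_only)])
peak_phases_only = float(x[np.argmax(phases_only)])
```

With the correct phases the peak stays at the position of the atom even though the amplitudes are wrong. With the correct amplitudes and a common phase the peak is carried to the origin wherever the atom actually sits.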

The use of Equation 4-2 or 4-4 requires that the 
phase, $\alpha_{hkl}$, of each structure factor in a data set be estimated.
One way such an estimate can be accomplished
for crystals of protein is by multiple isomorphous 
replacement.” Suppose there are two crystals of protein, 
alike in almost every way and hence isomorphous. The 
only difference between them is that, at one or a few spe- 
cific locations on each of the molecules of protein in one 
of the crystals, an atom or several atoms that have a large 
number of electrons has been attached; the other crystal 
is of the unadorned protein. One of the requirements 
placed on the bound atoms is that they have high elec- 
tron density, in other words, a large number of electrons 
in a small volume. For this reason, a heavy atom such as 
xenon, iodine, mercury, gold, or uranium is usually 
chosen. Another requirement is that these heavy atoms 
occupy specific points in the fundamental unit cell. This 
requirement is fulfilled if they are bound to particular 
amino acids or clusters of amino acids on the surface of 
each molecule of protein. For example, the PtCl,” used 
to phase the structure factors from a crystal of phospho- 
carrier protein UU: happened to be chelated by two his- 
tidines that by chance were adjacent to each other on the 
surface of the protein.'' The reflections from the two 
crystals, the one containing the fixed heavy atoms and 
the one without them, will have the same values for all 
$\theta_{hkl}$ and hence the same geometric display of reflections,
*The quality of the phases is quantified by the figure of merit,
which ranges from zero (completely unreliable) to 1.0 (perfect).

† Even if, however, the phases were estimated perfectly but only
amplitudes and phases for reflections to Bragg spacings of 0.5 nm 
were used in the calculation, the resulting map of electron density 
would not contain sufficient information to reveal the structure of 
a protein. A minimum Bragg spacing of 0.3-0.35 nm is required. 


but the amplitudes of the reflections will differ (Figure 
4-9)¹² as well as the phases.

The structure factor of a reflection from a set of
faces with the index hkl can be represented as a
vector $\mathbf{F}_{hkl}$. The length of the vector is the amplitude of
the structure factor, $F_{hkl}$, and its direction is defined by
the phase of the reflection, $2\pi\alpha_{hkl}$ radians. Because the
computations are performed in complex space
(Equation 4-2), complex coordinates are chosen to represent
this vector:

$$\mathbf{F}_{hkl} = F_{hkl}(\cos 2\pi\alpha_{hkl} + i\sin 2\pi\alpha_{hkl}) = F_{hkl}\exp(2\pi i\alpha_{hkl}) \tag{4-5}$$

The real component of the vector $\mathbf{F}_{hkl}$ is $F_{hkl}\cos 2\pi\alpha_{hkl}$,
and the imaginary component is $F_{hkl}\sin 2\pi\alpha_{hkl}$. The
amplitude of the vector is

$$F_{hkl} = F_{hkl}(\cos^2 2\pi\alpha_{hkl} + \sin^2 2\pi\alpha_{hkl})^{1/2} \tag{4-6}$$
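
In a computation, the vector of Equation 4-5 is simply a complex number. The short sketch below uses a hypothetical amplitude of 22.2 and a hypothetical phase of one-quarter of a cycle, builds the complex number, and then recovers the amplitude and the phase from it.

```python
import cmath
import math

def structure_factor_vector(amplitude, phase_cycles):
    """Complex representation of a structure factor: amplitude * exp(2*pi*i*phase)."""
    return amplitude * cmath.exp(2j * math.pi * phase_cycles)

F = structure_factor_vector(22.2, 0.25)            # hypothetical values
amplitude = abs(F)                                  # length of the vector
phase_cycles = (cmath.phase(F) / (2 * math.pi)) % 1.0
```

Here the real component is essentially zero and the imaginary component equals the amplitude, as expected for a phase of 90°.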


Equation 4-2 states that the electron density is the 
Fourier transform of the structure factors. It follows that 
the structure factors must be the Fourier transforms of 
the electron density. As a result, the amplitude and phase 
of a given structure factor from a crystal can be calcu- 
lated if the distribution of atoms in a fundamental unit 
cell of that crystal is known 


$$\mathbf{F}_{hkl} = \sum_{j} f_j \exp[2\pi i(hx_j + ky_j + lz_j)] \tag{4-7}$$

where $f_j$ is the scattering factor for atom j and $(x_j, y_j, z_j)$ is
its position in the unit cell. The scattering factor is determined
by the number of electrons in atom j and their distributions
over their respective orbitals. Because the
sizes of the orbitals are of the order of the wavelength of
the X-radiation, the numerical value of the scattering
factor for a given atom is a function of the angle $\theta_{hkl}$ of the
reflection. As $\theta_{hkl}$ increases, the scattering produced by
the electrons around an atom decreases as a result of
interference. Values for scattering factors have been tabulated
for all atoms and systematic values of $\theta$.
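
Equation 4-7 translates directly into a few lines of Python. The sketch below makes one simplifying assumption that the text warns against for real work: each scattering factor is treated as a constant equal to the number of electrons in the atom, rather than as a function of the angle of the reflection. The three atoms and their positions are hypothetical.

```python
import cmath
import math

def structure_factor(hkl, atoms):
    """Sum Equation 4-7 over a list of atoms.

    atoms: list of (f, x, y, z), where f is the scattering factor (held
           constant here, an assumption) and x, y, z are fractional
           coordinates in the unit cell.
    """
    h, k, l = hkl
    return sum(
        f * cmath.exp(2j * math.pi * (h * x + k * y + l * z))
        for f, x, y, z in atoms
    )

# Hypothetical carbon, oxygen, and mercury atoms (6, 8, and 80 electrons).
atoms = [(6.0, 0.10, 0.20, 0.30), (8.0, 0.40, 0.25, 0.70), (80.0, 0.05, 0.50, 0.15)]
F_plus = structure_factor((1, 2, 3), atoms)
F_minus = structure_factor((-1, -2, -3), atoms)     # the Friedel mate
F_000 = structure_factor((0, 0, 0), atoms)
```

The two members of the Friedel pair come out as complex conjugates of each other, with equal amplitudes and opposite phases, which is the property that causes the imaginary terms to cancel in the summation for the electron density.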

It can be seen that, since Equation 4-7 is a summation

$$\mathbf{F}_{hklHP} = \mathbf{F}_{hklH} + \mathbf{F}_{hklP} \tag{4-8}$$

where $\mathbf{F}_{hklHP}$ is the structure factor from the crystal containing
the heavy atom, $\mathbf{F}_{hklP}$ is the same structure factor
from the unadorned crystal, and $\mathbf{F}_{hklH}$ would be the
structure factor from a crystal in which only the heavy
atoms were present at the same locations they occupy in
the existing isomorph. This summation can be presented
geometrically (Figure 4-13A). If one knows where the
heavy atoms are located in the fundamental unit cell, the
vectors $\mathbf{F}_{hklH}$, both amplitudes and phases, can be calculated
with Equation 4-7. Discovering the locations of the
heavy atoms in a given isomorph is an art, the description
of which is dramatic but not germane to this discussion.
Their locations are eventually determined, and
these locations are used to calculate each of the values of
$\mathbf{F}_{hklH}$.

Unless the heavy atom chosen displays strong
anomalous dispersion, at least two isomorphous crystals,
each substituted with a heavy atom in a different
way, are required for a unique determination of the
phases. The data that are available are $F_{hklP}$, $F_{hklHP,g}$,
and $\mathbf{F}_{hklH,g}$, where the index g refers to each of the several
isomorphous replacements from the crystals of which
reflections have been measured. From Equations 4-5
and 4-8, these data provide a set of simultaneous vector
equations equal in number to the number of isomorphous
replacements for each structure factor. In theory,
any two of these vector equations can be solved for the
phase, $2\pi\alpha_{hkl}$, of structure factor $\mathbf{F}_{hklP}$; in practice, as
many as are available are used.

There is a geometric solution to this set of simultaneous
vector equations. The amplitude of the vector $\mathbf{F}_{hklP}$
is known from the data set, but not its phase, $2\pi\alpha_{hkl}$.
Therefore, what is known about $\mathbf{F}_{hklP}$ defines a circle of
radius $F_{hklP}$ with its center at point P (Figure 4-13B). Both
the amplitude and the phase of a given $\mathbf{F}_{hklH}$ are known,
and this vector can be placed so that its head is at point P.
Its tail defines the position, point D, of the tail of
vector $\mathbf{F}_{hklHP}$ from the isomorphous derivative in the
vector sum (Figure 4-13A). The phase of vector $\mathbf{F}_{hklHP}$ is
unknown but its amplitude, which is known, defines a
second circle with its center at point D. Because the vector
sum must balance (Equation 4-8), the two points where
the two circles intersect must represent two possibilities
for the one actual vector sum. In theory, the correct one
of the two possibilities can be determined by going
through the same steps with the data from a second isomorphous
replacement because the phase of $\mathbf{F}_{hklP}$ must
be the same in both, and only one of the two possibilities
for $\mathbf{F}_{hklP}$, namely, the one defined by the actual vector
sum, should be the same in both.
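
The intersection of the two circles can also be found algebraically. Applying the law of cosines to the vector sum gives the cosine of the angle between the parent vector and the heavy-atom vector, and the two signs of that angle are the two points of intersection. The sketch below is one possible implementation, not the procedure of any particular program. The numerical values are hypothetical and error-free, so the two derivatives agree exactly on one candidate, and nonzero amplitudes are assumed throughout.

```python
import cmath
import math

TWO_PI = 2 * math.pi

def candidate_phases(F_P, F_HP, F_H):
    """Two candidate phases (in cycles) for the parent structure factor.

    F_P, F_HP: measured amplitudes of the parent and of the derivative.
    F_H: complex heavy-atom structure factor (amplitude and phase known).
    """
    # From |F_P exp(i*phi) + F_H|^2 = F_HP^2 and the law of cosines.
    cos_delta = (F_HP ** 2 - F_P ** 2 - abs(F_H) ** 2) / (2 * F_P * abs(F_H))
    cos_delta = max(-1.0, min(1.0, cos_delta))   # guard against experimental error
    delta = math.acos(cos_delta)
    phi_H = cmath.phase(F_H)
    return [((phi_H + sign * delta) / TWO_PI) % 1.0 for sign in (1, -1)]

def circular_difference(a, b):
    """Smallest separation between two phases expressed in cycles."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)

def resolve_phase(F_P, derivatives):
    """Choose the pair of candidates on which two derivatives agree best
    and return their circular mean. derivatives: two (F_HP, F_H) pairs."""
    first = candidate_phases(F_P, *derivatives[0])
    second = candidate_phases(F_P, *derivatives[1])
    a, b = min(((p, q) for p in first for q in second),
               key=lambda pair: circular_difference(*pair))
    mean = cmath.exp(1j * TWO_PI * a) + cmath.exp(1j * TWO_PI * b)
    return (cmath.phase(mean) / TWO_PI) % 1.0

# Hypothetical parent with amplitude 20 and phase 0.30 cycles.
true_F_P = 20.0 * cmath.exp(1j * TWO_PI * 0.30)
F_H1 = 5.0 * cmath.exp(1j * TWO_PI * 0.10)
F_H2 = 8.0 * cmath.exp(1j * TWO_PI * 0.45)
derivatives = [(abs(true_F_P + F_H1), F_H1), (abs(true_F_P + F_H2), F_H2)]
estimated_phase = resolve_phase(20.0, derivatives)
```

In this example the first derivative allows phases of 0.30 and 0.90 cycles and the second allows 0.30 and 0.60 cycles, so only 0.30 is common to both. With real measurements the agreement is never exact, which is why a statistical treatment of the points of intersection is needed.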

A particularly gratifying example of this way of 
choosing the correct point defining the head of 
vector $\mathbf{F}_{hklP}$ was the definition of the phase of structure
factor Fo,» for a crystal of hemoglobin by use of six dif- 
ferent isomorphous replacements (Figure 4-13C). All 
seven circles intersect at approximately the same point 
and define the phase ag |... This is, of course, the best 
example from the thousands of structure factors for 
hemoglobin; and, in practice, the circles almost never 
intersect in the same spot or even near the same spot. 
The phase of each $\mathbf{F}_{hkl}$ must be estimated by taking a
statistical average of all of the points of intersection.’ The 
uncertainty in this average value for each phase is then 
used to weight the contribution of the respective struc- 
ture factor to the summations of Equation 4-2 or 4-4. 
Figure 4-13: Assignment of phases by isomorphous replacement. (A) The
vector equation that must define the actual relationship between the
three actual vectors $\mathbf{F}_P$, $\mathbf{F}_H$, and $\mathbf{F}_{HP}$. (B) The amplitudes $F_P$ (parent) and
Fa (derivative) define two circles.’ The centers of these two circles, P and 
D, must be at the head and tail, respectively, of vector Fy. The two points 
of intersection (o and œ) are possible locations for the head of vector Fp 
in the actual vector equation (panel A). Reprinted with permission from 
ref 7. Copyright 1977 Academic Press. (C) Seven circles, the one defined by 
Fp and the six defined by Fy HP), from six isomorphous derivatives, for the 
structure factor Fo,» from a crystal of equine hemoglobin.” The origins 
of each of the six circles for the isomorphous replacements are displaced 
from the origin of the circle for the native protein by the respective 
vector Fy, calculated from the particular distribution of heavy metals in 
the fundamental unit cell. Three of those vectors are labeled Fy,, Fy,, and 
Fp, respectively. Reprinted with permission from ref 13. Copyright 1961
Royal Society.


Some examples of isomorphous replacements that 
have been used in crystallographic investigations should 
make these considerations less abstract. Each isomor- 
phous replacement is a different crystal, usually obtained 
by soaking a crystal of the unmodified protein in a solu- 
tion of a compound containing the heavy atom. These 
compounds can be simple ions, such as Sm**, WO7, 
Pt(CN)?', Au(CN)3, Hg”, or Pt(NH;)3*. Such ions are 
chelated at certain specific locations in the unit cells by 
functional groups on the surface of the protein in the 
crystal. Xenon gas at high pressure also produces iso- 
morphous replacements “ by associating with hydropho- 
bic pockets within the protein. Some organomercuric 
compounds, such as ethyl mercurithiosalicylate or 
diphenylmercury, are bound at specific locations on the 
protein while other organomercuric compounds, such as 
ethylmercury chloride, mersalyl, o-mercuriphenol, or 
p-mercuribenzoate, react covalently with the thiols of cysteines
on the protein. As many as three mercuric ions,
Hg²⁺, can be noncovalently associated with the lone pairs
of electrons of a cystine. More complicated organomercury
compounds such as 5-mercurideoxyuridine
monophosphate, 3-acetoxymercuri-4-aminobenzenesulfonamide,
and ethylmercuriphosphate have been
designed as analogues of ligands specific to the protein.
An oligonucleotide containing 5-iodouracil can be used 
to attach an iodine atom to a protein that normally binds 
the unsubstituted oligonucleotide.” 

At least two and perhaps as many as six (Figure 
4-13C) or seven isomorphous replacements are made. 
The isomorphous replacements used to obtain the 
phases for maltose binding protein were made by soak- 
ing crystals in K;PtCl, Pb(NO3)2, sodium mersalyl, 
dysprosium iodide, and glucosyl(α1,4)-6-iodo-6-deoxyglucose.
The last compound is a specific ligand
for the binding protein. In the case of trimethylamine- 
N-oxide reductase (cytochrome c), six separate isomor- 
phous replacements were prepared with TagBrı4, 
(NH,)>OsCl,, (NH,)IrCleg, K,PtCle, sodium ethylmercu- 
rithiosalicylate, and sodium ` bis(N-methylhydan- 
toinato)gold, respectively.” In the case of 
chloramphenicol O-acetyltransferase, Sm(NO,);, 
KAu(CN),, K;PtCl,, p-mercuribenzoate, and p-iodochlo- 
ramphenicol, the last of which is a good substrate for the 


enzyme, provided useful isomorphous replacements.” 
In the case of apoferritin, however, only two isomor- 
phous replacements, made with p-mercuribenzoate and 
K3UO2F5, were used to determine the phases to Bragg
spacings of 0.28 nm.” 

Once the positions of two or more separate sets of 
heavy metal atoms are known within the fundamental 
unit cell, the reagents can be used in pairs to generate 
additional unique isomorphous replacements. The 
advantage is that because the positions in each of the 
original isomorphous replacements are already avail- 
able, the positions in the combined isomorphous 
replacement can be readily established. Isomorphous 
replacements were made from crystals of alcohol dehydrogenase
with K2Pt(CN)4 and KAu(CN)2, and the positions
of the platinum and gold, respectively, in the
resulting fundamental unit cells were determined. In 
combination, these two anions produced a third isomor- 
phous replacement.”° From crystals of deoxyribonucle- 
ase I, it was possible to make three isomorphous 
replacements, one each with TbCl;,, K,PtCl, and 
Pb(NO;),, which could then be used in the three possible 
combinations to generate three additional, unique iso- 
morphous replacements.® 

Today, however, phases are usually estimated by 
taking advantage of the anomalous dispersion of the 
heavy atoms in only one isomorphous derivative.” 
The real and imaginary components of the scattering factors
$f_j$ (Equation 4-7) for atoms such as copper, selenium,
holmium, terbium, tantalum, uranium,
platinum, and bromine change with the wavelength
of the X-radiation in the vicinity of their respective 
absorption edges. The changes are dramatic enough that 
if data sets are gathered at three or four different wave- 
lengths properly chosen with respect to the absorption 
edge of the heavy atom, those data sets can be equiva- 
lent, in terms of the differences produced in the intensi- 
ties of the reflections, to sets of reflections measured 
from three or four isomorphous replacements. The 
advantage of this procedure is that the same crystal con- 
taining the heavy atoms is used for all of the measure- 
ments, so that the errors associated with combining data 
from different crystals are avoided. 

The appropriate heavy atoms are usually incorpo- 
rated into the crystal by soaking. In the case of basic blue 
copper protein, however, only the copper ion already 
within the native protein was used as the heavy atom, 
and data sets gathered at four different wavelengths were 
sufficient to establish experimental phases to Bragg 
spacings of 0.25 nm with no isomorphous replacement 
at all.” It is also possible to take advantage of the anom- 
alous dispersion of one isomorphous derivative in com- 
bination with the normal diffraction from several others 
to establish experimental phases.” A common way of 
introducing a heavy atom susceptible to anomalous dis- 
persion” is to express the protein to be crystallized in a 
bacterium auxotrophic for methionine growing on a 
medium containing selenomethionine rather than
methionine. The selenium atoms end up at each position 
in the sequence of the protein normally occupied by 
methionine and are positioned at precise locations in the 
unit cell by the tertiary structure of the protein. 

There is an additional component of a crystal of 
protein that is formally equivalent to a set of heavy atoms 
in an isomorphous derivative. This component is the fea- 
tureless aqueous solvent that surrounds the protein. The 
fact that it should be featureless allows it to be used, 
much as an electron-rich atom is used, to improve the 
phases by solvent flattening.” A map of electron density 
is prepared with the available estimates of the phase for 
each structure factor gathered from isomorphous 
replacement and anomalous dispersion. If the map is 
clear enough that the boundary between protein and sol- 
vent can be defined (Figure 4-12), all of the region of the 
fundamental unit cell occupied by solvent is forced to 
have the same uniform electron density even though in 
this original map it was not uniform in density (notice 
the noise in the regions of the map occupied by solvent 
in Figure 4-12). From this geometric solid of uniform 
electron density, a set of structure factors equivalent to 
an additional $\mathbf{F}_{hklH}$ for Equation 4-8 could be calculated
with Equation 4-7 and used as an additional constraint 
on the phases (Figure 4-13), but solvent flattening is 
more successful if used iteratively. 

The updated map of electron density with the 
vaguely defined features of the protein and the solvent 
that has been purposely flattened is used in its entirety to 
calculate a set of phases. These calculated phases are 
used in combination with the available estimates of the 
phase from isomorphous replacement to arrive at a set of 
improved phases. These improved phases and the 
observed amplitudes are used to calculate a new map of 
electron density. The regions of solvent in the new map 
are defined, the electron density in these regions is again 
forced to be uniform, and the process is repeated. As the 
iterations progress, the solvent in each new map 
becomes flatter and the protein more detailed. In 
theory” and in practice,” the method can provide ade- 
quate phases in the absence of measurements of anom- 
alous dispersion when only one isomorphous heavy 
atom derivative is available. Usually, however, solvent 
flattening is used to improve the phases that have been 
gathered with multiple isomorphous replacements or by 
anomalous dispersion. 
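
The iterative cycle can be sketched in one dimension with the fast Fourier transform in NumPy. The sketch simplifies the procedure described above in one important respect: in each cycle the phases calculated from the flattened map simply replace the previous phases, rather than being combined with the experimental estimates. The "crystal" of four Gaussian peaks, the boundary of the solvent, and the size of the initial errors in the phases are all hypothetical.

```python
import numpy as np

def flatten_solvent(density, solvent_mask):
    """Force the region assigned to solvent to its own mean density."""
    flattened = density.copy()
    flattened[solvent_mask] = density[solvent_mask].mean()
    return flattened

def solvent_roughness(density, solvent_mask):
    """Root-sum-square departure of the solvent region from uniformity."""
    region = density[solvent_mask]
    return float(np.sqrt(np.sum((region - region.mean()) ** 2)))

def iterate_solvent_flattening(amplitudes, phases, solvent_mask, cycles=20):
    """amplitudes and phases (radians) are stored in the layout of numpy.fft.rfft."""
    n = solvent_mask.size
    history = []
    for _ in range(cycles):
        # Map from the observed amplitudes and the current phases.
        density = np.fft.irfft(amplitudes * np.exp(1j * phases), n=n)
        history.append(solvent_roughness(density, solvent_mask))
        flattened = flatten_solvent(density, solvent_mask)
        # Keep the observed amplitudes; take new phases from the flattened map.
        phases = np.angle(np.fft.rfft(flattened))
    return phases, history

# Hypothetical one-dimensional unit cell: protein in the middle, solvent outside.
n = 128
x = np.arange(n)
true_density = np.zeros(n)
for center, height in ((20, 3.0), (28, 5.0), (37, 4.0), (45, 2.0)):
    true_density += height * np.exp(-0.5 * ((x - center) / 1.5) ** 2)
solvent_mask = (x < 10) | (x > 55)

true_F = np.fft.rfft(true_density)
rng = np.random.default_rng(0)
phase_error = rng.normal(scale=0.6, size=true_F.size)
phase_error[0] = phase_error[-1] = 0.0     # these two coefficients must stay real
start_phases = np.angle(true_F) + phase_error
final_phases, history = iterate_solvent_flattening(
    np.abs(true_F), start_phases, solvent_mask)
```

The list `history` records how far the solvent departs from uniformity at the start of each cycle, and in this scheme that measure cannot increase from one cycle to the next. Whether the phases themselves approach the correct values depends on the quality of the starting phases and on how much of the cell is solvent, and this toy does not settle that question.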

Because the quality of the final map of electron 
density (Figure 4-12) depends so heavily on the quality of 
the phases, the uninvolved observer can evaluate the 
results only if she is informed. It is important to learn 
how many isomorphous replacements were made, how 
many wavelengths were used for anomalous dispersion, 
and which data sets were used to calculate the phases. It 
is also essential to see at least a portion of the calculated, 
unrefined map of electron density (Figure 4-12) to get a 
feeling for its quality.“ It must be emphasized that, 
unless a map of electron density is already available for a
closely related protein, the calculation of the initial map 
of electron density from the phases derived from iso- 
morphous replacement is an unavoidable step in crystal- 
lography, and the quality of this map can affect 
significantly the remainder of the process. The work 
involved in obtaining this initial map is extensive, and 
many crystallographic experiments are designed specifi- 
cally to avoid this work. 

When the map of electron density within the fun- 
damental unit cell or from several neighboring funda- 
mental unit cells is examined, the electron density that 
corresponds to the intact molecule of protein can be dis- 
cerned. Since a large fraction of the crystal is liquid water, 
which is featureless, the protein, which is fixed and 
highly featured, stands out (Figure 4-12). The compact 
globule of electron density eventually assigned to an 
individual molecule of protein usually has an overall size 
and shape consistent with its amino acid sequence, its 
frictional coefficient, and other molecular parameters. 
Within this globular solid, features can be seen in rela- 
tively sharp detail, but only seldom at atomic resolution. 


Suggested Reading 


Stout, G.H., & Jensen, L.H. (1989) X-ray Structure Determination, A 
Practical Approach, 2nd ed., Wiley, New York. 


Problem 4-1: Below there is a generic unit cell. Make 
several xerographic copies of it. Draw a right-handed set 
of axes labeled a, b, and c next to each of your copies of 
the unit cell. Draw a diagram of the (4,2,3) set of reflect- 
ing planes passing through the first unit cell as in Figure 
4-6. Draw a diagram of the (4,-2,3) set of reflecting 
planes passing through the second unit cell. Draw a dia- 
gram of the (4,2,-3) set of reflecting planes passing 
through the third unit cell. Label each of your diagrams 
with the index number. 


Problem 4-2: The amplitude of a particular structure 
factor from a crystal of protein, F_P, is 22.2. The amplitude 
of the structure factor with the same index from a crystal 
of the first isomorphous replacement, F_PH1, is 24.2. The 
structure factor with the same index calculated from the 
established positions of the heavy metal ions in the unit 
cell of the first isomorphous replacement has an amplitude 
F_H1 of 5.4 and a phase of 110°. The amplitude of the 
structure factor with the same index from a crystal of the 
second isomorphous replacement, F_PH2, is 21.0. The structure 
factor with the same index calculated from the 
established positions of the heavy metal ions in the unit 
cell of the second isomorphous replacement has an 
amplitude F_H2 of 8.9 and a phase of 65°. Estimate graphically 
the phase for the structure factor of this index from 
the crystal of protein alone. 
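
The graphical construction requested in Problem 4-2 can be checked numerically. The following is only a sketch, and it assumes that the structure factor of each derivative is the vector sum of the structure factor of the protein and that of the heavy metal ions, so that the law of cosines gives two candidate phases for the protein from each derivative. The function names and the choice to average the two closest candidates are illustrative assumptions, not a procedure taken from the text.

```python
import math

def phase_candidates(f_p, f_ph, f_h, alpha_h_deg):
    """Two protein phases (degrees) permitted by one isomorphous derivative.

    Assumes F_PH = F_P + F_H as vectors, so that
    |F_PH|^2 = |F_P|^2 + |F_H|^2 + 2 |F_P| |F_H| cos(alpha_P - alpha_H).
    """
    cos_delta = (f_ph ** 2 - f_p ** 2 - f_h ** 2) / (2.0 * f_p * f_h)
    # Clamp so that slightly inconsistent amplitudes do not raise an error.
    cos_delta = max(-1.0, min(1.0, cos_delta))
    delta = math.degrees(math.acos(cos_delta))
    return ((alpha_h_deg + delta) % 360.0, (alpha_h_deg - delta) % 360.0)

def angular_gap(a_deg, b_deg):
    """Smallest separation between two angles, in degrees."""
    gap = abs(a_deg - b_deg) % 360.0
    return min(gap, 360.0 - gap)

def estimate_phase(f_p, derivatives):
    """Pick the pair of candidates (one per derivative) that agree best."""
    first, second = [phase_candidates(f_p, *d) for d in derivatives]
    pairs = [(a, b) for a in first for b in second]
    a, b = min(pairs, key=lambda pair: angular_gap(*pair))
    # Circular mean of the two agreeing candidates.
    s = math.sin(math.radians(a)) + math.sin(math.radians(b))
    c = math.cos(math.radians(a)) + math.cos(math.radians(b))
    return math.degrees(math.atan2(s, c)) % 360.0, (a, b)

# Each derivative: (derivative amplitude, heavy-atom amplitude, heavy-atom phase)
derivatives = [(24.2, 5.4, 110.0), (21.0, 8.9, 65.0)]
estimate, agreeing_pair = estimate_phase(22.2, derivatives)
print("agreeing candidates (degrees):", [round(x, 1) for x in agreeing_pair])
print("estimated protein phase (degrees):", round(estimate, 1))
```

The two candidates that nearly coincide correspond to the point at which the circles of the graphical construction intersect most closely; the gap that remains between them gives a feeling for the uncertainty of the estimate.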


Problem 4-3: Pig heart citrate (si) synthase crystallizes 
from solution at pH 7.4. The crystals are tetragonal. 
The dimensions of the fundamental unit cell are 
a = b = 7.74 nm and c = 19.64 nm. A crystal was submitted 
to diffraction with X-radiation generated from 
a rotating anode of copper. The Kα emission of the 
copper (λ = 0.154 nm) was selected for the source of 
the X-radiation. On graph paper draw a view of the 
fundamental unit cell looking down the c axis with the set 
of (2,-4,0) faces intersecting it. At what angle θ to the 
incident beam of X-radiation will the reflection from 
the (2,-4,0) set of faces emerge from the crystal? 
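
The arithmetic behind Problem 4-3 can be sketched as follows. The sketch assumes the standard expression for the spacing of planes in a tetragonal lattice, 1/d² = (h² + k²)/a² + l²/c², and Bragg's law, nλ = 2d sin θ; the function names are illustrative. Note that θ is the angle between the beam and the reflecting planes, so the reflection deviates from the direction of the incident beam by 2θ.

```python
import math

def tetragonal_d_spacing(h, k, l, a, c):
    """Spacing (same units as a and c) of the (h,k,l) planes when a = b."""
    inv_d_squared = (h * h + k * k) / (a * a) + (l * l) / (c * c)
    return 1.0 / math.sqrt(inv_d_squared)

def bragg_angle_deg(d, wavelength, order=1):
    """Angle theta (degrees) satisfying order * wavelength = 2 d sin(theta)."""
    return math.degrees(math.asin(order * wavelength / (2.0 * d)))

# Values from the problem, all in nanometers.
d = tetragonal_d_spacing(2, -4, 0, a=7.74, c=19.64)
theta = bragg_angle_deg(d, wavelength=0.154)

print("d spacing (nm):", round(d, 3))
print("theta (degrees):", round(theta, 2))
print("deviation from incident beam, 2 theta (degrees):", round(2 * theta, 2))
```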


The Molecular Model 


An irregular tube of electron density can be observed to 
meander through and account for the globule of featured 
electron density assigned to the intact molecule of pro- 
tein in the map of electron density. Sections of one such 
continuous tube can be seen embedded in the flat slice 
of electron density presented in Figure 4-12. This tube is 
the polypeptide of the protein (2-8) that has folded to 
assume the native structure of the molecule. It is into this 
tube that a molecular model of the known covalent struc- 
ture of the polypeptide must be fit. 

Once the covalent sequences of the polypeptides, 
the covalent sequences and points of attachment of any 
covalently bound oligosaccharides, and the identity and 
points of attachment of any other posttranslational mod- 
ifications have been established and even before a map 
of electron density is available, it is possible to construct 
a molecular model of the fully modified and glycosylated 
polypeptide known to constitute a molecule of protein. 
Such a model would incorporate bond lengths and bond 
angles that have been measured with high precision 
during crystallographic studies of small molecules. These 
small molecules used as standards are molecules the 
covalent structures of which are identical to segments of 
polypeptide, the side chains of the amino acids, seg- 
ments of oligosaccharide, or the monosaccharides in the 
oligosaccharide. As with any molecular model of such a 
size and complexity, the one of a polypeptide would be a 
flexible, protean object that assumes a new shape each 
time rotation around one of its acyclic single bonds 
occurs. 

It is this long, flexible model that must be fit, amino 
acid by amino acid, into the map of electron density. 
Until recently, the process of fitting the model into the 
map was always performed visually by the crystallographer. 
It is now possible, however, for a computer to 
fit the model into the map automatically. Nevertheless, 
the success of this automated process for a particular 
map of electron density still must be carefully evaluated 
by the crystallographer, and the fit must be altered 
accordingly by manual adjustments. To determine 
whether or not the molecular model has been correctly 
fit into the map of electron density, there are no auto- 
mated rules that are as reliable as the judgment and 
accumulated knowledge of the crystallographer. If care- 
ful human evaluation of each fit is not performed rou- 
tinely, there is a risk that the frequency at which incorrect 
crystallographic molecular models are published will 
increase as more and more crystallographic molecular 
models are produced in an automated fashion. 

One criterion that the molecular model of the 
polypeptide has been successfully fit into the map of 
electron density is the correspondence between the 
sequence in which amino acids of different sizes (Figure 
4-14) are known to occur along the amino acid sequence 
of the polypeptide and the sequence in which protru- 
sions of different size occur at regular intervals along the 
tube of electron density (Figure 4-15). The 20 different 
amino acids are, in order of increasing electron density 
(Figure 4-14), glycine, alanine, serine, proline, cysteine, 
valine, threonine, aspartate, asparagine, leucine, 
isoleucine, glutamate, glutamine, methionine, lysine, 
histidine, phenylalanine, arginine, tyrosine, and tryptophan. 
In terms of electron density, many of them are 
indistinguishable, for example, valine and threonine or 
aspartate, asparagine, leucine, and isoleucine, and only a 
few of them, for example, tryptophan, tyrosine, and 
phenylalanine, are of sufficient size and peculiar enough 
shape to be identified unambiguously with the protru- 
sions jutting out from the continuous tube in the map of 
electron density (Figure 4-15).* Together, however, the 
sequence in which the amino acids are arranged in a 
given protein and their relative sizes usually provide suf- 
ficient reassurance that the molecular model of the 
polypeptide has been fit into the map correctly. 
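
The idea of matching the sequence of sizes to the sequence of protrusions can be illustrated with a short sketch. It uses only the order of the amino acids listed above, not the numbers of electrons from Figure 4-14, so the ranks are ordinal and treat amino acids that are in practice indistinguishable as if they differed. The names `SIZE_ORDER`, `size_profile`, and `landmarks` are illustrative, and the segment used is one of those quoted in the legend of Figure 4-15.

```python
# One-letter codes in the order of increasing electron density given in the text:
# Gly, Ala, Ser, Pro, Cys, Val, Thr, Asp, Asn, Leu, Ile, Glu, Gln, Met, Lys,
# His, Phe, Arg, Tyr, Trp.
SIZE_ORDER = "GASPCVTDNLIEQMKHFRYW"

# Side chains the text describes as large and peculiar enough to be
# identified unambiguously in the map.
UNAMBIGUOUS = set("WYF")

def size_profile(sequence):
    """Ordinal size rank (0 = glycine, 19 = tryptophan) for each position."""
    return [SIZE_ORDER.index(residue) for residue in sequence]

def landmarks(sequence):
    """Positions (0-based) and identities of the unambiguous side chains."""
    return [(i, r) for i, r in enumerate(sequence) if r in UNAMBIGUOUS]

segment = "TSSWDPSTGIVSDR"
for residue, rank in zip(segment, size_profile(segment)):
    # A crude bar chart of the expected sizes of the protrusions.
    print(residue, "#" * (rank + 1))
print("landmarks:", landmarks(segment))
```

In such a profile the tryptophan stands out sharply from the run of small side chains around it, which is the kind of pattern a crystallographer would look for along the tube of electron density.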

An additional reassurance that the polypeptide has 
been properly fit into the map can be obtained from 
anomalous dispersion. If the protein has been expressed 
so that it contains selenomethionine instead of methio- 
nine, the electron density at the locations in the map that 
are occupied by the selenium atoms will vary in intensity 
when the wavelength of X-radiation used to produce the 
reflections is varied near the absorption edge of the sele- 
nium. These variations in intensity can be used to locate 
the positions at which the methionines must end up after 
the molecular model has been fit properly into the tube 
of electron density. Although the anomalous dispersion 
from sulfur itself is weak, under appropriate circum- 
stances the positions of both the cysteines and the 
methionines in the map of electron density of a molecule 
of protein expressed normally can be located by the 
anomalous dispersion of their sulfurs. 

The reassurance provided by the agreement 
between the known amino acid sequence of the polypep- 
tide and the sequence of the sizes of the protrusions along 
the continuous tube of electron density or the positions 
of atoms capable of anomalous dispersion is not incon- 
sequential. It is rarely the case that the tube of electron 
density representing the polypeptide in the map of elec- 
tron density is continuous over its entire length. Portions 
of the polypeptide that are so flexible that they vibrate too 
widely will not contribute to the diffraction and will pro- 
duce no structured electron density. Often segments of 
the polypeptide can assume several different conforma- 
tions while in the crystal. The movement among these 
conformations within a particular molecule of the pro- 
tein can be rapid or a particular conformation can be stat- 
ically occupied. Regardless of the rate at which the 
conformations interconvert, if at any given instant these 
segments from different molecules of the protein in the 
crystal assume different conformations, this disorder will 
prevent them from contributing to the diffraction and 
hence to the structured electron density. Occasionally 
such unstructured electron density fills a large void in the 
map representing a significant portion (100-200 aa) of 
the molecule of protein. Usually, however, it is short 
segments of the tube that are blurred or missing, breaking 
its continuity. Another source of ambiguity arises when 
one segment of the polypeptide crosses another segment 
too closely for each tube to be followed confidently through 
the intersection. Missing segments of electron density and 
ambiguous crossings have led to serious errors in the fitting 
of polypeptides into maps of electron density, 
and these errors are often corrected by paying close 
attention to the patterns of the protrusions along the tube 
and correlating them with the sequence. 


* Although the side chain of cysteine has the same number of elec- 
trons as those of threonine and valine and the side chain of methio- 
nine the same number as those of glutamate, glutamine, and lysine 
(Figure 4-14), the sulfurs in cysteine and methionine, because they 
have 10 core electrons, produce strong localized features of elec- 
tron density (Figure 4-15) that permit them to be distinguished 
from the others. 


Figure 4-14: Silhouettes of the side chains of the amino acids. Space-filling models of the amino acids were 
constructed with the program Chem 3D Plus. Each of the models, except those of the aromatic amino acids, 
was then rotated to produce the largest silhouette of its side chain while the bond between the β carbon and 
the α carbon was kept vertical and in the plane of the page. For each of the aromatic side chains, a view was 
chosen in which the plane of the ring was in the plane of the page so that the silhouette was as large as possible. 
In this way, each of the two-dimensional silhouettes represents the relative three-dimensional bulk of 
each side chain. To produce the silhouettes, the hydrogens were erased from the models; the α carbon, the 
carboxy group, and the amino group were deleted; and all of the remaining atoms were turned black. In all 
of the silhouettes, except those of the aromatic side chains, the α carbon would occupy the position of the 
label. In each of the silhouettes of the side chains of the aromatic amino acids, the β carbon is directly above 
the label. The number of electrons in each side chain is indicated in parentheses. The standard crystallographic 
code for each atom in each side chain is indicated. A silhouette of phenylalanine viewed edge on is 
also included. 

Such errors in tracing the polypeptide occur fre- 
quently enough that any crystallographic molecular 
model should be considered provisional until it has been 
shown to agree with other independent observations. 
This is an important point because crystallography has 
often assumed the mantle of infallibility. Even incorrect 
crystallographic molecular models look completely con- 
vincing. It should be pointed out, however, that they are 
convincing not because of the way the polypeptide is 
folded but because they are constructed with covalent 
bonds of the correct lengths and angles. These latter fea- 
tures are not indicative of the reliability of the crystallo- 
graphic molecular model because they were 
incorporated into it automatically. 

After the polypeptide has been fit into the map of 
electron density, certain regular patterns, collectively 
referred to as secondary structure, can be seen in the 
arrangement of the polyamide backbone. The regular 
patterns that are seen in the crystallographic molecular 
models of proteins are α helices, β structures, and 
β turns. α Helices and β structures were first observed in 
hypothetical models built by Pauling and his collabora- 
tors (Figure 4-16). After the models had been built, it 
was found that certain of their dimensions were consis- 
tent with molecular dimensions that had been observed 
in patterns of X-ray diffraction from oriented fibers of 
protein such as hair and silk, but it was not until 
much later that these structures were actually observed 
in maps of electron density of proteins. Several strands of 
β structure (Figure 4-15) are often joined together to 
form pleated sheets (Figure 4-16). Such sheets can be 
formed from strands all running parallel to each other or 
alternating in their orientation and thus each antiparal- 
lel to its neighbors or from a mixture of these two 
arrangements. β Turns (Figure 4-16D) were first noticed 
by Venkatachalam in the crystallographic molecular 
models of a cyclic hexapeptide, a short tetrapeptide, and 
the protein lysozyme. 

Aside from the lengths of the covalent bonds of the 
polyamide backbone, the main structural element 
responsible for these secondary structures is the hydro- 
gen bond, which is a noncovalent interaction that forms 
between the dipole of the nitrogen-hydrogen bond of 
one amide and one or two of the lone pairs of electrons 
on the acyl oxygen of another amide of the backbone of 
the polypeptide. These hydrogen bonds are indicated in 
Figure 4-16 by dashed lines. A hydrogen bond connects 
the acyl oxygen contributed to the polyamide backbone 
by each amino acid in a sequence coiled into an α helix 
with the amido nitrogen-hydrogen contributed to the 
backbone by the amino acid four positions farther along 
(Figure 4-16A). In pleated sheets of parallel β structure 
(Figure 4-16B), the amido nitrogen-hydrogen and the 
acyl oxygen contributed by an amino acid in one of 
the polypeptides are connected by hydrogen bonds to the 
acyl oxygen and amido nitrogen-hydrogen, respectively, 
of amino acids two positions apart from each other in the 
sequence of a neighboring polypeptide to form a ring 
containing 12 atoms. In pleated sheets of antiparallel 
β structure (Figure 4-16C), hydrogen-bonded rings of 14 
atoms and 10 atoms alternate along a ladderlike struc- 
ture. The only structural element that defines the confor- 
mation of a β turn (Figure 4-16D) is a hydrogen bond 
between the acyl oxygen of the first amino acid in the 
turn and the amido nitrogen-hydrogen of the fourth and 
last amino acid in the turn. 


Figure 4-15: Fitting the molecular model of the polypeptide 
into the map of electron density (Bragg spacing > 0.21 nm) 
for galactose oxidase. The unrefined map of electron density 
shown was calculated with phases that were estimated 
by use of isomorphous derivatives obtained with K2PtCl4, 
H2IrCl6, and Pb(NO3)2 and then improved by iterative solvent 
flattening. The figure shows skeletal models of the segments 
-VLMW- and -TSSWDPSTGIVSDR- from the sequence of the 
protein fit into the three vertical tubes of electron density. 
These are three strands of an antiparallel β sheet. The large 
protrusions for the two tryptophans, the methionine, and the 
phenylalanine, which intrudes into the image from the left, 
are significant features along the course of the tube representing 
the polypeptide that confirm the fit. The isoleucine, 
valines, aspartate, and leucine are less dramatic protrusions. 
The contortion of the polypeptide at the proline is yet another 
indication that the sequence has been matched properly with 
the map of electron density. 


Figure 4-16: Four types of secondary structure found in molecules of protein: (A) α helix, (B) parallel β sheet, (C) antiparallel β sheet, 
and (D) two types of β turn (type I and type II). The polyamide backbone can be traced by the pattern ...N, Cα, CO, N, Cα, CO, N, Cα, CO... 
and the side chains are the groups protruding (marked A,04, or R; respectively). Side views of the β sheets are shown to the right of each over- 
head view to demonstrate the pleats. Reprinted with permission from refs 51-53. Copyright 1951 National Academy of Sciences and 1981 
Academic Press. 

Secondary structures enforce particular geometries 
on the conformation of the polypeptide. The β turn 
causes the polypeptide to double back on itself, often to 
form a hairpin the two tines of which are cross-connected 
in antiparallel β structure. An α helix has a right-handed 
pitch,* and the absolute stereochemistry of the L-amino 
acids causes each side chain, the R groups in Figure 
4-16A, to cant toward its amino terminus. The side chains 
protrude from the helical core at intervals of about 100°. 
β Structure is pleated when viewed from the side (Figure 
4-16B,C) owing to unavoidable steric requirements 
resulting from the angles of the covalent bonds along the 
polypeptide. In pleated sheets of β structure, the side 
chains of the amino acids in the sequence of each strand 
alternately protrude to one side and then the other of the 
surface in which the strands of polypeptide lie. 
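
Two of the regularities of the α helix mentioned above lend themselves to a small numerical illustration: the angular interval of about 100° between successive side chains, and the hydrogen bond from the acyl oxygen of each amino acid to the amido nitrogen-hydrogen four positions farther along. The sketch below assumes an interval of exactly 100° for simplicity; the function names are illustrative.

```python
def residues_per_turn(step_deg=100.0):
    """Number of amino acids needed for one full turn of the helix."""
    return 360.0 / step_deg

def wheel_angles(n_residues, step_deg=100.0):
    """Angular position (degrees) of each side chain viewed down the helix axis."""
    return [(i * step_deg) % 360.0 for i in range(n_residues)]

def helix_hydrogen_bonds(n_residues):
    """Pairs (i, i + 4): acyl oxygen of residue i to the N-H of residue i + 4."""
    return [(i, i + 4) for i in range(n_residues - 4)]

print("residues per turn:", residues_per_turn())
print("side chain angles:", wheel_angles(8))
print("hydrogen bonds:", helix_hydrogen_bonds(8))
```

Because 100° does not divide evenly into 360°, successive turns place side chains at different angular positions, so that the side chains of a long helix are distributed all around its axis rather than stacked in a few columns.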

When the molecular model of the polypeptide has 
been fit into the tube of electron density, the final struc- 
ture of its conformation represents, within the accuracy 
of the map of electron density, a skeleton of the actual 
molecule of protein. This crystallographic molecular 
model is the product of fitting atomically accurate 
molecular models of known covalent structures into a 
map of electron density.† The resulting arrangement of 
the segments of secondary structure in three dimensions 
produces a representation of the tertiary structure of the 
folded polypeptide. The tertiary structure of a protein is 
the complete conformation into which its polypeptide is 
folded in its native form. 


* Put the four fingers of your right hand together, bent inward and 
horizontal, and put your thumb up. As you slide your fingers 
around a right-handed helix in the direction in which they are 
pointed, the helix rises in the direction in which your thumb is 
pointed. As you slide the fingers of your left hand around a left- 
handed helix, the helix rises in the direction of the thumb. 

† To view a crystallographic molecular model on your own com- 
puter, find the file of the coordinates for the model in which you are 
interested at http://betastaging.rcsb.org/pdb/Welcome.do and 
download the file as text. Open the file with the program 
SwissPdbViewer, which can be obtained free of charge from 
us.expasy.org/spdbv/. 

An example of a crystallographic molecular model 
is the one constructed for the protein penicillopepsin 
(Figure 4-17). To obtain a full understanding of this 
molecular model, it must be viewed stereoscopically. The 
five panels of Figure 4-17 show drawings of the same view 
of the model. In Figure 4-17A, all of the atoms in the crys- 
tallographic molecular model are displayed in skeletal 
representation; in Figure 4-17B, the side chains of the 
amino acids have been removed to focus attention on 
only the polyamide backbone of the polypeptide and its 
hydrogen bonds, which are indicated by dashed lines; and 
in Figure 4-17C, only the α carbons of each amino acid 
are displayed, each connected to its two immediate 
neighbors in the amino acid sequence by line segments 
to create an α-carbon diagram. In all of the panels, the 
amino terminus is on the upper right at about 10 o’clock 
and the carboxy terminus is to the back at about 8 o’clock. 
You should follow the polypeptide [ ... N, Cα, CO, N, Cα, 
CO, N, Cα, CO ... ] through the whole drawing in Figure 
4-17B. Note the α helices, β structures, and β turns. 
Compare what you see to the drawings presented by 
Pauling (Figure 4-16A-C). Note that α helices are rigid 
tubes while β structures are sinuous and flexible. Note the 
pleats in the β sheets. Distinguish between sheets of 
β structure formed from three or more strands and rib- 
bons of β structure formed from only two strands. Now 
follow the polypeptide through the crystallographic 
molecular model in Figure 4-17A. Note the disposition of 
the side chains along secondary structures, and try to 
identify some of the amino acids. 
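
An α-carbon diagram of the kind described above can be extracted from a file of coordinates with a few lines of code. The sketch below assumes the fixed-column layout of ATOM records in the legacy text format of the Protein Data Bank (atom name in columns 13-16, residue name in columns 18-20, residue number in columns 23-26, and x, y, and z in columns 31-54, in ångströms). The three amino acids and their coordinates are invented solely for illustration, and the helper `atom_record` exists only to lay those invented values out in the assumed columns.

```python
import math

def atom_record(serial, name, res_name, res_seq, x, y, z, chain="A"):
    """Format one ATOM line in the assumed fixed-column layout."""
    return "ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f" % (
        serial, name, res_name, chain, res_seq, x, y, z)

def alpha_carbons(pdb_text):
    """Return (residue name, residue number, (x, y, z)) for each alpha carbon."""
    trace = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            xyz = tuple(float(line[i:i + 8]) for i in (30, 38, 46))
            trace.append((line[17:20], int(line[22:26]), xyz))
    return trace

def segment_lengths_nm(trace):
    """Length of each line segment of the alpha-carbon diagram (angstrom -> nm)."""
    return [math.dist(p[2], q[2]) / 10.0 for p, q in zip(trace, trace[1:])]

# Invented coordinates: three alpha carbons 3.8 angstroms apart, plus nitrogens
# that the parser should skip.
example = "\n".join([
    atom_record(1, " N  ", "ALA", 1, -1.2, 0.5, 0.0),
    atom_record(2, " CA ", "ALA", 1, 0.0, 0.0, 0.0),
    atom_record(3, " N  ", "GLY", 2, 2.4, 0.6, 0.0),
    atom_record(4, " CA ", "GLY", 2, 3.8, 0.0, 0.0),
    atom_record(5, " N  ", "SER", 3, 4.1, 2.3, 0.0),
    atom_record(6, " CA ", "SER", 3, 3.8, 3.8, 0.0),
])

trace = alpha_carbons(example)
print([(name, number) for name, number, _ in trace])
print([round(length, 3) for length in segment_lengths_nm(trace)])
```

In a real model the segments between successive α carbons are all of nearly the same length, because each spans one planar amide; a segment that is much longer usually signals amino acids missing from the model.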

The tertiary structure observed in a crystallographic 
molecular model is often presented diagrammatically 
(Figure 4-17D) in a cartoon where flat arrows are used to 
represent strands of polypeptide in β structure, with the 
head of the arrow at the carboxy terminus of the strand 
to provide the direction in which the chain is oriented, 
and cylinders are used to represent α helices. The tertiary 
structure of penicillopepsin, which you have explored in 
detail in Figure 4-17A, is represented, in the same orien- 
tation, by the diagram in Figure 4-17D. Follow the 
polypeptide through Figure 4-17, panels B and D, simul- 
taneously. 

The first three of the representations of the struc- 
ture of a protein molecule that have been presented so 
far are skeletons of the crystallographic molecular 
model. The advantage of the skeletons is that the whole 
molecule can be examined simultaneously even in its 
interior. As with all molecules, flesh resides upon the 
bones in the form of the electron clouds that produced 
the map of electron density in the first place. It is possi- 
ble to construct a model of a molecule of protein from 
space-filling units of the kind developed by Pauling and 
Corey (CPK models) after the coordinates of the individ- 
ual atoms in three dimensions have been gathered from 
the skeletal model. A three-dimensional photograph of 
such a model of the protein lysozyme can be seen in 
Volume 243 of the Journal of Biological Chemistry. This 
photograph and a similar drawing found in Volume 32 of 
Biochemistry produce a reliable mental image of the 
molecular structure of a properly folded polypeptide. A 
stereoimage of a space-filling representation of penicil- 
lopepsin is presented in Figure 4-17E in the same orien- 
tation as the other drawings of the crystallographic 
molecular model of the protein. 


Figure 4-17: Crystallographic molecular model of penicillopepsin, from the mold Penicillium janthinellum. In the first skeletal drawing (A), 
both the peptide backbone (heavy line segments) and the side chains (light line segments) of the amino acids are displayed, and no potential 
hydrogen bonds are indicated. This drawing was produced with MolScript. In the second skeletal drawing (B), the side chains are left 
out and the crystallographer has assigned subjectively the locations of hydrogen bonds (dashed lines). Every tenth amino acid is identified 
and numbered to assist you in tracing the chain. Reprinted with permission from ref 56. Copyright 1983 Academic Press. In the α-carbon diagram 
(C), the positions of the α carbons of the amino acids in the crystallographic molecular model are designated by points and the points 
are joined by line segments. This α-carbon diagram often gives a clearer picture of the patterns of secondary and tertiary structure. This drawing 
was produced with MolScript. In the cartoon (D), the skeletal drawing of panel B is represented diagrammatically. In a space-filling representation 
(E), each atom in the crystallographic molecular model is represented by a sphere with its van der Waals radius. This drawing 
was produced with MolScript. As in the stereo image of Figure 3-9, black spheres are carbon atoms; gray, nitrogen; and white, oxygen. 

The space-filling representation in Figure 4-17E 
emphasizes the tight packing of the atoms of the protein 
in its tertiary structure. Penicillopepsin is an example of 
a globular protein. A globular protein is a protein in 
which the entire polypeptide is folded into a compact 
structure the three dimensions of which are of the same 
order of magnitude. There are also many proteins in 
which globular units are held together by flexible seg- 
ments of polypeptide as well as fibrous proteins in which 
the folded polypeptide forms a structure severely elon- 
gated in one of its dimensions. 

Penicillopepsin (Figure 4-17) contains mostly 
antiparallel β structure where adjoining strands are often 
connected at one end by β turns. Myoglobin (Figure 
4-18) is an example of a protein that is almost entirely 
α helical. Most proteins are mixtures of these two major 
types of secondary structure, β turns, and a certain 
amount of random meander. 

If one assumes for the moment that a particular 
crystallographic molecular model has been constructed 
correctly and represents, to the level of precision of the 
initial map of electron density, the molecules of protein 
as they are packed in the crystal, what relationship does 
this structure have to the molecules of protein when they 
are in solution in the cytoplasm? 

The existence of the diffracted reflections requires 
that every fundamental unit cell in the crystal contain the 
same distribution of structured matter. If the fundamen- 
tal unit cell contains only one molecule of protein, then 
every molecule of protein in the crystal must have exactly 
the same structure. It has always been observed, however, 
that in fundamental unit cells containing several mole- 
cules of the same protein, the several maps of electron 
density for the several molecules are quite similar and 
differ from each other only at the surfaces of the mole- 
cules where differences in crystal packing have caused 
flexible side chains or short, flexible loops of backbone to 
assume somewhat different orientations. For example, 
most of the polypeptide in the four asymmetrically 
arrayed molecules of adenylosuccinate synthase in the 
fundamental unit cell coincides to within 0.03 nm when 
the four separate crystallographic molecular models are 
superposed. This coincidence is well within the limits of 
the accuracy of the model, but four loops of 5-9 amino 
acids on the surface of the protein deviate in their positions 
among the four different crystallographic molecular 
models by 0.15-0.5 nm because of differences in crystal 
packing. From the fact that only differences such as these 
are usually observed, it follows that all of the structured 
regions of the molecules of protein within a given crystal 
usually have essentially the same conformation. 
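
The comparison of superposed crystallographic molecular models described above amounts to measuring the distance between equivalent atoms. The following sketch assumes that the two models have already been superposed and that their atoms are listed in the same order; it performs no fitting itself. The coordinates are invented for illustration, and the threshold of 0.15 nm is simply the lower end of the range of deviations quoted for the surface loops.

```python
import math

def deviations_nm(model_a, model_b):
    """Distance between each pair of equivalent atoms (coordinates in nm)."""
    if len(model_a) != len(model_b):
        raise ValueError("models must list the same atoms in the same order")
    return [math.dist(p, q) for p, q in zip(model_a, model_b)]

def rms_deviation_nm(model_a, model_b):
    """Root-mean-square deviation over all equivalent atoms."""
    d = deviations_nm(model_a, model_b)
    return math.sqrt(sum(x * x for x in d) / len(d))

def divergent_positions(model_a, model_b, threshold_nm=0.15):
    """Indices at which the two models differ by at least the threshold."""
    return [i for i, x in enumerate(deviations_nm(model_a, model_b))
            if x >= threshold_nm]

# Invented alpha-carbon coordinates (nm) for two already superposed models.
model_a = [(0.00, 0.00, 0.00), (0.38, 0.00, 0.00), (0.76, 0.00, 0.00), (1.14, 0.00, 0.00)]
model_b = [(0.01, 0.00, 0.00), (0.38, 0.02, 0.00), (0.76, 0.00, 0.30), (1.14, 0.00, 0.00)]

print("deviations (nm):", [round(x, 3) for x in deviations_nm(model_a, model_b)])
print("rms deviation (nm):", round(rms_deviation_nm(model_a, model_b), 3))
print("divergent positions:", divergent_positions(model_a, model_b))
```

Listing the deviations position by position, rather than relying on the single root-mean-square value, makes it apparent when a close overall agreement conceals a few loops that differ substantially.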

Over the inflexible portion of a globular protein or 
within each of the globular regions of a protein in which 
globular structures are held together by flexible unstruc- 
tured segments of polypeptide, there is little doubt that 
the one structure present in the crystal is the same as the 
unique structure, or is the same as one of a limited 
number of unique structures, assumed by the protein in 
free solution and therefore its native structure. First, 
unlike the usual anhydrous crystals of small molecules, a 
crystal of protein is 40-70% water. This water usually 
surrounds each molecule of protein almost entirely, and 
the contacts between molecules of protein in the crystal 
are adventitious and not extensive. Consequently, the 
molecule of protein is still dissolved in the same aqueous 
solution from which it crystallized. Second, there are 
many instances in which the same protein has been crys- 
tallized under two or more different conditions and was 
found to be incorporated into the two or more different 
fundamental unit cells with completely different orien- 
tations, yet the respective maps of electron density were 
almost indistinguishable from each other and could be 
superposed. For example, the polypeptides in the two 
crystallographic molecular models of subtilisin from 
Bacillus alcalophilus produced from the two nonisomor- 
phous crystal types coincided to within less than 0.1 nm 
except at two short surface loops. T4 Lysozyme has 
been crystallized in 25 different nonisomorphous forms 
and crystallographic molecular models have been pre- 
pared from all of them. This molecule contains two inde- 
pendently folded globular portions connected by a 
flexible segment of polypeptide, and the angle between 
these two portions can vary by up to 45° over the differ- 
ent crystals, but within each of the two portions the con- 
formation into which the polypeptide is folded is always 
the same. Third, crystals of a protein usually retain its 
enzymatic activity, albeit sometimes at a lower rate, 
and this also indicates that the structure of the protein 
has not changed during its crystallization. In fact, when 
crystals of protein are suspended in an organic solvent 
that is sufficiently immiscible with water that the crystals 
retain all their water of crystallization and their interior 
remains a separate aqueous phase, the protein is unaf- 
fected and the crystals retain their enzymatic activity. 
Fourth, Raman spectroscopy can be performed on solids 
as well as liquids, and when the Raman spectrum of 
ribonuclease in solution was compared to its Raman 
spectrum in the crystal, the two were virtually identical in 
the region of the amide III vibrations, a region that would 
be sensitive to any changes in the structure of the 
polypeptide chain that might have occurred during crys- 
tallization. Finally, molecular models from a number of 
proteins in solution have been obtained by nuclear mag- 
netic resonance spectroscopy, and at their level of accu- 
racy, they are indistinguishable from the crystallographic 
molecular models of the same proteins. 

There are proteins that have been shown to be 
able to assume two stable structures in rapid equilib- 
rium with each other in solution, and in some cases, 
two different crystals can be made, each exclusively 
incorporating one of these respective structures. The 
crystallization and elucidation of the structures of 
deoxyhemoglobin and oxyhemoglobin provide an 
example. When such crystals are exposed to a ligand 
that binds to the protein they contain and coinciden- 
tally elicits the change in the structure of the protein, 
the crystals will often shatter as the protein assumes 
the new structure, which is incompatible with the 
former crystal lattice. In addition to presenting another 
observation consistent with the conclusions that the 
molecules of protein in the crystal retain the potentiali- 
ties that they assume in solution, this observation sug- 
gests why some crystals are not enzymatically active. If 
expression of enzymatic activity requires that the pro- 
tein change its shape slightly and reversibly each time it 
catalyzes the reaction and that change in shape is steri- 
cally hindered by the lattice of the crystal, the protein 
would not be able to display activity. 

Crystals of citrate (si) synthase provide an example 
of such a situation. This protein can be crystallized 
under different sets of conditions that yield two different 
types of crystals containing two different conforma- 
tions of the protein. From a careful examination of the 
maps of electron density for these two conformations, it 
became clear that each time the enzyme in free solution 
converts acetyl-SCoA and oxaloacetate into citrate and 
coenzyme A, it passes back and forth between these two 
conformations. Neither crystal is enzymatically active, 
but upon dissolving either, full activity is restored. The 
conclusion drawn was that the packing of the molecules 
of protein in the crystal sterically prevented the move- 
ment between the two conformations necessary for 
enzymatic activity, not that either crystallographic 
molecular model was unrepresentative of the enzyme. 

The most compelling argument for the identity of 
the structure seen in the crystal and the structure 
assumed by the protein in solution is that the structure 
seen in the crystal makes sense. Over the more than three 
decades that crystallographic molecular models of high 
accuracy have been available for examination, what has 
been seen has consistently provided reasonable explana- 
tions for the behavior of the respective proteins in solu- 
tion. These explanations have stimulated experiments to 
test those explanations that have usually yielded inform- 
ative results. Often an experiment will rule out a hypoth- 
esis based on an examination of the structure, but the 
more informed reexamination of the structure that then 
occurs usually turns up the original error of judgment. 
The fact that a crystallographic molecular model makes 
sense is an unambiguous verification that it represents 
the actual structure of the molecule of protein even when 
it is in the crystal, let alone in solution.

Problem 4-4: The structure below was drawn from a
crystallographic molecular model of a particular protein.
It depicts only a small portion of the entire molecule.
Trace the polypeptide backbone through the structure.


(A) How many lengths of polymer enter the figure? 


(B) Identify as many of the amino acids as you can 
along the polymer, and write out its sequence or 
sequences. 


(C) Identify the symbols for the individual atoms. 
Which atoms are not depicted? Why? 


Refinement 


The result of fitting the molecular model of a protein into
its map of electron density, either manually or automatically
by computer, is an initial crystallographic molecular model.
At this stage, the accuracy of the crystallographic molecular
model is sufficient to define the patterns in which secondary
structures are arranged.
Because individual atoms, however, usually do not 
appear in the initial map of electron density, the initial 
crystallographic molecular model usually does not have 
sufficient accuracy to establish atomic details. These are 
of importance in their own right as well as being essen- 
tial to understanding most of the biological functions of 
proteins. If the data set has been gathered to narrow 
enough Bragg spacing (0.3-0.25 nm or less), the accuracy 
of a crystallographic molecular model can be improved 
significantly by the process of refinement. The refine- 
ment of a crystallographic molecular model is the sys- 
tematic adjustment of the positions of its atoms and the 
uncertainties of those positions and the addition to the 
model of portions of its covalent structure unobserved 
initially as well as molecules of solutes and water so that 
the amplitudes of the set of structure factors calculated 
from the model reproduce the observed amplitudes of 
the data set as closely as possible. 

Although the fold of the polypeptide chain and the 
general positions of the side chains of the individual 
amino acids usually do not change significantly upon 
refinement, the atomic details of both the polypeptide 
and the side chains almost always change dramatically. If 
it is the case that the dramatic changes occurring during 
refinement actually do bring the molecular model closer 
to reality, then the atomic details observed in initial, 
unrefined molecular models are best ignored until the 
refinement has validated their existence. 

The first step in a refinement is to calculate the 
amplitudes of the structure factors that the initial 
molecular model itself would produce, so that these 
amplitudes can be compared to the observed ampli- 
tudes of the data set. Once the initial molecular model 
of the polypeptide has been fit into the map of electron 
density to the satisfaction of the crystallographer, the 
coordinates within the fundamental unit cell of each of 
its atoms other than the hydrogens can be determined 
by direct measurements of the model. Often, if the ini- 
tial map of electron density is of high enough quality, 
some individual molecules of water and solutes can be 
observed and included in the initial model and their 
coordinates also measured. There are always large 
regions of bulk solution in which individual molecules 
of water and solutes are never delineated because these 
regions are fluid in the crystal and thus unstructured. 
These regions of bulk solvent are included in the initial 
model as geometric solids of the appropriate uniform 
electron density. 

A set of theoretical structure factors can be calculated
by Fourier transformation (Equation 4-7) from the
coordinates of the atoms, the shape of the geometric
solid occupied by the solvent, and the scattering functions
for each atom and for the solvent. The amplitudes of this
set of calculated structure factors are referred to as the
calculated amplitudes, and the set of these amplitudes is
designated $F_c$. The set of simultaneously calculated
phases of those structure factors is designated as
$\alpha_c$. The amplitudes of the original experimental data
set or any subset thereof are referred to as the observed
amplitudes and designated $F_o$. The set of phases
estimated by isomorphous replacement are euphemistically
referred to as the observed phases, and the set containing
their values is designated as $\alpha_o$. All of these
designations, $F_c$, $\alpha_c$, $F_o$, and $\alpha_o$, refer to
three-dimensional matrices, each containing 5000-100,000
elements, all individually indexed as either $F_{hkl}$ or
$\alpha_{hkl}$.

The only directly observed quantities are the observed
amplitudes $F_o$, and they are the only parameters against
which the success of the construction of any molecular
model can be judged. If the molecular model were an exact
representation of the molecules of protein, small solutes,
and water within the crystal and there were no systematic
errors in the observed data set, the calculated amplitudes
$F_c$ would be identical to the observed amplitudes $F_o$.
It is traditional to quantify the degree of this
correspondence with a crystallographic R-factor:

$$R = \frac{\sum_{hkl} |F_{o,hkl} - F_{c,hkl}|}{\sum_{hkl} F_{o,hkl}} \tag{4-9}$$

where $F_{o,hkl}$ and $F_{c,hkl}$ are the observed and calculated
amplitudes, respectively, of the structure factor hkl. The
summation is performed over all available pairs of corre- 
sponding observed and calculated amplitudes or some 
subset of the available pairs. Once the Bragg spacings 
included in the data set are less than about 0.5 nm, so 
that the electron density of the solvent can be properly 
reproduced, the differences between the initial molecu- 
lar model and the real structure usually become more 
significant the smaller the Bragg spacing, and the value 
of the R-factor has a tendency to increase in magnitude 
as the data set is expanded to include the amplitudes of 
structure factors of smaller and smaller Bragg spacing. 
Therefore the minimum Bragg spacings of the reflections 
included in the data set must be known to assess the 
significance of the value of the R-factor. 
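
The arithmetic behind an R-factor is simple enough to sketch. The following is a minimal illustration in Python, assuming the conventional definition (the sum of the absolute differences between paired observed and calculated amplitudes divided by the sum of the observed amplitudes). The function name and the five amplitudes are hypothetical and chosen only for illustration.

```python
def r_factor(f_obs, f_calc):
    """Crystallographic R-factor: sum |Fo - Fc| / sum Fo over paired amplitudes."""
    if len(f_obs) != len(f_calc):
        raise ValueError("observed and calculated sets must be paired")
    numerator = sum(abs(fo - fc) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(f_obs)
    return numerator / denominator


# Hypothetical amplitudes for five reflections (arbitrary units).
f_obs = [120.0, 85.0, 40.0, 62.0, 15.0]
f_calc = [100.0, 90.0, 55.0, 60.0, 25.0]
print(round(r_factor(f_obs, f_calc), 3))
```

Because the sum can be restricted to any subset of reflections, the same function serves for a shell of Bragg spacings or for a subset of the data held apart from refinement.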

The value of the R-factor is often presented as a 
measure of the validity of a particular crystallographic 
molecular model. Such claims should be ignored. An 
incorrect crystallographic molecular model can give a 
reasonable R-factor. For example, an incorrect crystallographic
molecular model (Bragg spacing > 0.2 nm) for the ferredoxin
from Azotobacter vinelandii had an R-factor of 0.24, while the
later, presumably correct, crystallographic molecular model
(Bragg spacing > 0.2 nm) had an R-factor of 0.21. An incorrect
crystallographic molecular model (Bragg spacing > 0.3 nm) for
the ras protein had an R-factor of 0.29, while the later,
presumably correct, crystallographic molecular model (Bragg
spacing > 0.26 nm) had an R-factor of 0.23. It is not the
value of the R-factor that should be used as a validation of
the model but the agreement of the model with independent
chemical observations. In the case of the ferredoxin from
A. vinelandii, it was disagreements between the earlier
crystallographic molecular model and several direct chemical
observations of the protein that prompted a reevaluation.
Both of these examples were situations in which the chains
were incorrectly traced in the original maps of electron
density, and this produced very large errors in the molecular
model. Smaller errors may often go undetected.

Usually, the initial molecular model yields an R- 
factor of 0.30-0.60. This means that the amplitudes cal- 
culated from the model differ on the average by 30-60% 
from the observed amplitudes. At first glance this seems 
alarming because a completely random acentric struc- 
ture would give an R-factor of 0.59. It is not so disturbing, 
however, because it is obvious from direct observation 
(Figure 4-12) that a unique structure has been defined by 
the map of electron density. Nevertheless, such a large 
value of the R-factor indicates that the initial molecular 
model does not duplicate the structure of the actual mol- 
ecule very accurately and suggests that there is room for 
improvement. The improvements made in the structure 
after the initial model has been constructed are the 
refinements. The goal of refinement is to produce a 
molecular model the calculated structure factors of 
which have amplitudes as close as possible to the respec- 
tive observed amplitudes. To accomplish this goal, the 
positions of each of the atoms in the model are adjusted 
in such a way that the R-factor decreases in magnitude. 
Only when it is realized that models of molecules of protein
have 500-10,000 atoms that are not hydrogen and that the
movement of any one of these atoms in the model affects the
amplitudes of all of the structure factors in the set $F_c$
is the task of refinement placed in a proper perspective.
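
The value of 0.59 quoted above for a completely random acentric structure can be checked numerically. The sketch below is an idealization, not a calculation on real data: it assumes that each acentric structure factor has independent Gaussian real and imaginary parts, and it compares two unrelated sets of such amplitudes. Under that assumption the expected R-factor is 2 - sqrt(2), or about 0.586. The function name, the number of reflections, and the seed are arbitrary choices.

```python
import math
import random


def random_acentric_r(n_reflections=100_000, seed=0):
    """R-factor between two unrelated sets of idealized acentric amplitudes."""
    rng = random.Random(seed)

    def amplitude():
        # Independent Gaussian real and imaginary parts; the amplitude is
        # the modulus of the resulting complex structure factor.
        return math.hypot(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))

    numerator = 0.0
    denominator = 0.0
    for _ in range(n_reflections):
        f_obs, f_calc = amplitude(), amplitude()
        numerator += abs(f_obs - f_calc)
        denominator += f_obs
    return numerator / denominator


r_random = random_acentric_r()
print(f"simulated R = {r_random:.3f}; 2 - sqrt(2) = {2 - math.sqrt(2):.3f}")
```

An initial model with an R-factor of 0.45 is therefore better than random, but not by a wide margin, which is why the map itself, and not the R-factor, is the evidence that a unique structure has been defined.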

The most easily understood way to perform a refinement
proceeds by calculating difference maps of electron density.
When two sets of crystallographic amplitudes are available
for the same structure or for two structures so similar that
the same set of phases, $\alpha_{hkl}$, can be used for both, a
difference map of electron density, $\Delta\rho(x,y,z)$, can be
calculated:

$$\Delta\rho(x,y,z) = \frac{1}{V}\sum_h\sum_k\sum_l (F_{hkl} - F'_{hkl}) \exp[-2\pi i(hx + ky + lz - \alpha_{hkl})] \tag{4-10}$$

where $F_{hkl}$ and $F'_{hkl}$ refer to the entries in the two
available sets of amplitudes and $V$ is the volume of the unit
cell. Equation 4-10 produces a map that has positive electron
density wherever $\rho(x,y,z)$ is greater than $\rho'(x,y,z)$ and
negative electron density wherever $\rho(x,y,z)$ is less than
$\rho'(x,y,z)$, where $\rho(x,y,z)$ and $\rho'(x,y,z)$ are the two
maps of electron density that would be calculated directly
from the respective amplitudes and phases.

Difference maps of electron density have many more uses
than in refinement; but, in this particular instance,
$F_{hkl}$ are the entries in the set $F_o$, $F'_{hkl}$ are the
entries in the set $F_c$, and $\alpha_{hkl}$ are almost always
those in the set of calculated phases. The intention of such
a difference map of electron density is to indicate where the
molecular model differs from the actual molecule. Where 
there is positive electron density in the map, the actual 
molecules in the crystal have matter in that location that 
is not present in the molecular model. Where there is neg- 
ative electron density, there is matter at a certain location 
in the crystallographic molecular model that is not pres- 
ent in the actual molecules in the crystal. Adjustments 
can then be made in the model at the locations where sig- 
nificant differences occur. For example, the phenyl ring 
of a phenylalanine in the molecular model, sitting in neg- 
ative electron density, can be moved over to occupy pos- 
itive electron density. Unfortunately, as adjustments are 
made at one point in the molecular model, unavoidable 
shifts occur elsewhere, and the new R-factor calculated 
with the adjusted model usually does not change dra- 
matically and often increases. A new difference map must 
be calculated to locate the new problems and the process 
repeated. This approach is an example of manual tuning, 
and it is slow and ultimately unsatisfactory. 
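
A one-dimensional toy calculation shows how a difference map locates matter missing from a model. The sketch below is an illustration under stated assumptions rather than a crystallographic program: the "crystal" is a unit cell of length 1 containing equal point atoms at hypothetical fractional coordinates, only the amplitudes of the true structure are treated as observed, and the phases are those calculated from an incomplete model. All names, positions, and limits are invented for the example.

```python
import cmath
import math

H_MAX = 30      # reflections h = -30 ... 30 (assumed limit of the data set)
N_GRID = 200    # sampling points along the unit cell


def structure_factors(positions):
    """F_h for equal point atoms at the fractional coordinates in positions."""
    return {h: sum(cmath.exp(2j * math.pi * h * x) for x in positions)
            for h in range(-H_MAX, H_MAX + 1)}


def difference_map(true_positions, model_positions):
    """Sample the (Fo - Fc) difference density using calculated phases only."""
    f_true = structure_factors(true_positions)     # only |F| counts as observed
    f_model = structure_factors(model_positions)
    rho = []
    for i in range(N_GRID):
        x = i / N_GRID
        total = 0.0
        for h in range(-H_MAX, H_MAX + 1):
            delta_amp = abs(f_true[h]) - abs(f_model[h])
            phase = cmath.phase(f_model[h])        # calculated phase, radians
            # Real part of exp[-2 pi i h x + i phase]; the imaginary parts
            # of the h and -h terms cancel.
            total += delta_amp * math.cos(phase - 2 * math.pi * h * x)
        rho.append(total)                          # unit-cell length taken as 1
    return rho


true_atoms = [0.08, 0.21, 0.33, 0.47, 0.60, 0.72, 0.85, 0.94]
model_atoms = [x for x in true_atoms if x != 0.60]   # one atom left out
d_rho = difference_map(true_atoms, model_atoms)
peak = max(range(N_GRID), key=lambda i: d_rho[i]) / N_GRID
print(f"largest positive difference density at x = {peak:.3f}")
```

Positive density appears where the model lacks an atom that the crystal contains. If the model is complete, every coefficient (Fo - Fc) vanishes and the map is flat.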

There are several approaches to refinement that are 
designed to discover the optimal shifts of all of the atoms 
simultaneously by computation rather than manual 
tuning. These techniques are all based on the minimiza- 
tion of a particular multivariate function by solving 
large sets of simultaneous differential equations through 
matrix methods. Suppose that there is a multivariate
function $\Phi$ where

$$\Phi = f(x_1, x_2, \ldots, x_n) \tag{4-11}$$

Suppose also that the values for the variables $x_i$ have
been assigned initial magnitudes and that one wishes to
discover the individual shifts, $\Delta x_i$, in the magnitudes
of the assigned values of each of the variables that will
produce a minimum numerical value for $\Phi$. It can be shown
that

$$A_{ij} \times h = k \tag{4-12}$$

where $A_{ij}$ is the square matrix (n x n) the elements of
which are

$$\frac{\partial^2 \Phi}{\partial x_i\,\partial x_j} \tag{4-13}$$

h is the vector the elements of which are the individual
shifts, $\Delta x_i$, in each $x_i$ required to minimize $\Phi$, and
k is the vector the elements of which are
$-\partial\Phi/\partial x_i$. Equation 4-12 is solved for h, and its
solution defines the shifts, $\Delta x_i$, in each variable $x_i$,
required to produce a minimum value for $\Phi$.
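
A small numerical case makes Equation 4-12 concrete. The sketch below assumes a hypothetical two-variable quadratic function, builds the matrix of second derivatives and the vector of first derivatives by hand, and solves for the shifts. The vector k is taken here as the negative of the gradient so that the solved shifts point downhill; that sign convention, like the function itself, is an assumption of the example.

```python
def solve_shifts(a, k):
    """Solve A h = k for a 2 x 2 matrix A by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    h0 = (k[0] * a[1][1] - a[0][1] * k[1]) / det
    h1 = (a[0][0] * k[1] - a[1][0] * k[0]) / det
    return [h0, h1]


# Hypothetical function: phi = (x1 - 3)^2 + 2 (x2 + 1)^2 + x1 x2
def gradient(x):
    """First derivatives of phi with respect to x1 and x2."""
    return [2 * (x[0] - 3) + x[1], 4 * (x[1] + 1) + x[0]]


def hessian(x):
    """Second derivatives of phi; constant because phi is quadratic."""
    return [[2.0, 1.0], [1.0, 4.0]]


x = [0.0, 0.0]                          # initial magnitudes of the variables
k = [-g for g in gradient(x)]           # negative gradient (assumed convention)
h = solve_shifts(hessian(x), k)         # the shifts, delta x_i
x_new = [xi + hi for xi, hi in zip(x, h)]
print(x_new, gradient(x_new))
```

For a quadratic function a single solution of the equations lands on the minimum. The function minimized in a refinement is not quadratic in the atomic positions, so the calculation must be repeated in cycles.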

Suppose the variables $x_i$ are the positions of the
atoms j in the molecular model of a protein, and

$$\Phi = \sum_h\sum_k\sum_l w_{hkl}(F_{o,hkl} - F_{c,hkl})^2 \tag{4-14}$$

where $F_{o,hkl}$ is the observed amplitude of a particular
structure factor hkl, $F_{c,hkl}$ is the calculated amplitude of
the same structure factor hkl, and $w_{hkl}$ is a weight
assigned to a given structure factor. The magnitude of
$w_{hkl}$ varies with the certainty of the value of each
observed amplitude. The values of $F_o$ are fixed quantities,
but the values of $F_c$ are direct functions of the positions
of the atoms j of the molecular model in the fundamental
unit cell. These positions are the variables $x_i$. The
solution of Equation 4-12, the vector h, would then be a list
of the shifts in the positions of each atom j of the
molecular model that would produce a minimum value of $\Phi$
and, presumably, a minimum value of the R-factor. This
differs from manual tuning in that the effects of all shifts
are considered simultaneously.

This conceptually simple but computationally 
complex approach suffers from the drawback that the 
shifts of the atoms j are unconstrained. In other words, 
every atom j in the molecular model would be allowed to 
shift independently regardless of the shifts imposed 
upon its neighbors. Consequently, atoms j would drift 
away from the neighbors to which they are connected 
through covalent bonds to produce unrealistic struc- 
tures. When the data set extends to small enough Bragg 
spacing, this is not a problem because each atom j is rep- 
resented by a sphere of electron density in the map 
which confines it to the vicinity of its proper location. But 
because the data set usually does not extend to such 
small Bragg spacings, some other means must be used to 
confine the individual atoms j of the molecular model. As 
one is dealing with the covalent structure of a polypep- 
tide rather than a distribution of unconnected atoms j, 
the bond lengths and bond angles of the covalent bonds 
connecting the atoms can be used to correlate their 
motions. For example, when any one of the carbon 
atoms j in the phenyl ring of a phenylalanine is shifted, 
all the others must also be shifted accordingly because 
they are all covalently attached to each other. To
accomplish this, constraints are added to the minimization to
force the motions of the bonded atoms j to be correlated.
The definition of $\Phi$ is changed so that

$$\Phi = \sum_h\sum_k\sum_l w_{hkl}(F_{o,hkl} - F_{c,hkl})^2 + \sum_q w_q (d_q - d_{0,q})^2 \tag{4-15}$$

where $d_{0,q}$ is the ideal, standard distance between any
two atoms that are rigidly connected by the covalent
structure, for example, one of the ortho carbons and the
para carbon of a phenyl ring, $d_q$ is the distance between
them in the final, refined molecular model, and $w_q$ is a
weight the magnitude of which is chosen on the basis of
how constrained the particular distance must be. If the
two atoms j the positions of which are $x_i$ and $x_j$,
respectively, are directly attached to each other, $w_q$ is
large. If there are three or four covalent bonds between
them, $w_q$ is small. By adding the second term in Equation
4-15, bond distances and any rigid bond angles, such as those
in a phenyl ring, are retained during the minimization.
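
The second term of Equation 4-15 is a weighted sum of squared deviations of distances in the model from their ideal values. The sketch below computes that term alone; the function name, the atom labels, the coordinates, and the weight are hypothetical, and 0.139 nm is used as the ideal aromatic carbon-carbon bond length.

```python
import math


def restraint_term(coords, restraints):
    """Sum of w_q * (d_q - d_ideal)^2 over the restrained pairs of atoms.

    coords:     {atom name: (x, y, z)} in nm
    restraints: list of (atom_a, atom_b, d_ideal_nm, weight)
    """
    total = 0.0
    for atom_a, atom_b, d_ideal, weight in restraints:
        d_model = math.dist(coords[atom_a], coords[atom_b])
        total += weight * (d_model - d_ideal) ** 2
    return total


# Two adjacent ring carbons that have drifted 0.011 nm too far apart.
coords = {"CD1": (0.0, 0.0, 0.0), "CE1": (0.150, 0.0, 0.0)}
restraints = [("CD1", "CE1", 0.139, 1000.0)]   # the weight is arbitrary
print(restraint_term(coords, restraints))
```

The penalty grows with the square of the distortion, so a large weight makes a stretched bond more costly than a modest improvement in the agreement of the amplitudes.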

The choice of which bond angles and bond distances to
constrain is a subjective one that has a significant effect
on the final crystallographic molecular model. Phenyl,
indolyl, or imidazolyl rings are obvious, but exocyclic bond
angles less so. It is usually unwise to constrain these bond
angles in any structure other than the routine polypeptide,
for example, in a covalently bound enzymatic inhibitor. When
accurate values for bond angles and bond lengths for an FeS
cluster were available from a model compound, these were
used as constraints early in the refinement of the
crystallographic molecular model of the ferredoxin from
Anabaena but then were removed from the process later on to
incorporate the actual differences between the structures of
the cluster in the protein and in the model compound. In
contrast, it was concluded that the orientations of the
ligands from the protein to the two irons in the binuclear
cluster in the crystallographic molecular model of
ribonucleoside-diphosphate reductase from Escherichia coli
were inconsistent with spectral studies when no constraints
on those orientations were applied but that constraining its
structure to a conformation consistent with the spectral
observations produced as satisfactory a refinement. A
compromise must be made between including enough constraints
to hold the atoms j together and in reasonable orientation
and including so many constraints that ideality replaces
reality.

Alternatively, it has been proposed that $\Phi$ can be
written as

$$\Phi = \sum_h\sum_k\sum_l w_{hkl}(F_{o,hkl} - F_{c,hkl})^2 + w_p E_p \tag{4-16}$$

where $E_p$ is a theoretically calculated value of the
potential energy for the molecular model and $w_p$ is a weight
given to this term. The weight $w_p$ is arbitrarily adjusted to
make it more or less important during the refinement. In
this approach, covalent bonds between atoms j remain
because their distortion would produce a major increase
in $E_p$. This approach has an advantage over the
consideration of only interatomic distances (Equation 4-15)
because any shift in an atom j in the molecular model
causing it to overlap another atom j automatically causes
$E_p$ to increase dramatically. The disadvantage of using $E_p$
is that once overlaps are eliminated and covalent bonds
are retained, the refinement is influenced by a large
number of noncovalent forces imposed by the theoretical
function and these may or may not be realistic. These
biases influence the shifts of the atoms j dictated by h.

Even in a rigid, anhydrous crystal of a small mole- 
cule, the atoms j and functional groups retain rotational 
and vibrational motion, which displaces them continuously
and rapidly from their mean positions. The vibrational
and rotational motions of the atoms j of a
macromolecule of protein in a hydrous crystal are much
more dramatic. There are vibrational motions involving 
segments of the polypeptide as well as those of the indi- 
vidual atoms j, and the water surrounding the molecule of 
protein does not sterically hinder the rotational or vibra- 
tional motion of its functional groups so dramatically as 
the immediate neighbors hinder the rotational motions of 
functional groups in an anhydrous crystal. Often the vibra- 
tional and rotational motion that occurs within the mole- 
cule of protein in the crystal is sufficient to blur the electron 
density of a side chain or a segment of the polypeptide so 
extensively that it is never present even in the refined map 
of electron density. Every atom j for which electron den- 
sity is observed, however, is subject to vibrational if not 
rotational motion, and the extent of the resulting dis- 
placements of each atom j differs depending on the rigid- 
ity of its bonding and the rigidity of its surroundings. 
During the refinement of the crystallographic molecular 
model, it is possible to estimate the magnitudes of the 
actual displacements from its mean position experienced 
by each atom j in the molecule within the crystal. 

The scattering factors $f_j$ inserted into Equation 4-7
are affected by the displacement of each atom j from its
mean position. It was noted earlier that, because of
interference, as $\theta_{hkl}$ increases, the scattering from
a given atom j decreases. Vibrational motion and rotational
motion, because they also increase interference, also
cause the scattering produced by the electrons around
an atom j to decrease. It has been shown that for the
scattering factor of atom j in a molecule within a crystal

$$f_j = f_{0,j} \exp\left[-8\pi^2\, \overline{u_j^2}\, \frac{\sin^2\theta_{hkl}}{\lambda^2}\right] \tag{4-17}$$

where $f_{0,j}$ is the scattering factor for atom j at rest,
obtained from the usual table listing scattering factors as
a function of scattering angle, and $\overline{u_j^2}$ is the
mean square amplitude of the displacement in all directions
of atom j from its mean position, regardless of the reason
for that displacement. This mean square amplitude of the
displacement incorporates not only the vibrational and
rotational motion experienced by the atom j but also
static disorder that may occur within the crystal lattice
and that consequently affects the position of the atom j
when it is averaged over the whole crystal. It is customary
to define a B value (temperature factor) for atom j

$$B_j = 8\pi^2\, \overline{u_j^2} \tag{4-18}$$

to simplify Equation 4-17 and obscure $\overline{u_j^2}$. The
units of the B value are nanometers².
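
The effect of a B value on the scattering of an atom is easy to evaluate. The sketch below assumes a hypothetical root-mean-square displacement of 0.05 nm and uses Bragg's law in the form sin(theta)/lambda = 1/(2d) to express the attenuation as a function of Bragg spacing d. The function names and the chosen spacings are illustrative only.

```python
import math


def b_value(mean_square_displacement_nm2):
    """B = 8 pi^2 <u^2>, in square nanometers."""
    return 8 * math.pi ** 2 * mean_square_displacement_nm2


def attenuation(b_nm2, bragg_spacing_nm):
    """Factor exp(-B sin^2(theta) / lambda^2) applied to the at-rest scattering.

    Bragg's law gives sin(theta) / lambda = 1 / (2 d).
    """
    s = 1.0 / (2.0 * bragg_spacing_nm)
    return math.exp(-b_nm2 * s * s)


b = b_value(0.05 ** 2)     # hypothetical r.m.s. displacement of 0.05 nm
for d in (0.5, 0.3, 0.2):
    print(f"d = {d} nm: scattering is {attenuation(b, d):.2f} of its at-rest value")
```

The attenuation is most severe at small Bragg spacings, which is why atoms with large displacements contribute little to the reflections that carry atomic detail.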

The procedure for refining the position of a given
atom j in a crystallographic molecular model (Equation
4-15) is usually expanded to include a refinement of both
its position and its B value. By use of Equation 4-17, the
individual $B_j$ are incorporated as variables into Equation
4-7 in addition to the variables of mean position $x_j$,
$y_j$, and $z_j$. The distances constraining the atoms j in
Equation 4-15 are replaced with variances of interatomic
distances. In this way the B values quantifying the
displacements of the atoms j can be explicitly estimated
during the refinement as well as the mean positions of
the atoms j. It is also possible to resolve
$\overline{u_j^2}$ into its three components along the a, b,
and c axes and refine these three anisotropic thermal
parameters for each atom j.
Every atom j in a molecule of protein in a crystal under- 
goes its own particular motion with which are associated 
displacements from its mean position. It is important to 
remember that B values or anisotropic thermal parame- 
ters obtained crystallographically are only estimates of 
the actual magnitudes of these displacements. 

The B value for a particular atom j in the crystallo- 
graphic molecular model of a protein, as an estimate of 
its mean displacement from its mean position, provides 
an indication of its confinement. Usually, the atoms j in 
the interior of the molecule are confined by the sur- 
rounding atoms j and have low B values while those on 
the exterior, exposed to the water, have high B values. 
The flexibility of a segment of polypeptide is indicated by 
the set of B values for the atoms j of which it is composed. 
The atoms j in the most flexible or statically disordered 
segments of polypeptide or side chains, however, do not 
have B values assigned to them because they do not con- 
tribute to the diffraction and hence do not display any 
structured electron density in the map. 

It has been pointed out that if only a portion of the
available amplitudes is used for refinement with
Equation 4-15 or another like it, the R-factor calculated
with the portion not used is a more unbiased measure of
the validity of the final model than the R-factor
calculated with all of the amplitudes. The observed
amplitudes and calculated amplitudes are divided at random
into a test set containing 10% of them and a working set
containing 90% of them. The working sets are the amplitudes
$F_{o,hkl}$ and $F_{c,hkl}$ used in Equation 4-15, while the
test sets are the amplitudes $F_o$ and $F_c$ used at each
cycle of refinement to calculate a free R-factor
($R_{\mathrm{free}}$) with Equation 4-9. In this way, the set of
calculated amplitudes, $F_c$, used to calculate
$R_{\mathrm{free}}$ is not the same set of calculated amplitudes,
$F_{c,hkl}$, guiding the refinement, and $R_{\mathrm{free}}$
becomes a more independent measurement of the success of the
refinement than the complete R-factor.
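
The bookkeeping for a free R-factor amounts to a random partition of the reflections followed by two separate sums. The sketch below illustrates only that bookkeeping: the amplitudes are hypothetical random numbers, no refinement is performed, and the function names and seeds are arbitrary.

```python
import random


def split_reflections(indices, test_fraction=0.10, seed=0):
    """Randomly assign reflections to a working set and a test set."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    n_test = round(test_fraction * len(shuffled))
    return shuffled[n_test:], shuffled[:n_test]    # working set, test set


def r_factor(f_obs, f_calc, subset):
    """R-factor restricted to the reflections listed in subset."""
    numerator = sum(abs(f_obs[i] - f_calc[i]) for i in subset)
    return numerator / sum(f_obs[i] for i in subset)


# Hypothetical data: 1000 reflections with arbitrary amplitudes.
rng = random.Random(1)
f_obs = [rng.uniform(10.0, 200.0) for _ in range(1000)]
f_calc = [fo * rng.uniform(0.8, 1.2) for fo in f_obs]

working_set, test_set = split_reflections(range(len(f_obs)))
print(f"R (working set) = {r_factor(f_obs, f_calc, working_set):.3f}")
print(f"R (test set)    = {r_factor(f_obs, f_calc, test_set):.3f}")
```

In this toy case the two values are nearly equal because the calculated amplitudes were never adjusted against the working set. During a real refinement only the working set enters the minimization, so the two values can diverge.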

The use of a free R-factor during refinement should
eliminate errors in fitting the molecular model into the
map of electron density. A standard R-factor, by design 
(compare Equations 4-9 and 4-15), must always 
decrease as the refinement progresses. If, however, the 
molecular model has been fit into the map of electron 
density incorrectly, for example, if the polypeptide has 
been incorrectly traced, then the free R-factor should 
remain constant or increase as the refinement progresses 
and the standard R-factor automatically decreases. As a 
result, the free R-factor has become an important indica- 
tor of the validity of a crystallographic molecular model. 

In the simplest, but most time-consuming, format
for refining a crystallographic molecular model (Figure
4-19), Equation 4-15 is used to calculate shifts in the
atoms j that cause $\Phi$ to assume a minimum value. These
shifts are used to create a new set of atomic positions
$x_i$, and another cycle of refinement is performed with
these new positions as initial values for the $x_i$. In each
of the first few of these cycles the value of R, and ideally
that of $R_{\mathrm{free}}$, drops (Figure 4-19), but the
decreases become more modest at each cycle until no further
progress can be made. The reason for this is that the process
of refinement has become trapped within a local minimum of
the function $\Phi$ because refinement performed in this way
will only progress as long as $\Phi$ is decreasing. Manual
tuning, however, is a way to escape from the local minimum
in which the process is trapped. At some point (designated
$\Delta F$ in Figure 4-19), the decision is made to calculate a
difference map of electron density. Adjustments of the
current molecular model are made by manual tuning, and this
allows the minimization to enter realistically a new
trajectory. After this trajectory reaches a new local
minimum, more tuning is performed. This strategy, however,
is never followed today.

[Plot for Figure 4-19: R-factor (vertical axis, 0.15 to 0.45)
as a function of cycle number (horizontal axis, 0 to 160),
with the ranges of Bragg spacing and the cycles marked
$\Delta F$ indicated along the top.]

Figure 4-19: Progress of a refinement of the crystallographic
molecular model of deoxyribonuclease I. An initial molecular
model was constructed by fitting a molecular model of the
polypeptide into the initial map of electron density. The
R-factor of the initial map was 0.45. The R-factor is
presented as a function of the number of cycles of
least-squares refinement performed. At the cycles indicated
by $\Delta F$, difference maps were constructed from $F_o$ and
$F_c$ by use of the calculated phases of the molecular model
to that point. These difference maps were used to rebuild the
model manually and establish a new trajectory for the
refinement. At cycle 98, molecules of water were added to the
molecular model at locations identified in the maps of
difference electron density. The range of Bragg spacings
included in the data sets at each cycle is indicated at the
top of the figure. The final R-factor of the refined
molecular model was 0.16. Reprinted with permission from
ref 83. Copyright 1986 Academic Press.

The manual adjustments performed at various 
times during this simplified strategy for refinement 
require a significant amount of time. Whenever the 
refinement reaches a plateau and no further progress is 
evident (Figure 4-19), the molecular model must be 
examined in detail and manually adjusted with the assis- 
tance of difference maps of electron density before a new 
trajectory can be initiated. It has been found that one
way to avoid such time-consuming manual adjustments
during the refinement is to combine molecular dynamics
and refinement.

In a molecular dynamics simulation, atoms j are 
positioned in space, for example, by the fitting of the 
molecular model into the initial map of electron density 
and perhaps an initial round of refinement. A global
potential energy function $E_p$, incorporating the individual
potential energy functions of the covalent bonds and
the nonbonded interactions, is calculated. The atoms j 
are then given kinetic energies appropriate to a certain 
temperature and allowed to move for a short interval 
(less than 1 fs) within this global potential energy func- 
tion according to classical laws of motion. The new posi- 
tions in turn create a new global potential energy 
function and the atoms j are allowed to move again in 
response to the new component potential energy func- 
tions, and so forth. 
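
The cycle just described (compute the forces from the potential energy, move the atoms for a short interval, recompute) can be sketched for the simplest possible case. The example below integrates a single particle in a one-dimensional harmonic potential, standing in for one covalent bond, with the velocity Verlet scheme. The choice of integrator, the reduced units, and all numerical values are assumptions made for illustration and are not taken from any particular molecular dynamics program.

```python
def force(x, k=1.0, x0=0.0):
    """Force from the harmonic potential E = 0.5 k (x - x0)^2."""
    return -k * (x - x0)


def potential(x, k=1.0, x0=0.0):
    """Potential energy of the harmonic stand-in for a covalent bond."""
    return 0.5 * k * (x - x0) ** 2


def velocity_verlet(x, v, mass=1.0, dt=0.01, n_steps=1000):
    """Advance one particle by classical mechanics in short time steps."""
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass
        x = x + dt * v_half
        f = force(x)               # forces recomputed at the new position
        v = v_half + 0.5 * dt * f / mass
    return x, v


x_start, v_start = 0.0, 1.0        # initial kinetic energy 0.5 (reduced units)
x_end, v_end = velocity_verlet(x_start, v_start)
e_start = potential(x_start) + 0.5 * v_start ** 2
e_end = potential(x_end) + 0.5 * v_end ** 2
print(f"total energy: start {e_start:.5f}, end {e_end:.5f}")
```

The time step must be short relative to the fastest motion in the system for the total energy to remain steady, which is the reason intervals of less than 1 fs are used for real molecules.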

When molecular dynamics is used in crystallographic
refinement, the global potential energy function,
$E_{p,i}$, for each step i in the usual molecular dynamics
calculation is augmented by an effective potential energy

$$E_{\mathrm{eff}} = w_x \sum_h\sum_k\sum_l w_{hkl}(F_{o,hkl} - F_{c,hkl})^2 \tag{4-19}$$

where $w_x$ is a weighting factor chosen so that
$E_{\mathrm{eff}}$ has the same magnitude as $E_p$ and $F_c$ is
the set of the amplitudes of the structure factors
calculated for the instantaneous distribution of atoms j
after each step in the molecular dynamics calculation. This
effective potential energy, $E_{\mathrm{eff}}$, constrains the
atoms j during the molecular dynamic trajectory to the
vicinity they occupied in the original molecular model, but
if a high enough kinetic energy is applied, the atoms j can
move as much as 0.3 nm from their initial positions. This is
what allows the structure to break out of local minima of
the function $\Phi$.

The process proceeds in several steps referred to as
simulated annealing. Initially a high kinetic energy is
applied to the atoms j (high temperature), and then the
kinetic energy is decreased to finish within a minimum of
potential energy. It is while the kinetic energy is high that
local minima of potential energy, and hence local
minima of the function $\Phi$, can be passed through. It has
been shown that in this way the R-factor can be minimized
with much less need for manual adjustment of the
molecular model. Because rather high simulated temperatures
are used, however, unexpectedly large movements of segments
of the model can occur, so the necessity to examine
difference maps of electron density and perform manual
adjustments remains. Nevertheless, with proper precautions,
refinement by molecular dynamics converges on the same
structure as refinement performed entirely by least-squares
minimization and manual adjustment.
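
The difference between a purely downhill minimization and an annealing schedule can be seen with a one-dimensional toy function. The sketch below is not molecular dynamics: it substitutes a Metropolis Monte Carlo walk with a falling temperature for the dynamics, and it uses a hypothetical double-well function in place of the function minimized in a refinement. The function, the cooling schedule, the step size, and the seed are all assumptions of the example.

```python
import math
import random


def phi(x):
    """Hypothetical target: local minimum near x = 0.96, deeper one near x = -1.04."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def dphi(x):
    """Derivative of phi with respect to x."""
    return 4.0 * x * (x * x - 1.0) + 0.3


def descend(x, rate=0.01, n_steps=2000):
    """Plain downhill minimization that follows the gradient."""
    for _ in range(n_steps):
        x -= rate * dphi(x)
    return x


def anneal(x, t_start=2.0, t_end=0.01, n_steps=20000, step=0.2, seed=0):
    """Metropolis walk with geometric cooling; returns the best position visited."""
    rng = random.Random(seed)
    best = x
    for i in range(n_steps):
        t = t_start * (t_end / t_start) ** (i / (n_steps - 1))
        trial = x + rng.uniform(-step, step)
        d = phi(trial) - phi(x)
        # Uphill moves are accepted with a probability that falls with t.
        if d <= 0.0 or rng.random() < math.exp(-d / t):
            x = trial
            if phi(x) < phi(best):
                best = x
    return best


x_trapped = descend(1.5)
x_annealed = anneal(x_trapped)
print(f"descent stops at x = {x_trapped:.3f}, phi = {phi(x_trapped):.3f}")
print(f"annealing reaches x = {x_annealed:.3f}, phi = {phi(x_annealed):.3f}")
```

The downhill search settles in whichever well it starts in, whereas the walk at high temperature crosses the barrier and can find the deeper well. The price, as in refinement, is that high temperatures also permit large excursions that must be checked afterwards.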

The use of simulated annealing by molecular 
dynamics for the purpose of pushing the refinement out 
of local minima includes coincidentally a large number 
of hidden constraints in the potential functions used for 
covalent bonds and nonbonding interactions that are 
necessary to cause the atoms j to move in each step. 
These potential functions were not constructed for 
solutes in aqueous solution, which is what a molecule of 
protein in a crystal is. As a result, they introduce signifi- 
cant, uncontrolled biases into the final molecular 
model. 

These biases are most clearly manifested in the final 
positions of the charged side chains. The choice of a 
charge number for a side chain has a dramatic effect on 
its location in the final crystallographic molecular model 
in which simulated annealing is used for refinement.
This, however, should not be the case for charged groups 
in aqueous solution of moderate ionic strength. A com- 
pilation of the frequency of hydrogen bonds between 
oppositely charged donors and acceptors of hydrogen 
bonds in crystallographic molecular models found them 
to be no more frequent than hydrogen bonds between a 
neutral donor and a neutral acceptor or between a 
charged donor or acceptor and a neutral acceptor or 
donor, respectively. This compilation was gathered
from crystallographic molecular models refined by 
manual adjustments rather than by simulated annealing. 
Another compilation, gathered after refinement by
simulated annealing had become widespread, found that 
hydrogen bonds between oppositely charged donors and 
acceptors were almost 5 times more frequent. Since the 
actual frequency cannot change, this latter result sug- 
gests that refinement by simulated annealing does con- 
sistently introduce artifacts into crystallographic 
molecular models. Because the potential functions for 
the attraction between these oppositely charged groups 
in simulations performed by simulated annealing are 
unrealistically strong, it would not be surprising if this 
procedure produced such interactions artifactually. 
Another indication of the unreliability of assignments of 
hydrogen bonds between oppositely charged donors and 
acceptors is that their identity usually changes
significantly, and often dramatically, from an earlier
version to a later version of a crystallographic molecular model
even though both versions were built from refined maps 
of electron density calculated from data sets gathered 
from the same crystal. 

There are now at least five widely used procedures
for refining crystallographic molecular models. It is
reassuring that even though the final models prepared with
each method differ detectably, when two of them are
used to refine the same model with the same data set, the
two refinements usually converge to a common structure.
Often two different refinement procedures are
purposely used to reassure the investigators that a
peculiar aspect of the molecular model is real.

As a refinement progresses, there is a noticeable
improvement in the shape and continuity of the tube of
refined electron density representing the polypeptide.
Segments of the polypeptide in the initial molecular
model often move during the refinement,
sometimes as much as 1 nm, to assume their positions in
the final molecular model, especially in regions
where the initial map of electron density was vague.
Elements of secondary structure missing in the initial
map of electron density can appear and elements of
secondary structure in the initial map can occasionally
disappear upon refinement, and positions assigned to
specific amino acids in the sequence of the protein
within secondary structures can shift dramatically.

Locations where the published amino acid
sequence is in error become obvious, and it is sometimes
possible to read visually the amino acid sequence
of a segment of the protein as yet unsequenced. If the
map of initial electron density has been calculated from
a data set gathered to narrow enough Bragg spacing
(<0.16 nm), the electron density for individual amino
acids in the initial map can be sharp enough that they
can be tentatively identified and their side chains
incorporated into the original molecular model even though
the sequence of the protein is unavailable. As the
refinement progresses, mistakes in these initial
assignments become obvious and can be corrected.

The electron density for the carbohydrate attached 
to a glycoprotein, if it is not disordered in the crystal, 
becomes progressively more detailed. The electron 
density for coenzymes, which are almost always held 
rigidly within the protein, also becomes easier to
interpret. Posttranslational modifications, sometimes
unexpected, as for example 3-(S-cysteinyl)tyrosine
and β-hydroxytryptophan (Table 3-1), and sometimes
hoped for, as for example the ester intermediate in
the self-catalyzed pyruvylation of aspartate
1-decarboxylase (Equation 3-9), begin to appear in the
difference maps of electron density. Previously
unaccounted-for molecules of water (oxygen atoms j) and
anions and cations from the crystallization solution that
are bound at specific locations on the surface or in the
interior of the molecules of protein begin to appear in the
difference maps and they become sharp and reproducible
features. Any of these features not incorporated
into the initial molecular model appear in a difference 
map of electron density because they are fixed at certain 
locations in the real fundamental unit cell by their spe- 
cific covalent bonds and noncovalent interactions with 
the molecules of protein but are as yet missing in the 
model. 

When the identity, location, and structure of each 
of these fixed molecules or portions of the covalent struc- 
ture of the polypeptide that were not included in the ini- 
tial molecular model, because they did not appear in the 
initial map of electron density, become sufficiently 
unambiguous, they are incorporated into the molecular 
model at that cycle of the refinement. Their inclusion 
causes a significant decrease in the R-factor because they 
are as real a feature of the actual crystallographic unit cell 
as the individual amino acids in the polypeptide, and 
they contribute accordingly to F_o. For example, in the
refinement for deoxyribonuclease I (Figure 4-19), the 
inclusion of the water molecules observed in difference 
maps at cycle 98 caused the molecular model to be much 
more realistic and permitted the refinement to produce 
a significantly lower minimum of the R-factor than it had 
before they were included. This reasonable consequence 
suggests that the refinement is registering reality, but all 
of the changes taking place during the refinement are 
adequate evidence that a crystallographic molecular 
model is always provisional. 

There are now more than 30 crystallographic 
molecular models of proteins that have been fit into 
maps of electron density calculated from data sets with 
minimum Bragg spacings so narrow (<0.1 nm) that the 
individual atoms j, and in one instance even bonding
electron density, are clearly observed in the initial
map. In these instances, few if any constraints
were required during refinement. Nevertheless, almost
all of even the most recently constructed crystallographic
molecular models of proteins have not had the benefit of
such accurate maps of electron density. Consequently,
regardless of how the refinement is performed, ideal 
bond lengths and bond angles are almost always 
enforced upon the crystallographic molecular model 
because if they were not, the refinement could not be 
performed at all. Therefore, if a refinement were per- 
formed entirely by the computer, the final molecular 
model would be confined by all of these implicit and 
often unsubstantiated constraints. To verify that the 
process of refinement has not biased the final structure, 
careful inspections of difference maps of electron density 
are always required to identify locations where the actual 
structure of the protein deviates from these simple 
expectations. 

This inspection is routinely done by using omit 
maps of difference electron density (Figure 4-20). A seg- 
ment of amino acids, a coenzyme, or a posttranslational 
modification in the final refined crystallographic
molecular model is omitted, and the truncated model that
results is used to calculate F_c,omit and α_c,omit. The observed
data set F_o and F_c,omit and α_c,omit are used to calculate
(Equation 4-10) a difference map of electron density. In
this difference map the omitted segment appears as pos- 
itive electron density. This positive electron density has 
the advantage that its details are defined only by the 
observed data set because nothing is present at this
location in the truncated molecular model. The atoms j in the
refined molecular model in this region are adjusted, if 
necessary, to fit within this difference electron density 
and added back to the molecular model. Then another 
segment of the updated molecular model is omitted and 
so forth over the whole structure. In this way, an attempt 
is made to incorporate into the final molecular model the 
ways in which the actual structure of the protein deviates 
from the ideal structure dictated by ideal bond lengths 
and bond angles and empirical functions of potential 
energy used during the refinement. It should be stressed 
at this point that the goal of all refinement is to produce 
a crystallographic molecular model that reproduces as 
accurately as possible the actual structure of the
molecule of protein, including all of its perversities, rather
than some ideal structure consistent with a set of theo- 
retical potential energy functions. 
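
The calculation of an omit map can be illustrated with a
one-dimensional toy crystal. The sketch below is only an
illustration under stated assumptions: the positions and
scattering weights of the atoms (MODEL_ATOMS, OMITTED_ATOM) are
hypothetical, the atoms are points of constant scattering power,
and the Fourier sum is written in its usual textbook form rather
than copied from Equation 4-10. Only the amplitudes of the full
structure are treated as observed; the phases come entirely from
the truncated model. The additional cycles of refinement of the
truncated model are not performed here.

```python
import cmath
import math


def structure_factors(atoms, h_max):
    """Return {h: F(h)} with F(h) = sum_j f_j * exp(2*pi*i*h*x_j), h != 0."""
    return {
        h: sum(f * cmath.exp(2j * math.pi * h * x) for x, f in atoms)
        for h in range(-h_max, h_max + 1)
        if h != 0
    }


def omit_difference_map(observed_amplitudes, truncated_model, h_max, n_grid=200):
    """Difference density on a grid from (|F_o| - |F_c,omit|) and alpha_c,omit."""
    calculated = structure_factors(truncated_model, h_max)
    density = []
    for k in range(n_grid):
        x = k / n_grid
        value = 0.0
        for h, f_calc in calculated.items():
            delta = observed_amplitudes[h] - abs(f_calc)  # amplitudes are observed
            alpha = cmath.phase(f_calc)                   # phases come from the model
            value += delta * math.cos(alpha - 2 * math.pi * h * x)
        density.append(value)
    return density


# Hypothetical one-dimensional "crystal": (fractional position, scattering weight).
MODEL_ATOMS = [(0.03, 1.0), (0.11, 1.0), (0.26, 1.0), (0.37, 1.0),
               (0.55, 1.0), (0.71, 1.0), (0.88, 1.0)]
OMITTED_ATOM = (0.44, 1.0)
H_MAX = 30

observed = {h: abs(f) for h, f in
            structure_factors(MODEL_ATOMS + [OMITTED_ATOM], H_MAX).items()}
omit_map = omit_difference_map(observed, MODEL_ATOMS, H_MAX)
peak_position = max(range(len(omit_map)), key=omit_map.__getitem__) / len(omit_map)
```

The strongest positive feature of the difference map falls at
the position of the omitted atom even though nothing occupies
that position in the truncated model.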

There is one interesting and enlightening aspect of
the process of producing an omit map. After a segment of
the molecular model has been omitted, it is necessary to
perform additional cycles of refinement on the molecular
model missing that segment before calculation of the
F_c,omit and α_c,omit used to produce the omit map of
difference electron density. The reason for this
requirement is that the positions of all of the atoms j in
the initial refined model, not just the omitted atoms j
themselves, contain information about the positions of the
omitted atoms j. This information would be transmitted to the
calculated amplitudes, F_c,omit, but even more critically to
the calculated phases, α_c,omit, that would be used to
calculate the difference map of electron density were it not
purged by repositioning the remaining atoms in the
truncated molecular model by additional cycles of
refinement. These additional cycles must be performed
on the model in which the omitted atoms j are missing
and hence unable to bias the calculations performed
during the cycles of refinement.

Now that the molecular model of the polypeptide is
fit by computer into the map of electron density and the
adjustments that were once performed manually
by the crystallographer during refinement have been
replaced by molecular dynamics, the intimate human
involvement in these procedures that once occurred no
longer occurs. There is no computational algorithm that
approaches the acuity of an experienced human intel- 
lect. Consequently, at some point in the complete 
process the product must be examined carefully by the 
crystallographer if errors are to be avoided. It is the omit 
maps that present the most obvious opportunity for this 
human intervention.

The omit maps calculated for successive segments
of the molecular model (Figure 4-20) must be examined
carefully by the crystallographer to ensure that they do
actually represent that segment of the model. If the
polypeptide has been incorrectly traced and the wrong
segment of the molecular model has been fit into a
particular segment of electron density, that error will be
obvious in an omit map of that segment of electron
density. The fit of the molecular model into the omit map
of electron density must be adjusted manually by the
crystallographer, not because a computer could not do
so but because she must be convinced that the fit
justifies the final conformation imposed upon the molecular
model. Only in this way, with properly calculated omit
maps and properly adjusted conformations, can the
ideal structure resulting from the theoretical biases of the
automated fitting and refinement be replaced by the real
structure dictated by the observed amplitudes. For
example, a hydrogen bond produced solely by the
constraints of the refinement for which there is no evidence
within the observed amplitudes will not appear in a
properly calculated omit map and must be removed
from the crystallographic molecular model. The
conformation of a side chain produced by the constraints of the
refinement may differ significantly from the
conformation observed in an omit map and must be adjusted
accordingly.

Figure 4-20: Omit map of electron density. The initial
crystallographic molecular model for the amino-terminal
domain [...] Trypanosoma brucei was built from a map of
electron density calculated from observed amplitudes
(Bragg spacing = [...]) [...]. The molecular model was refined
with the assistance of simulated annealing. From this
refined molecular model, the first 31 amino acids of the
polypeptide were omitted. The truncated molecular model
was submitted to 40 additional cycles of refinement before
phases, α_c,omit, and amplitudes, F_c,omit, were calculated
from it. These phases and amplitudes along with the
observed amplitudes were used to calculate an omit map of
difference electron density. The portion of that map
including the segment of the polypeptide from Glutamine 12
to Glutamine 24 is presented. A molecular model of this
segment of polypeptide in its final conformation is
positioned within the omit map.

In addition to the polypeptide, the process of 
refinement adjusts the conformations of coenzymes and 
oligosaccharides in the crystallographic molecular 
model. Coenzymes can be either covalently bonded to 
the polypeptide as additional examples of posttransla- 
tional modifications (Table 3-1) or enclosed within it so 
tightly that they form an integral structural component. 
In either case, the coenzyme never leaves the protein and 
is incorporated with the protein into a crystal. At this 
point, these molecules will be considered to be merely 
small clouds of electrons that have interesting shapes. 
Usually the existence and covalent structure of these 
coenzymes is known before the protein is crystallized. 

The electron density contributed by coenzymes 
known to be associated with a protein is always clearly 
featured because these molecules are enclosed within 
the protein and precisely aligned for functional pur- 
poses. The shapes of most coenzymes are unique, and 
they can usually be placed unambiguously into one of 
the envelopes of electron density unfilled by the 
polypeptide, but the decision as to when during the 
refinement they are included in the molecular model 
depends on the situation. If the initial, unrefined map is 
detailed enough and the coenzyme is large enough and 
of a shape peculiar enough, it can be inserted into its 
electron density at the same time the polypeptide is fit 
into its map of electron density. For example, the 
envelopes of electron density representing the four bac- 
teriochlorophylls b in the initial map of electron density 
for the photosynthetic reaction center calculated from 
the phases estimated by isomorphous replacement were 
clear enough that molecular models of bacteriochloro- 
phyll b, with its characteristic queue, could be inserted 
into several of them (Figure 4-21) even before the
polypeptide could be fit into its electron density. In fact,
the envelopes of electron density for the
bacteriochlorophylls b could be distinguished from the envelopes for
the almost identical bacteriopheophytins by the bulge of 
electron density due to the magnesium ions present in 
the former but missing from the latter. Usually, however, 
a coenzyme is added to the model at a step in the refine- 
ment when its electron density in the difference map 
becomes detailed enough to insert it unambiguously, but 
adjustments, often major ones, are made in its
position and configuration as the map of electron density
becomes more detailed during the cycles of refinement.
The precise orientation of a coenzyme in the model is
assigned in the end with an omit map (Figure 4-22).

The crystallographic molecular model for myoglo- 
bin (Figure 4-18) displays the characteristic, intimate 
association between a coenzyme and the polypeptide 
that enfolds it. In this case, the heme is embraced by the 
α helices arranged to compose the entire structure, the
purposes of which are to isolate the heme from the solu- 
tion, to prevent two hemes from colliding, and to permit 
the heme to dissolve in water in addition to providing a 
fifth ligand to the iron. 

Often ligands that are known to be specifically 
bound by the protein are included during the crystalliza- 
tion and are bound by the protein in the crystal. 
Significant changes can occur in the position and orien- 
tation of a ligand during refinement. For example, the 
molecular model of methotrexate inserted into the initial 
map of electron density of dihydrofolate reductase had to 
be adjusted significantly during refinement. The final
position and orientation of the ligand are assigned in the
final crystallographic molecular model by fitting them
into features of electron density in omit maps.

The positions in the amino acid sequence of a gly- 
coprotein at which the oligosaccharides are attached are 
often known, so the locations of these serines, thre- 
onines, or asparagines in the map of electron density can 
be identified as soon as the polypeptide has been fit into 
it. Oligosaccharides are located on the outer surface of a 
protein and usually protrude into the aqueous phase
surrounding it (Figure 4-23). Under these circumstances,
they are fully solvated, flexible, and structureless. This 
absence of a fixed structure is carried into the crystal, and 
the region within the fundamental unit cell occupied by 
the oligosaccharide is often featureless. Attempts to 
assign an atomic structure to such regions are probably 
irrelevant to an understanding of the behavior of an 
oligosaccharide in a biological situation where it will 
have no defined structure anyway. Sometimes, however, 
the carbohydrate is surrounded sufficiently by protein to 
assume a defined conformation and produce structured 
electron density. If the initial map of electron density is 
calculated from a data set gathered to narrow enough 
Bragg spacing and the oligosaccharide is sufficiently con- 
fined by the structure of the protein, a molecular model 
built from its previously determined sequence of
monosaccharides can be unambiguously fit into that initial
map. With initial maps of lower quality, the conformation
of the oligosaccharide in the crystallographic molecular
model is adjusted during the refinement (Figure 4-24).

Figure 4-21: Electron density assigned to a
bacteriochlorophyll b, one of the coenzymes in the reaction center from
Rhodopseudomonas viridis. A skeletal model of the known atomic
structure of the coenzyme has been placed within an envelope of
electron density located in the initial, unrefined map. Note the
bulge of electron density in the center of the coenzyme that results
from the magnesium ion with its 10 core electrons. Reprinted with
permission from ref 112. Copyright 1984 Academic Press.

As the refinement progresses, isolated spherical 
features of electron density larger than those that can be 
assigned to molecules of water appear and become more 
prominent in the map of electron density. These are
negative ions such as chloride or bromide or positive ions
such as Na+, K+, Mg2+, Ca2+, Fe2+, Co2+, Mn2+, Zn2+, or Cu2+.
Often these features, such as the chloride ions in human
collagenase 3, can be assigned by their scattering
strength and the ionic character of the protein
surrounding them. Such an assignment is strengthened by
showing that if the designated ion is incorporated into the
molecular model during the refinement, its refined
B value ends up in the same range as the refined B values
for the other atoms j in the model. If an incorrect
assignment had been made, the refinement would have 
adjusted the B value to an unrealistic magnitude to com- 
pensate for the incorrect scattering factor incorporated 
into the calculations (Equation 4-17). 
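
The way in which a refined B value compensates for an incorrect
scattering factor can be sketched numerically. The sketch
assumes that Equation 4-17 attenuates the scattering of an
atom j by the usual factor exp(-B sin^2(theta)/lambda^2), which
by Bragg's law equals exp(-B/(4 d^2)); it also treats each atom
as a point with a constant number of electrons and matches the
scattering at a single Bragg spacing, whereas a refinement fits
all reflections at once. The function names and the numerical
case are hypothetical.

```python
import math


def scattering_amplitude(electrons, b_value_nm2, bragg_spacing_nm):
    """Point-atom scattering attenuated by exp(-B / (4 d^2))."""
    return electrons * math.exp(-b_value_nm2 / (4.0 * bragg_spacing_nm ** 2))


def compensating_b_value(true_electrons, true_b_nm2, assumed_electrons,
                         bragg_spacing_nm):
    """B value at which the wrongly assigned atom matches the true scattering."""
    return true_b_nm2 - 4.0 * bragg_spacing_nm ** 2 * math.log(
        true_electrons / assumed_electrons)


# Hypothetical case: a chloride ion (18 electrons, B = 0.20 nm^2)
# incorrectly modeled as a molecule of water (10 electrons) at d = 0.30 nm.
b_if_water = compensating_b_value(18, 0.20, 10, 0.30)
```

In this hypothetical case the compensating B value is negative,
which is physically impossible and is the kind of unrealistic
magnitude that signals an incorrect assignment.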

Figure 4-22: Omit map of electron density for one of the
phycoerythrobilins in B-phycoerythrin from Porphyridium sordidum.
A crystallographic molecular model constructed from an initial
map of electron density (Bragg spacing = 0.22 nm) was refined
against the observed amplitudes with the assistance of simulated
annealing. The phycoerythrobilin covalently bound by Cysteines
50 and 61 of the β subunit of the protein was then omitted from the
molecular model, and an omit map of electron density was
calculated. The final conformation chosen for the coenzyme is
positioned within the electron density.

It is also possible to use their anomalous dispersion
to identify ions. For example, when the wavelength of
the X-radiation was decreased from 0.154 to 0.1377 nm,
passing through the absorption edges for Ni (0.1488 nm)
and Cu (0.1381 nm), the magnitude of the electron
density for a metal ion in the map calculated from structure
factors from a crystal of nitrite reductase (NO-forming)
did not decrease significantly, but when it was further
decreased to 0.104 nm, below the absorption edge of Zn
(0.1284 nm), it decreased significantly. This effect
demonstrated that the sphere of electron density in the
map at this location represented a Zn2+ ion. A difference
map of electron density calculated from a data set
gathered at a wavelength of 0.0870 nm and a data set
gathered at 0.1488 nm, where iron has significant anomalous
dispersion compared to other transition metal ions,
showed strong electron density at the eight positions
occupied by the metal ions in the two tetranuclear
clusters in hybrid-cluster protein from Desulfovibrio
vulgaris, an observation demonstrating that all of those
metal ions are irons.
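
The reasoning used to assign the zinc ion can be written as a
small bookkeeping exercise. The sketch below uses only the three
absorption edges quoted in the text; the function names and the
encoding of the two observations are hypothetical, and the code
does no more than ask which edges lie between two wavelengths.

```python
# Absorption edges quoted in the text, in nanometers.
K_EDGES_NM = {"Ni": 0.1488, "Cu": 0.1381, "Zn": 0.1284}


def edges_crossed(start_nm, end_nm):
    """Elements whose absorption edge lies between the two wavelengths."""
    low, high = sorted((start_nm, end_nm))
    return sorted(el for el, edge in K_EDGES_NM.items() if low < edge < high)


def candidate_elements(observations):
    """Elements consistent with every observation.

    Each observation is ((start_nm, end_nm), density_changed).
    """
    candidates = set(K_EDGES_NM)
    for (start_nm, end_nm), density_changed in observations:
        crossed = set(edges_crossed(start_nm, end_nm))
        if density_changed:
            candidates &= crossed   # the edge of the ion must have been crossed
        else:
            candidates -= crossed   # an ion with a crossed edge would have changed
    return sorted(candidates)


assignment = candidate_elements([
    ((0.154, 0.1377), False),  # no significant decrease in electron density
    ((0.1377, 0.104), True),   # significant decrease in electron density
])
```

With the two observations described above, only zinc remains as
a candidate.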

Although they can be readily observed in maps of 
scattering density calculated from neutron diffraction,
atoms j of hydrogen are almost never observed in X-ray
crystallographic studies of proteins because, unlike 
atoms j of carbon, nitrogen, oxygen, and sulfur, atoms j 
of hydrogen have no inner-shell electrons. Because they 
are in smaller orbitals, inner-shell electrons have the 
highest electron density and produce most of the fea- 
tures observed in the usual maps of electron density. In 
general, if hydrogens are present in a crystallographic 
molecular model, it is because the crystallographer 
knows they are there even though they were not 
observed. When, however, crystallographic molecular 
models obtained from data sets gathered to Bragg spac- 
ings of less than 0.1 nm are submitted to extensive 
refinement, spherical features of positive electron den- 
sity appear in difference maps of electron density at 
positions that are occupied by hydrogens in the real 
molecule of protein. These features arise because
the molecular model has no hydrogens but the molecule 
of protein does. Of particular interest are those features 
of difference electron density that can be assigned to the 
hydrogens in hydrogen bonds.” 

Another peculiar feature that becomes apparent as 
a refinement progresses is that a few of the amino acids 
display alternative conformations in the map of elec- 
tron density. At the position of such an amino acid in an 
omit map of difference electron density, a feature having 
the shape of the superposition of two different rotational 
isomers of the amino acid is found (Figure 4-25). This
feature of electron density arises because in the crystal 
the actual amino acid spends part of its time in one con- 
formation and part of its time in the other so the electron 
density in the map, which is averaged over the period in 
which the measurement was made, represents both con- 
formations simultaneously. Maps of electron density cal- 
culated from data sets gathered to narrow Bragg spacing, 
because of their higher quality, reveal a greater frequency 
of alternative conformations and more subtle examples 
of alternative conformations, such as the exo and 
endo conformations of particular prolines or the
alternative conformations of a cystine, and the features of
electron density defining the alternative conformations
are much sharper and less ambiguous. In maps of
electron density of low quality, however, most of the
alternative conformations are never distinguished even upon
refinement and their existence simply increases the 
B values for the functional groups that assume them. 
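
The superposition of two conformations in an averaged map can
be sketched in one dimension. The sketch assumes, purely for
illustration, that each conformation places a Gaussian feature
of electron density at its own position and that the map is the
average of the two weighted by the fraction of the time spent in
each; the widths, separations, and function names are
hypothetical.

```python
import math


def gaussian(x, center, width):
    """Unnormalized Gaussian feature of electron density."""
    return math.exp(-((x - center) ** 2) / (2.0 * width ** 2))


def averaged_density(x, center_a, center_b, occupancy_a, width=0.02):
    """Occupancy-weighted average of two alternative positions (lengths in nm)."""
    return (occupancy_a * gaussian(x, center_a, width)
            + (1.0 - occupancy_a) * gaussian(x, center_b, width))


# Well-separated conformations: two peaks whose heights follow the occupancies.
resolved_ratio = (averaged_density(0.0, 0.0, 0.2, 0.7)
                  / averaged_density(0.2, 0.0, 0.2, 0.7))

# Closely spaced conformations: the midpoint is higher than either center,
# so the two features merge into one broadened peak.
merged_midpoint = averaged_density(0.015, 0.0, 0.03, 0.5)
merged_center = averaged_density(0.0, 0.0, 0.03, 0.5)
```

When the two positions are well separated relative to the width
of the features, both appear with heights in the ratio of their
occupancies; when they are not, only a single broadened feature
remains, which a refinement would describe with a larger
B value.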

Inherent in the process of refinement is the ability 
to produce a crystallographic molecular model by 
molecular replacement. Many proteins for which data 
sets have been gathered for the first time are closely 
related to other proteins for which a crystallographic 
molecular model is already available. Such a close rela- 
tionship can be established by aligning the amino acid 
sequences of the two proteins. For example, the amino 
acid sequence of the aspartic endopeptidase from Rous
sarcoma virus (n_aa = 124) can be aligned with the amino
acid sequence of the aspartyl endopeptidase from
human immunodeficiency virus, type 1 (n_aa = 99), so
that there are 30 identical positions and four gaps, all in
the shorter protein. It necessarily follows that the
structures of these two proteins are superposable. The 
three long gaps in the shorter amino acid sequence (10, 
5, and 6 amino acids, respectively) can be assumed to 
represent loops on the surface of the larger protein 
missing from the smaller. A crystallographic molecular 
model, produced by multiple isomorphous replace- 
ment, was available for the larger of the two proteins, 
that from Rous sarcoma virus. Crystals of the protein
from the human immunodeficiency virus were pro- 
duced, and a data set was collected from them. The side 
chains of the amino acids in the crystallographic molec- 
ular model of the protein from Rous sarcoma virus were 
replaced with the corresponding side chains in the 
aligned amino acid sequence of the protein from the 
human immunodeficiency virus. The loops correspon- 
ding to the gaps in the alignments were removed from 
the model to produce a preliminary molecular model 
for the protein from the human immunodeficiency 
virus. 

This model was computationally aligned in the 
fundamental unit cell defined by the data set collected 
from crystals of the protein from human immuno- 
deficiency virus. This preliminary model was then 
submitted to refinement to produce a final structure with 
an R-factor of 0.18. As this example illustrates, the
purpose of using molecular replacement is to avoid 
the experimental difficulties of obtaining phases. 
Because there are so many proteins for which crystallo- 
graphic molecular models are already available 
(http://betastaging.rcsb.org/pdb/Welcome.do), the like- 
lihood that the protein in a new crystal is related closely 
enough to one for which a model has already been made 
is fairly high. Consequently, many of the newly reported 
maps of electron density have been calculated by molec- 
ular replacement. 

How much more reliable is the refined crystallo- 
graphic molecular model than the initial model built into 
the map of electron density produced by the phases 
determined experimentally? It is true that the R-factor is 
much smaller, but this is not surprising because the 
decrease occurred automatically. A significant decrease 
in the free R-factor is more reassuring. These decreases 
state that the refined molecular model, although it is usu- 
ally not remarkably different from the original molecular 
model, has an electron density that produces structure 
factors the amplitudes of which are much closer to the 
observed amplitudes, which are the only directly 
observed quantities. 
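
The distinction between the R-factor and the free R-factor can
be made concrete with a few invented numbers. The sketch assumes
the customary definition of the R-factor, the sum of the absolute
differences between observed and calculated amplitudes divided by
the sum of the observed amplitudes, with no scale factor, and
assumes that the free R-factor is the same quantity computed only
over reflections withheld from the refinement. The amplitudes and
the choice of withheld reflection are hypothetical.

```python
def r_factor(observed, calculated):
    """R = sum(| |F_o| - |F_c| |) / sum(|F_o|)."""
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(observed, calculated))
    return numerator / sum(abs(fo) for fo in observed)


def split_r_factors(observed, calculated, free_indices):
    """Return (working R-factor, free R-factor) for a set of withheld reflections."""
    free = sorted(set(free_indices))
    work = [i for i in range(len(observed)) if i not in set(free)]

    def pick(values, indices):
        return [values[i] for i in indices]

    return (r_factor(pick(observed, work), pick(calculated, work)),
            r_factor(pick(observed, free), pick(calculated, free)))


# Hypothetical amplitudes; the last reflection is withheld from refinement.
F_OBSERVED = [100.0, 80.0, 60.0, 40.0, 20.0]
F_CALCULATED = [90.0, 85.0, 55.0, 45.0, 30.0]

overall_r = r_factor(F_OBSERVED, F_CALCULATED)
working_r, free_r = split_r_factors(F_OBSERVED, F_CALCULATED, [4])
```

Because the withheld reflections play no part in the
minimization, agreement with them cannot be produced
automatically by the refinement.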

Often the success of the refinement is touted by
showing that the map of electron density calculated from
F_o and α_c of the final molecular model has features that
very closely resemble phenyl rings or other equally
characteristic side chains, but this is illusory. A map of
electron density constructed from F_c and α_c would have to
have features precisely resembling these side chains
because F_c and α_c are calculated from the molecular
model itself, which always has ideal bond angles and
bond lengths for the entire polypeptide, and the
minimization has automatically caused F_c to be as close as
possible to F_o. Furthermore, as has already been noted,
the phases used in the calculation have more impact on
the final map of electron density than the amplitudes.
Therefore, it is not surprising that the details explicitly
put into the model by the crystallographer should
reappear when the map of electron density is reconstructed
from F_o and α_c or from any combinations of F_o, F_c, and α_c.

[...] Man(α1,6)[Man(α1,3)]Man(β1,4)GlcNAc(β1,4)GlcNAc was
inserted into the final, refined map of electron density
(Bragg spacing > 0.22 nm) for the glucan 1,4-α-glucosidase
from the fungus Aspergillus awamori to fill the electron
density adjacent to Asparagine 395. The oligosaccharide
was included in the refinement from the beginning.
Asparagine 395 is at the bottom of the structure.


A similar difficulty in evaluating the success with 
which a particular refinement has reproduced the real 
structure of the molecule of protein is the nature of the 
constraints applied. To progress efficiently, a refinement 
usually must be forced to retain bond lengths and bond 
angles or forced to be at the minimum of a function for 
potential energy; however, every constraint enforced 
upon the refinement is automatically incorporated into 
the final structure regardless of its reality. This fact is 
easily verified by examining crystallographic molecular 
models refined by different methods. The clear imprint 
of the constraints chosen for each method remains in 
each of the molecular models refined by that method. If
one of the constraints in Equation 4-15 is that every pep- 
tide bond shall be planar, the planarity of the peptide 
bonds in the final structure cannot be cited as a measure 
of the success of the refinement. In fact, it is probably 
more an admission of its failure to detect the normal 
deviations from planarity. If one of the constraints
inadvertently introduced by Equation 4-16 is that the 
conformation along no carbon-carbon bond can eclipse 
vicinal methyls or methylenes, the absence of such 
eclipsed conformations cannot be cited as a measure of 
the success of the refinement. 

These are not minor criticisms. The structure of a 
map of electron density usually improves so breathtak- 
ingly upon refinement that those changes that were 
enforced by the crystallographer must be clearly sepa- 
rated from those changes that arise only from the real 
molecular structure in the crystal. It is only these uncon- 
strained and often unexpected features of the refined 
map of electron density which clearly state that it is an 
improvement.

The crystallographic molecular model, even after 
extensive refinement, should never be confused with the 
actual structure of the molecule itself. The crystallo- 
graphic molecular model is no more than the coordi- 
nates of the centers of all of the atoms j in the model that 
was built by the computer from line segments, that was 
fit into the map of electron density, and that was then 
refined against the available amplitudes. It is only as 
accurate as the phases that were finally decided upon, 
which still incorporate some of the errors in the experi- 
mental phases, the arbitrary constraints imposed during 
refinement, and the inherent biases in the formula for 
calculating its structure factors, which assumes spherical 
atoms j and harmonic rectilinear atomic motions. The 
fact that most crystallographic molecular models 
change, often dramatically, as further refinement is per- 
formed or as additional data are collected from the same 
crystals to narrower Bragg spacing, even though the
real structure of the protein has not changed, is the most 
obvious demonstration that the model is not the struc- 
ture of the protein but is a work of art representing the 
structure of the protein. It is unfortunate that the words 
“structure of the protein” are so widely used when refer- 
ring to the crystallographic molecular model of the protein.
This habit only adds to the misleading impression of
infallibility associated with crystallography. 


Suggested Reading 

Problem 4-5: The figure to the right is a stereo view of
the crystallographic molecular model of a protein con- 
taining 71 amino acids. This drawing was produced with 
MolScript. By examining the molecular model in
stereo, you will be able to ascertain almost all of the 
amino acid sequence of the protein. You will not be able 
to distinguish threonine from valine, glutamate from glu- 
tamine, or asparagine from aspartate. Make an educated 
guess for threonine and valine, and just choose at 
random for asparagine and aspartate and glutamine and 
glutamate. If you can’t make out an amino acid, put an X 
in its position in the sequence. 


(A) Write out the amino acid sequence of the protein 
in one-letter code. Number every tenth amino 
acid in your sequence to keep everything in regis- 
ter. 


(B) Which pairs of amino acids in the protein are 
cystines? Identify each pair by the sequence posi- 
tions of the two cysteines that form the cystine. 


(C) What do the isolated atoms j scattered around in 
the crystallographic molecular model represent? 


(D) How did the crystallographer distinguish between 
threonine and valine, between glutamate and 
glutamine, and among aspartate, asparagine, and 
valine? 
Chapter 5


Noncovalent Forces 


Crystallographic studies have demonstrated that a mole- 
cule of protein, dissolved in aqueous solution, is com- 
posed of polypeptides, each of which is folded into a 
structure that is the same as or closely similar to the 
structure of all of the other polypeptides of the same 
amino acid sequence. A polypeptide, as it emerges from 
the ribosome, however, is a fluid polymer of undefined 
structure. Each newly synthesized polypeptide then folds 
spontaneously to assume its unique secondary and terti- 
ary structure. 

The folding of polypeptides to form the native 
structure of a protein, the association of folded 
polypeptides to form multimeric proteins, and the 
binding of substrates, coenzymes, or other molecules to 
proteins usually proceed without the formation of cova- 
lent bonds and are consequently controlled by nonco- 
valent forces. It appears that four noncovalent forces 
are involved in these chemical reactions: ionic interac- 
tions, hydrogen bonds, the hydrophobic effect, and van 
der Waals forces. In the refined crystallographic molec- 
ular model of a folded protein, the consequences of 
these noncovalent forces are evident. The chemical and 
physical properties of these interactions, as they occur 
in aqueous solution, must be understood before those 
consequences can be appreciated. Therefore, a discus- 
sion of these interactions must precede a detailed 
description of the atomic details of refined crystallo- 
graphic molecular models. None of the four categories 
of noncovalent forces—ionic interactions, hydrogen 
bonds, the hydrophobic effect, and van der Waals 
forces—can be completely separated from all of the 
others. Van der Waals forces must play a part in each of 
the other three phenomena, hydrogen bonds can be 
considered to be special cases of ionic interactions, 
almost all ionic interactions in biochemical situations 
involve hydrogen bonds, and the hydrophobic effect is 
to a large degree the reflection of hydrogen bonding in 
the solvent. It is informative, however, to discuss each 
of these categories separately to focus on their unique 
properties. 

Each of these types of interactions can be consid- 
ered to be a special case of a noncovalent association 
between two molecules, A and B, or between two seg- 
ments, A and B, of the same molecule. For the situations 
under discussion, our attention will be directed to such a
reaction as it would occur in aqueous solution. A general 
chemical equation for these associations is 


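$$\mathrm{A(H_2O)}_x + \mathrm{B(H_2O)}_y \rightleftharpoons \mathrm{A{\cdot}B(H_2O)}_z + (x + y - z)\,\mathrm{H_2O} \tag{5-1}$$


The species $\mathrm{A(H_2O)}_x$ and $\mathrm{B(H_2O)}_y$ are the separated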
solutes dissolved in water and surrounded on all sides by 
water. Presumably, there are a certain number of water 
molecules, x and y, respectively, that are significantly 
affected by the presence of A or B. The effects of the 
solute on the surrounding molecules of water and 
the effects of the surrounding molecules of water on the 
solute are referred to as solvation or hydration. Around 
a particular molecule of solute at a particular instant, a 
particular number of water molecules are affected signif- 
icantly by the presence of the solute. This number fluc- 
tuates with time, and the coefficients x, y, and z are 
intended to represent averages over a range of possible 
configurations for the hydration. When A and B associate 
to form the noncovalent complex, that complex will also 
be surrounded by water, and there will be a number of 
water molecules, z, that are significantly influenced by 
the complex. As A·B always has a smaller surface area
than the sum of the surface areas of A and B, z should be 
less than x + y, and (x + y - z) molecules of water will 
return to the bulk phase of the water when the complex 
is formed. 

The change in standard free energy for the overall 
reaction can be expressed as 


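$$\Delta G^\circ = \Delta G^\circ_{\mathrm{A\cdot B}} + \Delta G^\circ_{\mathrm{hyd(A\cdot B)}} + \Delta G^\circ_{\mathrm{H_2O}} - \Delta G^\circ_{\mathrm{hyd(A)}} - \Delta G^\circ_{\mathrm{hyd(B)}} \tag{5-2}$$


where $\Delta G^\circ_{\mathrm{hyd}}$ refers to the standard free energy of
hydration between each of the solutes and its surround-
ing waters of hydration, $\Delta G^\circ_{\mathrm{A\cdot B}}$ refers to the direct stan-
dard free energy of interaction between A and B, and
$\Delta G^\circ_{\mathrm{H_2O}}$ is the change in standard free energy experienced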
by the x + y - z molecules of water as they leave shells of 
hydration and return to the bulk aqueous phase. It will 
become apparent that all of the terms on the right-hand 
side of Equation 5-2 except sometimes the first are 
remarkably influenced by the fact that this reaction 
occurs in water as a solvent. No noncovalent interaction 
has the same outcome in any other solvent. For example, 
hydrogen bonds and ionic interactions are stable in 
almost any other solvent but are dissociated by water,
while the hydrophobic effect is observed only when the 
solvent is water. To appreciate fully these influences of 
water on the outcome of noncovalent associations, the 
properties of liquid water itself must be understood. 
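
The bookkeeping in Equation 5-2 can be sketched numerically. The function below only sums the five terms with the signs given in the equation. The numerical values passed to it are hypothetical placeholders, not measurements; they are chosen to show how a favorable direct interaction between A and B can be largely cancelled by the hydration that the separated solutes must give up.

```python
def association_free_energy(dg_ab, dg_hyd_ab, dg_h2o, dg_hyd_a, dg_hyd_b):
    """Sum the terms of Equation 5-2 (all in kJ mol^-1).

    dG = dG_AB + dG_hyd(AB) + dG_H2O - dG_hyd(A) - dG_hyd(B)
    """
    return dg_ab + dg_hyd_ab + dg_h2o - dg_hyd_a - dg_hyd_b


# Hypothetical values (kJ mol^-1), for illustration only.
dg_overall = association_free_energy(
    dg_ab=-40.0,      # direct interaction between A and B
    dg_hyd_ab=-55.0,  # hydration of the complex
    dg_h2o=-5.0,      # released water returning to the bulk phase
    dg_hyd_a=-45.0,   # hydration of free A
    dg_hyd_b=-50.0,   # hydration of free B
)
print(f"overall standard free energy change: {dg_overall:.1f} kJ/mol")
```

With these placeholder numbers, a direct interaction of -40 kJ/mol leaves a net change of only a few kilojoules per mole, which is the qualitative point of the equation.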


Water 


The properties of liquid water, when considered in their 
entirety, are unlike those of any other liquid. For exam- 
ple, the surface tension of water at 20 °C is 73 dyne cm⁻¹,
while those of most other liquids are between 20 and
40 dyne cm⁻¹. The relative permittivity,* $\varepsilon_r$, of water at
20 °C is 80.2, while the relative permittivities of other liq- 
uids, with few exceptions, are less than 30. The high 
melting point and boiling point of water, for a molecule 
of its size and composition, are well-publicized anom- 
alies. Not only are the numerical values of the physical 
constants anomalous, but the qualitative behaviors of 
the thermodynamic properties of the liquid, when it is 
exposed to variations of physical forces such as pressure, 
temperature, electric field, and electromagnetic energy, 
are unique. The details of these peculiarities provide an 
intuitive picture of the structure of liquid water that can 
serve as a basis for understanding the behavior of solutes 
such as polypeptides in this solvent. Unfortunately, there 
is no adequate molecular model for the structure of 
liquid water, and an informed intuitive picture is the 
closest approach to reality currently available. 

A water molecule in the dilute, ideal vapor is an 
oxygen atom bonded covalently to two hydrogen atoms. 
Quantum mechanical calculations of the isolated mol-
ecule in the vacuum seem to support the conventional
orbital picture of an oxygen hybridized sp³ with two
covalent bonds to two hydrogens and two σ lone pairs of
electrons; these four substituents are oriented tetrahe-
drally around the oxygen. The HOH bond angle is
104.5°, distorted from 109.5° by the electron repulsion of
the lone pairs or by a rehybridization, driven by energy of
promotion, that gives the oxygen-hydrogen σ bonds
more p character. The oxygen-hydrogen bond lengths
are 0.096 nm.

In more concentrated vapor, dimers of water form 
(Figure 5-1). From results of molecular beam
microwave spectroscopy, the mean structure of the
dimer can be calculated. The two oxygens are sepa-
rated by a distance of 0.298 nm. One of the four hydro- 
gens lies on the line of centers between the two oxygens, 
and it is covalently bonded to one of them, which is 
referred to as the proton donor. The other oxygen, which 
is referred to as the proton acceptor, has two of the four 
hydrogens covalently bonded to it. The plane defined by 


* The relative permittivity or dielectric constant of a substance is 
its permittivity relative to the permittivity of vacuum. 


Figure 5-1: Selected dimensions of the dimer of two molecules of
water in the gas phase. The distances and angles were obtained
by microwave spectroscopy.


the two hydrogens and the oxygen of the proton accep- 
tor is inclined at an angle of 60° to the line of centers 
between the two oxygen atoms. This means that the four 
substituents around the acceptor—the two hydrogens, 
the shared hydrogen, and the lone pair of electrons—are 
tetrahedrally arrayed in the dimer. This arrangement 
suggests that the oxygen-hydrogen bond on the donor 
points directly at one of the two σ lone pairs of electrons
on the acceptor oxygen. The axis of the sp³ orbital in
which that σ lone pair resides should be congruent with
the line of centers. The interaction between the hydro-
gen-oxygen σ bond on the donor molecule of water and
the σ lone pair of electrons on the acceptor is an unhin-
dered, intermolecular example of a hydrogen bond. The 
formation of dimers and higher oligomers in steam con- 
tributes significantly to its nonideal behavior at higher 
concentrations of water in the gas phase. 

The ice that is in equilibrium with liquid water at 
atmospheric pressure and 0 °C is known as ice Ih. Ice Ih 
is a tetrahedral diamond lattice of oxygen atoms (Figure 
5-2A), each 0.276 nm from its nearest neighbor. The
oxygens are held in the lattice by hydrogen bonds to each 
of their four nearest neighbors. Between any oxygen 
atom and each of its four nearest neighbors in the lattice 
is one hydrogen atom. At any instant, each hydrogen is 
covalently bound to one of the two oxygens between 
which it is found, and every oxygen has only two hydro- 
gens covalently bound to it. These two requirements 
create a situation in which only a predictable number of 
arrangements for these hydrogens can occur, and this 
number of arrangements can explain almost exactly the 
observed residual entropy of ice Ih at 0 K. There is a sig-
nificant amount of empty space in ice Ih (Figure 5-2B), 
and this is one of the properties permitting it to be less 
dense than the liquid water with which it can be in equi- 
librium. 
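
The counting argument can be made concrete with the estimate usually attributed to Pauling. It is not derived in the text above and is quoted here as an assumption: if every oxygen must carry exactly two covalently bound hydrogens and every hydrogen bond holds exactly one hydrogen, roughly (3/2)^N arrangements are open to N molecules of water, so the residual molar entropy is about R ln(3/2). A minimal sketch of that arithmetic:

```python
import math

R = 8.314462618  # molar gas constant, J mol^-1 K^-1


def residual_entropy(arrangements_per_molecule=1.5):
    """Residual molar entropy S = R ln(W), with W arrangements per molecule.

    The default W = 3/2 is the Pauling estimate (an assumption, see lead-in).
    """
    return R * math.log(arrangements_per_molecule)


s_residual = residual_entropy()
print(f"estimated residual entropy of ice Ih: {s_residual:.2f} J/(mol K)")
```

A perfectly ordered arrangement of hydrogens (one arrangement per molecule) would give zero residual entropy from the same function.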

The structure and properties of ice Ih and water 
vapor have been exhaustively investigated and unam- 
biguously established. At atmospheric pressure, liquid 
water lies between these two extremes on the phase dia- 
gram, and its properties can be compared with them. 
From the transitions between solid and liquid and 
between liquid and vapor, insight into the structure of 
the liquid can be gained. 

When ice melts, the reaction involves a standard 
enthalpy of fusion, and when the liquid vaporizes, the 
reaction involves a standard enthalpy of vaporization. 
The enthalpy of water at atmospheric pressure can be 


Figure 5-2: Structure of ice Ih. (A) Tetrahedral lattice. In a tetra-
hedral lattice, each atom or molecule occupies a position indicated 
by the small open circles. It forms equivalent connections with its 
four nearest neighbors, and because every connection is equivalent 
and every nearest neighbor is the same, the nearest neighbors are 
arrayed tetrahedrally and at equal distance. (B) Representation of 
space-filling models of molecules of water arrayed on the tetrahe- 
dral lattice of ice Ih. The spheres are atoms of oxygen, and the 
hydrogens, one for each connection of the lattice, are sandwiched 
between the oxygens. Reprinted with permission from ref 4. 
Copyright 1969 Clarendon Press. 


plotted as a function of absolute temperature (Figure 
5-3).4 On this plot, the discontinuities at the melting 
point and boiling point are the standard enthalpy of 
fusion and the standard enthalpy of vaporization, 
respectively. At 25 °C, the standard enthalpy of fusion is 
8.0 kJ mol⁻¹, and the standard enthalpy of vaporization
is 44 kJ mol⁻¹. This standard enthalpy of vaporization is
more than twice that of a liquid such as chloromethane 

Figure 5-3: Enthalpy ($H - H_0$), entropy ($S - S_0$), free energy ($G - H_0$),
and isopiestic heat capacity ($C_p$) as a function of temperature for
water at unit atmosphere (101.3 kPa) pressure. The quantities $H_0$
and $S_0$ are the absolute enthalpy and absolute entropy, respec-
tively, at 0 K. Enthalpy and free energy are in units of kilojoules
mole⁻¹, and heat capacity is in units of joule mole⁻¹ kelvin⁻¹. The
values of entropy, also in joule mole⁻¹ kelvin⁻¹, are arbitrarily
divided by 2 to put them on the same scale. Adapted with per-
mission from ref 4. Copyright 1969 Clarendon Press. 


($\Delta H_{\mathrm{vap}}$ = 19 kJ mol⁻¹ at 25 °C), which is a polar solvent
($\varepsilon_r$ = 13) containing molecules incapable of forming
hydrogen bonds but much larger than water
(50.5 g mol⁻¹). This comparison illustrates the fact that
the standard enthalpy of vaporization of water is anom- 
alously large. In the sum of the two reactions, fusion and 
vaporization, all of the hydrogen bonds in ice Ih are lost. 
The fact that most of the standard enthalpy change 
occurs upon vaporization and the fact that the heat of 
vaporization is anomalously large suggest that liquid 
water retains most of the hydrogen bonds present in 
ice Ih. 
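
The argument from the two enthalpies is a matter of simple proportion, sketched below. The two enthalpies are the values quoted in the text; treating enthalpy absorbed as a crude proxy for hydrogen bonds broken, and ignoring the temperature dependence of both quantities, are simplifying assumptions of this sketch.

```python
dh_fusion = 8.0         # kJ mol^-1, standard enthalpy of fusion quoted in the text
dh_vaporization = 44.0  # kJ mol^-1, standard enthalpy of vaporization quoted in the text

# Total enthalpy absorbed in going from ice Ih to the vapor.
dh_total = dh_fusion + dh_vaporization

# Share of that total absorbed when the solid melts.
fraction_on_melting = dh_fusion / dh_total
fraction_on_vaporization = dh_vaporization / dh_total

print(f"absorbed on melting:      {fraction_on_melting:.0%}")
print(f"absorbed on vaporization: {fraction_on_vaporization:.0%}")
```

Only a minor share of the total enthalpy is absorbed on melting, which is consistent with the conclusion that most of the hydrogen bonds survive in the liquid.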

The high isochoric heat capacity, $C_V$, of liquid water
(Figure 5-3) also indicates that it is highly structured. 
Calculations of the isochoric heat capacities of both 
ice Ih and water vapor, from the known vibrational,
translational, and rotational energy levels of these two
substances, agree quite closely with observed values.
The observed values of the isochoric heat capacity of
liquid water, however, are almost twice those calculated
from its estimated vibrational, translational, and rota-
tional energy levels (Figure 5-4). This excess or configu-
rational heat capacity can be explained by postulating 
that much of the hydrogen-bonded structure of ice 
remains in the liquid and its gradual deterioration as the 
temperature is raised is responsible for the anomalous 
absorption of heat. The high and relatively constant 
value for the heat capacity throughout the range of tem- 
perature between 0 and 100 °C suggests that the hydro- 
gen-bonded network in the liquid is gradually and 
constantly deteriorating as the temperature is rising. 
Another indication of the extensive hydrogen- 
bonded structure in liquid water is its high static relative 
permittivity ($\varepsilon_r$ = 88 at 0 °C), which is almost equivalent to
that of ice Ih ($\varepsilon_r$ = 99 at 0 °C). The large value for the rela-
tive permittivity of ice Ih is usually explained semiquan-
titatively as a result of the high correlation among the
orientations of the individual dipole moments of the 
water molecules caused by their rigid arrangement in the 
hydrogen-bonded lattice. When an electric field is 
applied, the dipole moments reorient cooperatively, pro- 
ducing the large relative permittivity. The fact that the 

Figure 5-4: Separation of the observed isochoric heat capacity $C_V$
of water (solid line) into calculated (dashed line) and configura-
tional (shaded difference) components. The heat capacity of ice Ih
was calculated from the two vibrational absorption bands of lowest
energy (ν = 840 cm⁻¹ and ν = 230 cm⁻¹); the heat capacity of water
vapor was calculated from the vibrational, rotational, and transla- 
tional energies of the water molecules; and the heat capacity of the 
liquid was calculated on the assumption that each molecule in the 
liquid has three hindered degrees of translation and three hindered 
librations. Adapted with permission from ref 4. Copyright 1969 
Clarendon Press. 


liquid has almost the same relative permittivity as the 
solid indicates that much of the lattice remains. 

The molar volume of ice (19.6 cm³ mol⁻¹) is some-
what greater than that of liquid water (18.0 cm³ mol⁻¹) at
0 °C and much greater than the molar volume that would
be expected if spheres the radius of molecules of water
(0.14 nm) were randomly packed in an unstructured, dis-
ordered array (10 cm³ mol⁻¹). The large molar volume of
ice Ih is due to the vacant space created by the fact that 
oxygens are held in a tetrahedral array by the hydrogen- 
bonded network (Figure 5-2B). When ice melts, the mol- 
ecules of water are allowed to occupy some of the vacant 
space in the hydrogen-bonded lattice and the density 
increases. A related fact is that the molar volume of liquid 
water increases as the temperature is decreased below 
4 °C, presumably because the expansion caused by the 
strengthening of the hydrogen-bonded lattice is greater 
than the usual contraction experienced by most liquids 
resulting from the decrease in thermal energy. It is only 
above 4 °C that the latter effect becomes dominant. The 
contraction of water upon melting and the expansion of 
the liquid upon cooling below 4 °C are almost unprece-
dented. Diamond, silicon, and germanium are tetrahedral
solids that also float upon their melts, as ice floats upon 
water. Aside from these peculiar features, the molar vol- 
umes of ice and liquid water at 0 °C are both large and 
not that different from each other. Consequently, much 
of the vacant space created by the hydrogen-bonded lat- 
tice in the solid remains in the liquid. 
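
The comparison of molar volumes can be checked with a short calculation. The sketch below assumes that randomly packed spheres fill about 64% of space (the usual figure for random close packing, an assumption not stated in the text) and takes the radius and the two observed molar volumes from the paragraph above.

```python
import math

AVOGADRO = 6.02214076e23   # mol^-1
MOLAR_MASS_WATER = 18.015  # g mol^-1


def packed_sphere_molar_volume(radius_nm, packing_fraction):
    """Molar volume (cm^3 mol^-1) of spheres filling space at a given packing fraction."""
    radius_cm = radius_nm * 1e-7
    sphere_volume = 4.0 / 3.0 * math.pi * radius_cm ** 3
    return AVOGADRO * sphere_volume / packing_fraction


def density(molar_volume_cm3):
    """Density (g cm^-3) of water with the given molar volume."""
    return MOLAR_MASS_WATER / molar_volume_cm3


# Packing fraction of 0.64 is an assumed value for random close packing.
v_random = packed_sphere_molar_volume(radius_nm=0.14, packing_fraction=0.64)
rho_ice = density(19.6)     # molar volume of ice quoted in the text
rho_liquid = density(18.0)  # molar volume of liquid water quoted in the text

print(f"randomly packed spheres: {v_random:.1f} cm^3/mol")
print(f"ice Ih: {rho_ice:.3f} g/cm^3, liquid water: {rho_liquid:.3f} g/cm^3")
```

Under these assumptions the unstructured array comes out at roughly 11 cm³ mol⁻¹, the same magnitude as the figure of 10 cm³ mol⁻¹ cited above and far below the observed molar volumes of either ice or liquid water.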

The fact that much of the vacant space remains in 
liquid water also explains the unique decrease in isother- 
mal compressibility that occurs in liquid water as tem- 
perature is raised from 0 to 50°C (Figure 5-5). The 
isothermal compressibility of a liquid, $\kappa_T$, is the fractional
decrease in volume produced by the application of pres-
sure at constant temperature:


$$\kappa_T = -\frac{1}{V}\left(\frac{\partial V}{\partial P}\right)_T \tag{5-3}$$


Figure 5-5: Isothermal compressibility $\kappa_T$ (gigapascals⁻¹) for
liquid water at unit atmosphere (101.3 kPa) pressure, presented as
a function of temperature. Reprinted with permission from ref 4.
Copyright 1969 Clarendon Press.


in units of reciprocal pressure (pascal⁻¹). In almost every
other liquid, isothermal compressibility increases 
monotonically with temperature. In liquid water at low 
temperatures, most of the structured vacant space of 
ice Ih remains when the transition from solid to liquid 
occurs, and this structured vacant space is gradually 
replaced with randomly distributed, unstructured vacant 
space, similar to that in other liquids as the temperature 
is raised. The high compressibility at low temperatures 
results from the ability of the lattice to decrease its 
volume upon the application of pressure at the expense 
of the significant vacant space among the oxygen atoms. 
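
To give a feeling for the magnitude of an isothermal compressibility, the sketch below integrates the definition, -(1/V)(dV/dP) at constant temperature, over a finite step in pressure and then recovers the compressibility from the two volumes by finite difference. The value of 0.45 GPa⁻¹ is an assumed, representative figure for liquid water near room temperature; it is not read from Figure 5-5, and the compressibility is taken to be constant over the step.

```python
import math


def fractional_volume_change(kappa_per_gpa, delta_p_gpa):
    """Integrate dV/V = -kappa dP, assuming kappa is constant over the step."""
    return math.exp(-kappa_per_gpa * delta_p_gpa) - 1.0


def kappa_from_volumes(v1, v2, p1_gpa, p2_gpa):
    """Finite-difference estimate of kappa = -(1/V)(dV/dP) at constant temperature."""
    v_mean = 0.5 * (v1 + v2)
    return -(v2 - v1) / (p2_gpa - p1_gpa) / v_mean


# Assumed compressibility (GPa^-1) and a pressure step of 0.1 GPa (about 1000 atm).
change = fractional_volume_change(kappa_per_gpa=0.45, delta_p_gpa=0.1)
v_initial = 18.0                       # cm^3 mol^-1
v_final = v_initial * (1.0 + change)   # volume after compression
kappa_estimate = kappa_from_volumes(v_initial, v_final, 0.0, 0.1)

print(f"fractional change in volume: {change:.2%}")
print(f"compressibility recovered from the two volumes: {kappa_estimate:.3f} GPa^-1")
```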

The idea that liquid water at lower temperatures 
retains a structure similar to that of ice Ih is also sup- 
ported by the small cubic expansion coefficient of liquid 
water. Upon heating at atmospheric pressure between 
temperatures of 20 and 30 °C, other liquids expand about 
4 times more rapidly than does water (Figure 5-6). As
pressure is applied, however, the cubic expansion coeffi- 
cient for water increases while the coefficients of thermal 
expansion for other liquids decrease. At high pressures, 
both water and other liquids have about the same cubic 
expansion coefficient. If liquid water at atmospheric 
pressure is extensively hydrogen-bonded with an 
expanded structure similar to that of ice Ih (Figure 5-2B), 
then as the temperature is raised, the decrease in struc- 
tured empty volume due to the deterioration of this 
hydrogen-bonded network could almost cancel the 
increase in unstructured volume due to increased ther- 
mal motion. As pressure is applied, however, it causes 
the hydrogen-bonded network to deteriorate or restruc- 
ture and the liquid to have a more normal cubic expan- 
sion coefficient. In this view, the ability of pressure to 

Figure 5-6: Cubic expansion coefficient for several liquids at
25 °C as a function of applied pressure. The liquids are (x) PCl₃,
(O) CH₃OH, (0) CS₂, (@) C₂H₅Cl, (m) C₂H₅I, (A) C₂H₅OH, (V) C₃H₇OH,
(%) isobutyl alcohol, and (O) n-C₅H₁₁OH. The solid curve is the
cubic expansion coefficient for liquid water at 25 °C as a function 
of pressure. Reprinted with permission from ref 8. Copyright 1970 
American Association for the Advancement of Science. 

change the structure of the liquid is due to the fact that
liquid water is in an extensively hydrogen-bonded form 
at normal pressures but not at higher pressures. 

Such a transition between an ordered and a less 
ordered state caused by an increase in pressure would 
also explain why the application of pressure decreases 
the viscosity of liquid water rather than increasing it as 
it does the viscosities of other liquids. The viscosity
of water is anomalously large in the first place
(η = 1.00 mPa s at 20 °C) compared to the viscosity of
liquids such as acetonitrile (η = 0.36 mPa s at 20 °C),
pentane (η = 0.24 mPa s at 20 °C), and carbon disulfide
(η = 0.36 mPa s at 20 °C).

Additional evidence for the retention of a signifi- 
cant fraction of the hydrogen-bonded lattice in liquid 
water is provided by scattering of X-radiation. When a 
beam of X-radiation is passed through a liquid, it is 
scattered by the electrons of the molecules in the liquid. 
The intensity of the scattered X-radiation varies as a 
function of the angle between the incident beam and the 
direction at which the scattered radiation emerges from 
the solution. This angular dependence of the intensity 
can be used to calculate a radial molecular correlation 
function, $G_M(r)$. This function is an approximation of
the variation of electron density as a function of the
radial distance from any one molecule in the liquid. The
actual variation of electron density is distinguished from
its approximation by designating it as $g(r)$. The function
$G_M(r)$ registers any local variations in the electron density
of the liquid, relative to the mean electron density of the
liquid, that are maintained around any one of the mole-
cules. Because it is a relative quantity, the value of $G_M(r)$
is unity when the electron density is equal to the mean
electron density. Any variations in density that are
observed are assumed to be permanent features of the
structure of the liquid. Because $G_M(r)$ and $g(r)$ are proportional to
the electron density as a function of radial distance from
a central molecule, either can be used to calculate the total
number of molecules of solvent in a spherical shell
between $r_1$ and $r_2$ by the integral


$$n_i = \frac{\rho_0}{y}\int_{r_1}^{r_2} 4\pi r^2\, g(r)\, dr \tag{5-4}$$


where $n_i$ is the number of molecules of solvent in that
shell, $\rho_0$ is the bulk electron density of the liquid, and $y$ is
the number of electrons in each molecule of solvent (10
in the case of water).

The radial molecular correlation function for liquid 
water has been determined over a range of temperatures 
from 4 to 200 °C (Figure 5-7). At fairly long distances
from any one molecule of water (>0.8 nm), the function 
becomes unity and does not vary noticeably. Therefore, 
beyond 0.8 nm from any given molecule the liquid is, on 
the average, homogeneous. A significant peak of density 
occurs, however, at 0.28 nm. Integration of this peak for

Figure 5-7: Molecular correlation functions for liquid water and
ice Ih. The molecular correlation functions for liquid water at sev-
eral temperatures (solid lines) were calculated from the angular 
dependence of the intensity of the scattered X-rays from samples of 
pure water through which a collimated beam of X-rays was passed. 
A molecular correlation function for liquid molecules arranged on 
the lattice of ice Ih (dashed line) was calculated from the length of 
the hydrogen bonds in ice Ih (0.276 nm) and the fact that the 
oxygen atoms lie upon a tetrahedral diamond lattice. The calcula- 
tion was performed with the assumption that the distributions of 
electron density around the maxima defined by the lattice could be 
approximated by error functions. The width of the first error func- 
tion was made the same as the width of the first maximum in liquid 
water at 4 °C, and the widths of the two subsequent error functions 
were made proportional to the square of their distances from the 
origin. Adapted with permission from ref 7, originally from ref 9.
Copyright 1971 American Institute of Physics. 


the curve at 4°C, upon the assumption that it is a 
Gaussian function, indicates that it is produced by about 
four nearest neighbors. In ice there are four nearest 
neighbors to each water molecule and they are held at a 
distance of 0.276 nm. It can be assumed that these are 
retained in the liquid. That the peak is centered at a dis- 
tance so close to the hydrogen-bonded distance in ice 
has been interpreted to mean that each water molecule 
in the liquid has about four hydrogen-bonded nearest 
neighbors. 
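
The integration described above can be sketched numerically. The code below evaluates the integral of Equation 5-4 by the trapezoidal rule for a single Gaussian peak. The height and width of the peak are hypothetical values chosen for illustration, not read from Figure 5-7, and the bulk electron density divided by the number of electrons in each molecule is replaced by the equivalent number density of molecules of liquid water, about 33.4 nm⁻³.

```python
import math

NUMBER_DENSITY = 33.4  # molecules nm^-3, approximate value for liquid water


def gaussian_peak(r, height, center, width):
    """Gaussian model of one maximum in the radial correlation function."""
    return height * math.exp(-((r - center) ** 2) / (2.0 * width ** 2))


def neighbors_in_shell(g, r1, r2, steps=2000):
    """Trapezoidal estimate of n = density * integral of 4 pi r^2 g(r) dr over [r1, r2]."""
    dr = (r2 - r1) / steps
    total = 0.0
    for i in range(steps + 1):
        r = r1 + i * dr
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * 4.0 * math.pi * r ** 2 * g(r)
    return NUMBER_DENSITY * total * dr


def first_peak(r):
    # Hypothetical peak: height 2.4, centered at 0.28 nm, standard deviation 0.02 nm.
    return gaussian_peak(r, height=2.4, center=0.28, width=0.02)


n_first = neighbors_in_shell(first_peak, 0.18, 0.38)
print(f"molecules under the first peak: {n_first:.2f}")
```

With these assumed parameters the peak accounts for about four molecules; a broader or taller peak would account for proportionally more.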

A radial molecular correlation function can be cal-
culated for liquid molecules of water confined to the
tetrahedral lattice of ice Ih (Figure 5-7). In ice Ih, there
are four nearest neighbors at 0.276 nm, 12 next neigh-
bors at 0.45 nm, and 12 farther neighbors at 0.52 nm
(Figure 5-2). A distribution of liquid molecules of water
confined to the diamond lattice of ice Ih would produce
a radial molecular correlation function with a distinct
minimum between the first four neighbors and the next
group of 24 (Figure 5-7).

When the radial molecular correlation functions of 
liquid water at 4 °C and liquid molecules of water con- 
fined to the lattice of ice Ih are compared, several differ- 
ences are noted. Although still a prominent feature, the 
maximum in ice Ih centered at around 0.5 nm is consid- 
erably broadened in the liquid. This indicates that the 
hydrogen-bonded network has become considerably 
more elastic in water than in ice, permitting the second 
and third groups of neighbors to approach the molecule 
at the origin much more closely, rather than being held 
at a distance by a rigid lattice. There also seems to be too 
much electron density in the actual liquid between the 
first maximum and the second maximum. This has
been interpreted to mean that molecules are able to
break out of the lattice and become interstitial mole-
cules of water, transiently occupying the vacant spaces
(Figure 5-2B).

So far the discussion has emphasized similarities 
between ice Ih at 0 °C and liquid water at low tempera- 
tures. There are, of course, remarkable differences. The 
most obvious is the fact that ice is a solid and water is a 
liquid. Even though ice Ih is a solid, however, it, like 
liquid water, is able to flow. In order for condensed 
matter to flow, layers of molecules in that matter must be 
able to slide past layers of other molecules above and 
below them. In the case of water or ice Ih the manifesta- 
tion of this ability requires extensive and simultaneous 
disruption of continuous layers of hydrogen bonds in 
the liquid or solid as it flows. This capacity to flow is far 
more evident in water than in ice. It is quantified by 
values for the viscosity of the liquid and the solid. Liquid 
water at 0 °C has a viscosity of 1.8 mPa s, and ice Ih at 
0 °C has a viscosity of about 10'° mPa s.'*"* The difference 
between liquid water and ice Ih is so large because, to 
flow, hydrogen bonds must be broken simultaneously 
over significant regions. There are, however, measure- 
ments that quantify the behavior of individual molecules 
of water. 

When individual molecules in water change their 
relative positions, hydrogen bonds must be broken and 
re-formed elsewhere. The capacity to change positions is 
reflected in the process of self-diffusion, a measure of the 
rate at which the average molecule of water diffuses 
through a condensed phase of water molecules. The self- 
diffusion coefficient for ice Ih is about 10⁻¹¹ cm² s⁻¹ at
0 °C, and for liquid water it is 1.4 × 10⁻⁵ cm² s⁻¹ at 5 °C.
This difference of 10⁶ demonstrates that water molecules
can exchange their hydrogen-bonded neighbors far 
more rapidly in the liquid than in the solid. To the extent 
that this exchange involves breaking and making of 
hydrogen bonds, the hydrogen bonds in the liquid are 
weaker than those in the solid. 


An even more easily understood measurement of 
the rate at which a molecule of water can detach itself 
from the hydrogen-bonded network in the liquid in order 
to reorient itself is the dielectric relaxation of liquid 
water. The relative permittivity of a chemical substance 
is a function of the frequency of the alternating electric 
field used to measure it. Tabulated relative permittivities 
are usually static relative permittivities that are meas- 
ured with an alternating electric field with a frequency of 
alternation so low that the measured values may be con- 
fidently extrapolated to zero frequency. The low fre- 
quency of alternation allows the molecules in the 
substance more than ample time to align themselves, as 
far as they are able, with the electric field while the meas- 
urement is made. If the frequency of the applied field, 
however, is gradually increased, at some point the mole- 
cules in the substance are unable to invert their align- 
ments at rates sufficient to keep up with the alternations 
of the applied field. Their inability to keep up results 
from intermolecular forces that hinder their rotation. 
The dielectric relaxation time is the time that an applied 
field must be in operation before exp(-1) of the increase 
in relative permittivity due to the rotation of the mole- 
cules aligning themselves with the field has occurred. 
The dielectric relaxation time of ice Ih at 0 °C is 2 × 10⁻⁵ s,
that of liquid water at 0 °C is 2 × 10⁻¹¹ s, and that of a water
molecule in a dilute solution of water in benzene is
1 × 10⁻¹² s.* A similar value for the rotational correlation
time of a water molecule in ice Ih (1.5 × 10⁻⁵ s at −6 °C)
has been measured by nuclear magnetic resonance.
Although a water molecule in liquid water is constrained
so that it rotates 20 times more slowly than it does in a
condensed phase lacking hydrogen bonds, it rotates 10⁶
times faster in liquid water than in ice. This again
demonstrates that the hydrogen bonds between water 
molecules in liquid water are weaker than those in ice. 

This weakening of the hydrogen bonds in the liquid 
is also reflected in a shift that occurs in the frequency of 
the maximum infrared absorption of the oxygen-hydro- 
gen stretching vibration of water when it melts.’ The fre- 
quency at which a covalent bond absorbs infrared 
electromagnetic energy is correlated with its bond 
energy. The greater the bond energy, the higher the fre- 
quency of the light required to excite its vibration. In the 
case of the stretching frequency of the oxygen-hydrogen 
bond in water, the stronger the hydrogen bond in which 
it participates, the weaker will be the covalent oxygen- 
hydrogen bond itself and the lower the frequency of its 
absorption. The stretching frequency of the oxygen- 
hydrogen bond in ice Ih is 3220 cm⁻¹, in liquid water it is
3490 cm⁻¹, and in the dilute vapor it is 3700 cm⁻¹. In the
dilute vapor, no hydrogen bond weakens the oxygen-
hydrogen bond. In ice Ih, a strong, fixed hydrogen bond
weakens the oxygen-hydrogen bond significantly. In
liquid water, less than half the decrease in frequency
between the vapor and ice Ih occurs, presumably
because the hydrogen bonds formed when the vapor is
converted into the liquid are weaker than the hydrogen
bonds formed when the vapor is converted into the solid.

The weakening of the hydrogen bonds upon melting
that is indicated by both the increases in self-diffusion
and dielectric relaxation and the increase in the
stretching frequency of the oxygen-hydrogen bond 
requires that the dissociation constant for the hydrogen 
bond in liquid water be significantly larger than that for 
ice Ih. This increase in dissociation constant upon melt- 
ing may be large enough to produce a significant popu- 
lation of unbonded molecules of water in the liquid, 
presumably the interstitial water the existence of which 
is implied in the radial molecular correlation function.

There are infrared spectra of liquid water which
suggest that there are two distinct species of 
oxygen-hydrogen bonds in the liquid,'° and these two 
species could represent intact and broken hydrogen 
bonds.” It is possible to fit the temperature dependence 
of both these infrared spectra and the heat capacity of 
the liquid with a model of the liquid that assumes that 
there are only two types of oxygen-hydrogen bonds 
present, those participating in intact hydrogen 
bonds and those the hydrogen bonds of which are 
broken.” From such a fit, the standard free energy of 
formation of a hydrogen bond in liquid water is estimated
to be −2.0 kJ mol⁻¹ at 25 °C, and the fraction of
broken hydrogen bonds, 0.30 at 25 °C. From this fraction 
it would follow that at a given instant about 3-4% of the 
molecules of water would be either attached to the lattice 
by only one hydrogen bond or completely free of the lat- 
tice. There are, however, results suggesting both that 
these infrared spectra do not result from only two popu- 
lations of oxygen-hydrogen bonds? and that no simple 
two-state model can explain both the cubic expansion 
coefficient and the temperature coefficient of the 
isothermal compression of liquid water simultaneously.’ 
Consequently, the question of the molar concentration 
of intact hydrogen bonds in liquid water remains open. 
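
The arithmetic of the two-state model can be checked with a short calculation. The following is a minimal sketch, assuming that each oxygen-hydrogen bond is either in an intact or a broken hydrogen bond and that the equilibrium between the two is described by a single unimolecular equilibrium constant; the function name and the choice of the gas constant are illustrative and are not taken from the cited fits.

```python
import math

R = 8.314462618   # gas constant, J K^-1 mol^-1
T = 298.15        # 25 °C expressed in kelvin


def fraction_broken(delta_g_formation, temperature=T):
    """Fraction of hydrogen bonds that are broken in a two-state model.

    delta_g_formation is the standard free energy of formation of a
    hydrogen bond (J mol^-1). Assumption: K = [intact]/[broken]
    = exp(-dG/RT), so the broken fraction is 1/(1 + K).
    """
    k_formation = math.exp(-delta_g_formation / (R * temperature))
    return 1.0 / (1.0 + k_formation)


if __name__ == "__main__":
    # Value quoted in the text: -2.0 kJ mol^-1 at 25 °C.
    print(f"fraction broken = {fraction_broken(-2000.0):.2f}")
```

Under these assumptions a standard free energy of formation of −2.0 kJ mol⁻¹ gives a broken fraction of about 0.31, which agrees with the value of 0.30 quoted above to within the rounding of the free energy.
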
The mental picture of liquid water that forms intu- 
itively as its peculiarities are described is presently more 
adequate than any sophisticated physical model of its 
structure. The impression that is formed from a consid- 
eration of these properties is that liquid water retains 
most of the hydrogen bonds that are present in ice Ih but 
that these hydrogen bonds are more elastic, weaker, and 
break and re-form much more rapidly than those in ice Ih. 


Suggested Reading 


Eisenberg, D., & Kauzmann, W. (1969) The Structure and Properties 
of Water, Clarendon Press, Oxford, England. 


Problem 5-1: The isopiestic heat capacity of a substance,
$C_p$, is defined as the amount of heat required to raise one
mole of the substance one degree in temperature at con-
stant pressure. The units of this quantity are joules
degree⁻¹ mole⁻¹.

A substance has a certain intrinsic enthalpy at 0 K, and
this intrinsic enthalpy increases as the temperature
increases and the substance absorbs heat:

$$ H_T - H_0 = \int_0^T C_p \, dT + \Delta H_{\mathrm{pt}} $$

where $H_T$ is the intrinsic enthalpy at T = T, $H_0$ is the intrin-
sic enthalpy at T = 0 K, and $\Delta H_{\mathrm{pt}}$ is the sum of the
enthalpy changes for all phase transitions between 0 K
and T. The heat capacity of H₂O is the following function
of temperature (Figure 5-3):


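$$ C_p = (0.172\ \mathrm{J\,K^{-2}\,mol^{-1}})\,T \qquad [T = 0\text{-}60\ \mathrm{K}] $$

$$ C_p = 2.47\ \mathrm{J\,K^{-1}\,mol^{-1}} + (0.129\ \mathrm{J\,K^{-2}\,mol^{-1}})\,T \qquad [T = 60\text{-}273\ \mathrm{K}] $$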

$$ C_p = 77.5\ \mathrm{J\,K^{-1}\,mol^{-1}} \qquad [T = 273\text{-}373\ \mathrm{K}] $$
and 

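$$ \Delta H^{\circ}_{\mathrm{fus}} = 6.0\ \mathrm{kJ\,mol^{-1}} \ \text{at}\ 0\ ^{\circ}\mathrm{C} $$

$$ \Delta H^{\circ}_{\mathrm{vap}} = 40.7\ \mathrm{kJ\,mol^{-1}} \ \text{at}\ 100\ ^{\circ}\mathrm{C} $$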


(A) Use these experimental data to draw a graph of
$H_T - H_0$ as a function of temperature from 0 to
375 K.


The intrinsic entropy of a substance is related to the heat 
capacity by the following equation: 


$$ S_T - S_0 = \int_0^T \frac{C_p}{T} \, dT + \Delta S_{\mathrm{pt}} $$

where $S_T$ is the intrinsic entropy at T = T, $S_0$ is the intrin-
sic entropy at T = 0, and $\Delta S_{\mathrm{pt}}$ is the sum of the entropy
changes for all phase transitions.


When phase transitions occur, ΔG° = 0 at the transition
temperature. Use $\Delta H^{\circ}_{\mathrm{fus}}$ and $\Delta H^{\circ}_{\mathrm{vap}}$ to calculate $\Delta S^{\circ}_{\mathrm{fus}}$
and $\Delta S^{\circ}_{\mathrm{vap}}$.


(B) Draw a graph of $S_T - S_0$ as a function of T.


(C) If $G_T - G_0 = H_T - H_0 - S_T T + S_0 T$, are changes in G, as
the temperature changes, greater than or less
than changes in H in the case of H₂O?


Standard States and Units of Concentration 


Whenever standard entropy or its representative, stan-
dard free energy, is calculated from experimental
observations, a decision must be made on the standard
state to be used. Unlike those of the standard enthalpy 
and those of the heat capacity, the numerical values of 
both standard entropy and standard free energy depend 
significantly on this choice of standard states.'*’? When 
dealing with reactants and products dissolved in solu- 
tion, such as molecules of proteins and their ligands in 
water or alkanes in hexadecane, the choice of standard 
state, other than the obvious conventions of standard 
temperature and pressure, is a choice of the units in 
which the concentrations of the reactants and products 
are to be expressed. The desire in choosing the units for 
the concentrations is to eliminate any contributions to 
the entropy arising simply from the act of dispersing the 
solutes in the solvent and inescapably from the volumes 
of the solutes and the solvent. These contributions are 
the entropy of mixing. The reason for eliminating 
entropy of mixing is that the entropy that remains is 
the entropy of only the reaction itself and changes in the 
entropy of solvation that accompany the reaction. 

It can be assumed, as seems reasonable, that the 
thermodynamic activity of benzene should be the same 
whether it is dissolved in octane, decane, dodecane, 
tetradecane, or hexadecane. It has been shown experi- 
mentally” that this assumption is valid only if the 
thermodynamic activity of benzene is expressed in units 
of corrected volume fraction as defined by the 
equation

$$ a_{A,j} = \gamma_{A,j}\,\phi_{A,j}\,\exp\!\left[1 - \frac{V_{A,j}}{V_j}\right] \tag{5-5} $$

where $a_{A,j}$ is the thermodynamic activity of solute A (ben-
zene in the experiment) when it is dissolved in solvent j
(octane, decane, dodecane, tetradecane, or hexadecane
in the experiment), $\gamma_{A,j}$ is the activity coefficient neces-
sary to convert real behavior into ideal behavior, $V_{A,j}$ is
the volume of a mole of solute A when it is dissolved in a
solution with solvent j, $V_j$ is the volume of a mole of sol-
vent in the solution, and $\phi_{A,j}$ is the volume fraction of
solute A in the solution with solvent j:


$$ \phi_{A,j} = \frac{n_A V_{A,j}}{n_A V_{A,j} + n_j V_j} = [\mathrm{A}]\,V_{A,j} \tag{5-6} $$

where $n_A$ and $n_j$ are the moles of solute A and solvent j,
respectively, in the solution and [A] is the molar concen-
tration of solute A.* Most measurements of activity are performed in
such a way that the activity coefficient $\gamma_{A,j}$ is insignifi-
cantly different from 1 or becomes 1 by extrapolation.
That the thermodynamic activity of a solute should be
defined by Equation 5-5 was predicted theoretically
before it was verified experimentally. Expressing activi-
ties of solutes by using Equation 5-5 can be thought of as
correcting the concentration of the solute in units of
mole fraction for the differences in the volumes of solute
and solvent because when the volumes of a mole of
solvent and a mole of solute are the same and $\gamma_{A,j}$ is 1,
Equation 5-5 becomes

$$ a_{A,j} = \phi_{A,j} = \frac{n_A}{n_A + n_j} = x_{A,j} \tag{5-7} $$

where $x_{A,j}$ is the mole fraction of solute A in solvent j.
Equation 5-7 is Raoult’s law.

* One must remember that molarity is defined as moles liter⁻¹ and
the volume of a mole of a substance is defined as centimeters³
mole⁻¹.

The difficulty with the corrected volume fraction is 
deciding what volume to use for the volume of a mole of 
solute in the solution. When the solute is a liquid dis- 
solved miscibly in a nonpolar liquid, the molar volume of 
the solute, $V_{\mathrm{m}}$, which is the volume of a mole of the pure
liquid solute, is a reasonable choice. If, however, the 
solute is a gas or a solid at the temperature of the meas- 
urement, its volume in the solution may be quite differ- 
ent from its volume at the temperature or pressure 
required to liquify it. In water, even a solute the pure 
phase of which is a liquid at the temperature of the meas- 
urement may have a volume in the solution that is sig- 
nificantly different from its molar volume. In the present 
discussion, the partial molar volume of the solute at infi- 
nite dilution has been chosen as an approximate esti- 
mate of the volume of a mole of the solute in the solution. 
Unlike the partial molar volume, however, which is only 
an estimate of the volume of a mole of the solute in the 
solution, the actual volume of a mole of the solute in the 
solution does not vary with the concentration of that 
solute. 

The partial molar volume (centimeters³ mole⁻¹) of
solute A or solvent j is defined as the increase in the
volume of the solution, dV, that occurs when an infini-
tesimally small number of moles, dn, of solute A or sol-
vent j is added to the solution

$$ \bar{V} = \left(\frac{\partial V}{\partial n}\right)_{T,P} \tag{5-8} $$


at the concentrations of solute and solvent at which the 
measurement is made. If the solvent and solute are both 
hydrocarbons, it is usually assumed that the partial 
molar volumes of solvent and solute are 


$$ \bar{V} = \frac{M}{\rho} = V_{\mathrm{m}} \tag{5-9} $$

where M is the molar mass (grams mole⁻¹) and ρ is the
density (grams milliliter⁻¹) of the pure phase of the sol-
vent or the solute and $V_{\mathrm{m}}$ is the molar volume. Equation
5-9 is also used to estimate the partial molar volumes of
other solvents, including water, when solutions are
dilute.

The partial molar volumes of most solutes when 
they are dissolved in water are significantly less than 
those defined by Equation 5-9.4 If the solute is a 
hydrocarbon, it occupies a significant portion of the 
empty space already present in the water (Figure 5-2). 
Direct measurements of partial molar volumes of hydro-
carbons in water have rarely been performed, usually
because such solutes are poorly soluble in water. In the 
absence of such measurements, the algorithms of 
Traube” are used to estimate partial molar volumes of 
hydrocarbons in water and those of most other solutes as 
well. 

Traube concluded from direct measurement that
the partial molar volume of any neutral solute (centime-
ters³ mole⁻¹) when it is dissolved in water is the sum of the
partial molar volumes of its atoms and functional groups
plus the covolume, which is a universal correction. The
partial molar volumes of the atoms and functional
groups at 25 °C are for hydrogen, 3.1 cm³ mol⁻¹; carbon,
10.0 cm³ mol⁻¹; nitrogen, 1.5 cm³ mol⁻¹; the oxygen in an
ether, 5.5 cm³ mol⁻¹; a hydroxyl group (-OH), 5.4 cm³
mol⁻¹; the oxygen in an amide, thioester, ketone, or
aldehyde (=O), 5.5 cm³ mol⁻¹; an acyl group (-COO-),
15.9 cm³ mol⁻¹; phosphorus, 17.1 cm³ mol⁻¹; sulfur,
15.6 cm³ mol⁻¹; chlorine, 13.3 cm³ mol⁻¹; bromine,
17.8 cm³ mol⁻¹; and iodine, 21.6 cm³ mol⁻¹. From the sum
of the partial molar volumes of its atoms and functional
groups, 8.2 cm³ mol⁻¹ must be subtracted for each mono-
cyclic ring, either saturated or unsaturated, and 26.6 cm³
mol⁻¹ for each bicyclic aromatic ring, such as a naphthyl
group. To the final sum for the constituents of a particu-
lar molecule, a covolume of 12.5 cm³ mol⁻¹ must be
added.
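
The additive scheme just described can be written out as a short calculation. The following is a minimal sketch that uses only the increments, ring corrections, and covolume listed above; the dictionary keys and the function name are illustrative labels chosen here, not Traube's notation.

```python
# Increments (cm^3 mol^-1 at 25 °C) as listed in the text.
TRAUBE_INCREMENTS = {
    "H": 3.1,
    "C": 10.0,
    "N": 1.5,
    "O_ether": 5.5,
    "OH": 5.4,          # hydroxyl group counted as one unit
    "O_carbonyl": 5.5,  # =O in an amide, thioester, ketone, or aldehyde
    "COO": 15.9,
    "P": 17.1,
    "S": 15.6,
    "Cl": 13.3,
    "Br": 17.8,
    "I": 21.6,
}
MONOCYCLIC_RING = 8.2           # subtracted per monocyclic ring
BICYCLIC_AROMATIC_RING = 26.6   # subtracted per bicyclic aromatic ring
COVOLUME = 12.5                 # added once per molecule


def traube_volume(groups, monocyclic_rings=0, bicyclic_aromatic_rings=0):
    """Estimate the partial molar volume in water (cm^3 mol^-1).

    groups maps a key of TRAUBE_INCREMENTS to the number of times that
    atom or functional group occurs in the neutral solute.
    """
    total = sum(TRAUBE_INCREMENTS[name] * count for name, count in groups.items())
    total -= MONOCYCLIC_RING * monocyclic_rings
    total -= BICYCLIC_AROMATIC_RING * bicyclic_aromatic_rings
    return total + COVOLUME


if __name__ == "__main__":
    # Benzene: six carbons, six hydrogens, one monocyclic ring.
    print(f"benzene: {traube_volume({'C': 6, 'H': 6}, monocyclic_rings=1):.1f}")
    # Ethanol: two carbons, five hydrogens, one hydroxyl group.
    print(f"ethanol: {traube_volume({'C': 2, 'H': 5, 'OH': 1}):.1f}")
```

For benzene the sum is 6(10.0) + 6(3.1) − 8.2 + 12.5 = 82.9 cm³ mol⁻¹, and for ethanol it is 2(10.0) + 5(3.1) + 5.4 + 12.5 = 53.4 cm³ mol⁻¹.
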

When ions are dissolved in water, their charge con- 
stricts the solvent in their vicinity. These electrostrictions 
vary between −10 and −30 cm³ mol⁻¹ for the addition of
salts of monovalent ions or monovalent zwitterions.
How electrostriction is to be treated in estimating the 
volumes of ions to be used in calculating their corrected 
volume fractions is unclear. The volumes of their ionic 
solids, however, are also poor estimates of their volumes 
in solution. 

When a reaction takes place in solution, its equilib- 
rium constant can be defined by use of units of corrected 
volume fraction (Equation 5-5), which have the conven- 
ient advantage that they are dimensionless. For example, 
the equilibrium constant for the association 


$$ \mathrm{A} + \mathrm{B} \rightleftharpoons \mathrm{C} \tag{5-10} $$


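occurring in solvent j, when activity coefficients can be
ignored, would be

$$ K_{\mathrm{eq}} = \frac{a_{C,j}}{a_{A,j}\,a_{B,j}} = \frac{[\mathrm{C}]\,V_{C,j}}{[\mathrm{A}][\mathrm{B}]\,V_{A,j}V_{B,j}} \exp\!\left[\frac{V_{A,j} + V_{B,j} - V_{C,j}}{V_j} - 1\right] \tag{5-11} $$

If the reaction proceeds with no change in volume

$$ V_{C,j} = V_{A,j} + V_{B,j} \tag{5-12} $$

and

$$ K_{\mathrm{eq}} = \frac{a_{C,j}}{a_{A,j}\,a_{B,j}} = \frac{[\mathrm{C}]}{[\mathrm{A}][\mathrm{B}]}\,\frac{V_{A,j} + V_{B,j}}{V_{A,j}V_{B,j}}\,\exp(-1) \tag{5-13} $$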


Because the terms to the right of the quotient of the 
molar concentrations are not significant functions of the 
concentration of reactant and product, the quotient of 
the molar concentrations is a constant, namely, the equi- 
librium constant that is usually measured when units are 
molarity. But if entropy of mixing is to be eliminated, the 
equilibrium constant in units of corrected volume frac- 
tion (Equation 5-11) should be used for the calculation of 
the standard free energy of the reaction: 
$$ \Delta G^{\circ} = -RT \ln K_{\mathrm{eq}} \tag{5-14} $$

If the two molecules that are associating are identical
(for example, the association is the formation of a hydro-
gen bond between two molecules of phenol), the
quotient of partial molar volumes in Equation 5-13
becomes $2V_A^{-1}$, where A refers to one of the two identical
molecules; if a small molecule, such as a ligand, associ-
ates with a macromolecule, such as a protein, the quo-
tient of the partial molar volumes becomes $V_A^{-1}$, where A
refers to the ligand. These two situations, where both mol-
ecules are the same and where one is much larger than
the other, respectively, are the limits on the quotient:

$$ \frac{1}{V_{A,j}} < \frac{V_{A,j} + V_{B,j}}{V_{A,j}V_{B,j}} \le \frac{2}{V_{A,j}} \tag{5-15} $$

where solute A is the smaller of the two molecules.
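
The conversion from an equilibrium constant measured in units of molarity to one in units of corrected volume fraction can be sketched numerically. The following is a minimal illustration of Equations 5-13 and 5-14, assuming no change in volume on association and activity coefficients of 1; the function names and the numerical values in the example are hypothetical and are chosen only to show the bookkeeping of units (volumes in centimeters³ mole⁻¹ are converted to liters mole⁻¹ so that they cancel molarity).

```python
import math

R = 8.314462618   # gas constant, J K^-1 mol^-1
T = 298.15        # 25 °C expressed in kelvin


def corrected_keq(k_molar, v_a_cm3, v_b_cm3):
    """Equation 5-13: dimensionless K_eq from a molar association constant.

    k_molar is [C]/([A][B]) in L mol^-1; v_a_cm3 and v_b_cm3 are the
    volumes of a mole of each reactant in solution (cm^3 mol^-1).
    """
    v_a = v_a_cm3 / 1000.0   # L mol^-1
    v_b = v_b_cm3 / 1000.0   # L mol^-1
    return k_molar * (v_a + v_b) / (v_a * v_b) * math.exp(-1.0)


def standard_free_energy(k_eq, temperature=T):
    """Equation 5-14: standard free energy (J mol^-1) from K_eq."""
    return -R * temperature * math.log(k_eq)


if __name__ == "__main__":
    # Hypothetical dimerization: K = 10 L mol^-1, both volumes 100 cm^3 mol^-1.
    k_eq = corrected_keq(10.0, 100.0, 100.0)
    print(f"K_eq = {k_eq:.1f}")
    print(f"dG   = {standard_free_energy(k_eq) / 1000.0:.1f} kJ mol^-1")
```

Because the quotient of volumes is bounded as in Equation 5-15, the correction to the standard free energy for a given smaller reactant varies by at most RT ln 2 between the two limits.
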
Another reaction in which the choice of standard
state and units of concentration significantly affects the
actual value of the standard free energy is the transfer of
solute A from water to solvent j:

$$ \mathrm{A(H_2O)} \rightleftharpoons \mathrm{A(solvent}\ j\mathrm{)} \tag{5-16} $$


An aqueous phase and a phase of another solvent, for 
example hexadecane, are placed in contact with each 
other directly or indirectly. The solute of interest, for 
example benzene, is added to the system at low concen- 
tration, and its partition between the two phases is 
allowed to reach equilibrium.” The concentration of the 
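solute in each phase is measured, and a partition coeffi-
cient, $K_{\mathrm{p},A}$, is calculated. Although the concentration of
solute A in each of the two phases is initially tabulated in
units convenient to the method of measurement, for
example grams of solute (grams of solvent)⁻¹, the choice of units for
the concentrations used to calculate the standard free
energies of transfer, and hence the definition of standard
state, has, as always, a significant effect on the magnitude
of the standard free energy of transfer. If it is assumed
that corrected volume fractions are the proper units and
also that activity coefficients can be ignored, the parti-
tion coefficient for the transfer of solute A from water to
solvent j is

$$ K_{\mathrm{p},A} = \frac{a_{A,j}}{a_{A,\mathrm{H_2O}}} = \frac{\phi_{A,j}}{\phi_{A,\mathrm{H_2O}}} \exp\!\left[\frac{V_{A,\mathrm{H_2O}}}{V_{\mathrm{H_2O}}} - \frac{V_{A,j}}{V_j}\right] \tag{5-17} $$

and the standard free energy of transfer is

$$ \Delta G^{\circ}_{A,\mathrm{H_2O}\rightarrow j} = \lim_{[\mathrm{A}]\rightarrow 0} \left(-RT \ln K_{\mathrm{p},A}\right) = \lim_{[\mathrm{A}]\rightarrow 0} \left(-RT \ln \left\{ \frac{[\mathrm{A}]_j\,V_{A,j}}{[\mathrm{A}]_{\mathrm{H_2O}}\,V_{A,\mathrm{H_2O}}} \exp\!\left[\frac{V_{A,\mathrm{H_2O}}}{V_{\mathrm{H_2O}}} - \frac{V_{A,j}}{V_j}\right] \right\}\right) \tag{5-18} $$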


The limit defines the standard state as the solutions at 
infinite dilution, a condition at which both activity coef- 
ficients are 1 and can definitely be ignored. 

The goal of the rather complex definitions of 
activities and hence standard states that has just been 
described is to arrive at a value for the standard free 
energy of transfer that is only the difference between 
the standard free energy of solvation for solute A by 
solvent j and the standard free energy of solvation for 
solute A by water.'® The use of Equation 5-5 for ther- 
modynamic activities eliminates the contributions of 
the entropies of mixing to the standard free energy of 
transfer. The quotient $V_{A,j}/V_{A,\mathrm{H_2O}}$ in Equation 5-18 cor-
rects for the work performed when the volume of the
system increases as solute is transferred from water to 
solvent j. In order to focus only on standard free ener- 
gies of solvation, the conditions must be such that 
solute A is surrounded entirely by either molecules of 
water or molecules of solvent j, and no molecule of
either solute A or the molecules of water or solvent that
surrounds it is affected in its behavior by the presence 
of another molecule of solute A. This is the reason for 
the limit in Equation 5-18, which defines the standard 
state of infinite dilution. In this way, the only contribu- 
tions to the difference in standard free energy of solva- 
tion are the specific interactions between the molecule 
of solute A and the solvent j or the water. 

The choice of units of concentration and standard 
state is also critical in calculating the transfer of a 
solute from the gas phase to a solution. The usual 
choice of standard state for the solution in such a reac- 
tion is the solute at infinite dilution in the solvent so 
that the solute is fully solvated and no interactions 
occur among the molecules of solute. The usual choice 
of standard state for the gas is the real gas extrapolated 
to zero pressure in order to eliminate the nonideal 
behavior of the real gas represented by its virial coeffi- 
cients. Because of the proportionality between molarity 
and pressure, the practical units of concentration for a 
gas are usually pressure, but the thermodynamic activ- 
ity of the gas should be defined as its molarity."® 

To avoid both the standard entropy of mixing and 
changes in volume at constant pressure during the trans- 
fer of solute from the gas phase to a solution, the volume 
occupied by a mole of the solute in the gas phase would 
have to be equal to the volume occupied by a mole of the 
solvated solute in the solution.'*“* Consider a large 
volume of solution at standard state in contact with a 
large volume of the gaseous solute. Only when the pres- 
sure of the gas is such that 1 mol of gaseous solute has the 
same volume as the partial molar volume of the solute in 
the solution will the volume of the system not change 
when 1 mol of solute is transferred from the gas to the 
solution at constant pressure. Only under these circum- 
stances is the transfer both isochoric and isobaric. As a 
result, the standard entropy of mixing is 0, no work is per- 
formed by the system, and the transfer occurs at constant 
pressure. To achieve this condition, the gas must be 
compressed mathematically to a volume equal to the 
partial molar volume of the solute in the solution. The 
standard free energy change for the compression of the 
gaseous solute to a volume equal to its partial molar 
volume in the solution is 


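$$ \Delta G^{\circ}_{\mathrm{compression}} = -RT \ln\left([\mathrm{A}]_{\mathrm{g}}\,V_{A,j}\right) \tag{5-19} $$

where $[\mathrm{A}]_{\mathrm{g}}$ is its molar concentration in the gas phase

$$ [\mathrm{A}]_{\mathrm{g}} = \frac{p_A}{RT} = \frac{p_A}{2479\ \mathrm{kPa\ L\ mol^{-1}}} \tag{5-20} $$

where $p_A$ is the partial pressure (kilopascals) of the gaseous
solute A at 25 °C. When all of these considerations are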
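combined with Equation 5-5, the equation for the stan-
dard free energy of transfer of solute A from the gas phase
to a solution in solvent j, $\Delta G^{\circ}_{A,\mathrm{g}\rightarrow j}$, becomes

$$ \Delta G^{\circ}_{A,\mathrm{g}\rightarrow j} = -RT \ln\left\{ \frac{\phi_{A,j}}{[\mathrm{A}]_{\mathrm{g}}\,V_{A,j}} \exp\!\left[1 - \frac{V_{A,j}}{V_j}\right] \right\} = -RT \ln\left\{ \frac{[\mathrm{A}]_j}{[\mathrm{A}]_{\mathrm{g}}} \exp\!\left[1 - \frac{V_{A,j}}{V_j}\right] \right\} \tag{5-21} $$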


Again, the intention of Equation 5-21 is to apply the 
appropriate corrections so that the standard free energy 
of transfer is only the standard free energy of solvation 
for solute A by solvent j. 
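
The steps leading to Equation 5-21 can be followed numerically. The following is a minimal sketch of Equations 5-19 through 5-21, assuming ideal behavior of the gas, activity coefficients of 1, pressures expressed in kilopascals, and volumes expressed in centimeters³ mole⁻¹; the function names and the numerical inputs in the example are hypothetical.

```python
import math

R = 8.314462618   # gas constant, J K^-1 mol^-1 (1 J = 1 kPa L)
T = 298.15        # 25 °C expressed in kelvin


def gas_molarity(p_kpa, temperature=T):
    """Equation 5-20: molar concentration (mol L^-1) of an ideal gas."""
    return p_kpa / (R * temperature)


def dg_compression(p_kpa, v_solute_cm3, temperature=T):
    """Equation 5-19: free energy (J mol^-1) to compress the gas to a
    molar volume equal to the volume of a mole of solute in solution."""
    v_solute = v_solute_cm3 / 1000.0   # L mol^-1
    return -R * temperature * math.log(gas_molarity(p_kpa, temperature) * v_solute)


def dg_transfer_gas_to_solution(conc_solution, p_kpa, v_solute_cm3,
                                v_solvent_cm3, temperature=T):
    """Equation 5-21: standard free energy of transfer (J mol^-1).

    conc_solution is the molar concentration of the solute in the
    solution in equilibrium with the gas at partial pressure p_kpa.
    """
    ratio = conc_solution / gas_molarity(p_kpa, temperature)
    correction = math.exp(1.0 - v_solute_cm3 / v_solvent_cm3)
    return -R * temperature * math.log(ratio * correction)


if __name__ == "__main__":
    # Hypothetical solute: 82.9 cm^3 mol^-1 in solution, gas at 101.325 kPa.
    print(f"[A]g = {gas_molarity(101.325):.4f} mol L^-1")
    print(f"dG(compression) = {dg_compression(101.325, 82.9) / 1000.0:.1f} kJ mol^-1")
```

At 101.325 kPa and 25 °C the gas is about 0.041 M, so the compression term is large and positive, which is why the choice of standard state for the gas has such a pronounced effect on the tabulated standard free energy of transfer.
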


Ionic Interactions 


The possibility that a positively charged cation might 
interact favorably with a negatively charged anion and 
bring two molecules or two segments of the same 
polypeptide together has a lasting appeal. Such an asso- 
ciation seems plausible because, as everyone knows, 
unlike charges attract each other. When a positive ion in 
a solution encounters a negative ion and a complex 
between these two ions is formed, it is referred to as an 
ion pair. In terms of Equation 5-1, a hydrated ion pair 
forms whenever a hydrated anion associates with a 
hydrated cation. In this reaction, the various changes of 
standard free energy identified in Equation 5-2 can be 
separately considered by writing the following thermo- 
dynamic cycle: 


                      ΔG°(A⁺·B⁻)
   A⁺(g)     +     B⁻(g)        ⇌        A⁺·B⁻(g)
     +               +                       +
   x H₂O           y H₂O                   z H₂O
     | ΔG°hyd(A⁺)    | ΔG°hyd(B⁻)            | ΔG°hyd(A⁺·B⁻)
     ↓               ↓                       ↓
   A⁺(H₂O)x   +   B⁻(H₂O)y      ⇌      A⁺·B⁻(H₂O)z + (x + y − z)H₂O
                      ΔG°IP

(5-22)


It is easiest to consider the changes of standard 
enthalpy first because they constitute the main contri- 
bution to the changes in standard free energy and they 
have been measured least ambiguously. The standard 
enthalpy change when a positive ion and a negative ion 
associate in the gas phase, $\Delta H^{\circ}_{A^+\cdot B^-}$, is governed by elec-
trostatics. To the extent that the Born-Haber cycle is able
to provide accurate estimations of crystal lattice energies
only from simple electrostatic theory, the standard
enthalpy of formation of an ion pair in the gas phase
from the two separated ions, A⁺ and B⁻, if electron repul-
sion is ignored, should be

$$ \Delta H^{\circ}_{A^+\cdot B^-} = \frac{(z_{A^+})(z_{B^-})\,e^2}{4\pi\varepsilon_0\,(a_{A^+} + a_{B^-})}\,N_{\mathrm{A}} \tag{5-23} $$

where $z_{A^+}$ and $z_{B^-}$ are the charge numbers of the ions, $e$ is
the elementary charge (1.602 × 10⁻¹⁹ C), and $a_{A^+}$ and $a_{B^-}$
are the radii of the two ions. Values for the ionic radii in
crystalline lattices, based on crystallographic studies of
salts, are usually used for $a_{A^+}$ and $a_{B^-}$. The standard
enthalpy of formation defined by Equation 5-23 can be
presented for monovalent ions ($z_{A^+} = -z_{B^-} = 1$) as a func-
tion of the sum of the two ionic radii (Figure 5-8).

When an ion is transferred from the gas phase to 
water, there is a large release of heat.* This large negative 
change in standard enthalpy is referred to as the stan-
dard enthalpy of hydration, $\Delta H^{\circ}_{\mathrm{hyd}}$. Measurements of
these standard enthalpies of hydration have been tabu- 
lated’ for a number of monovalent, divalent, and triva- 
lent spherical ions. The values for the spherical 
monovalent cations and anions can be presented as a 
function of their ionic radii (Figure 5-8). 

The large negative standard enthalpies of hydra- 
tion for ions are commonly explained to be the result of 
the ability of the fixed charge on an ion to gather around 
itself a layer of tightly held molecules of water that are ori- 
ented either with the positive ends of their dipoles, their 
hydrogens, toward an anion or the negative ends of their 
dipoles, their lone pairs, directed toward a cation (Figure 
5-9). This explanation is probably incorrect. From meas- 
urements of the standard enthalpy of formation for com- 
plexes between monovalent cations in the gas phase and 
1-7 molecules of water, it has been concluded” that 
about four molecules of water are sufficient to hydrate a 
cation such as NH₄⁺, H₃O⁺, H₃COH₂⁺, Li⁺, or Na⁺. This result
suggests that the innermost shell of the layer of hydration 
around an ion is not large. Furthermore, when the stan- 
dard enthalpy changes for the formation of 1:1 complexes 
between a molecule of water and various cations and 
anions in polar nonaqueous solvents were determined, 
the values observed were quite small (0 > ΔH° > −13 kJ
mol⁻¹). These two results suggest that the large standard
enthalpies of hydration observed for ions arise far more 
from the influence exerted by the ion over a significant 
region of the water surrounding it than from the specific, 
intimate noncovalent contacts between the ion and its 
immediate neighbors. 

* This large release of heat when any ion is transferred from the gas
phase to water should not be confused with the small releases or
absorptions of heat that occur when the ions in the solid crystals of
a salt are dissolved in water.

Figure 5-8: Electrostatic enthalpies and standard enthalpies of
hydration. The standard enthalpy change for bringing together a
monovalent cation and a monovalent anion in a vacuum is pre-
sented as a function of the sum of the two respective ionic radii
(solid dark line), as calculated from Equation 5-23. The standard
enthalpies of hydration for monovalent cations (O) are presented
as a function of their ionic radii. The ions are, in order of increas-
ing radius, Li⁺, Na⁺, K⁺, Rb⁺, and Cs⁺. The line connecting the points
is drawn by hand. The standard enthalpies of hydration for mono-
valent anions (O) are presented as a function of their ionic radii.
The ions are, in order of increasing ionic radius, F⁻, Cl⁻, Br⁻, and I⁻.
The line connecting the points is drawn by hand. The standard
enthalpy change for the hydration of a monovalent ion of either
charge, based on the assumption that the standard enthalpy of
hydration is due only to the difference in self-charging energies in
the vacuum and in water (Equation 5-25), is presented as a func-
tion of ionic radius (dark dashed line). All enthalpies are presented
in kilojoules mole⁻¹ for a standard temperature of 25 °C.

Figure 5-9: Schematic drawing of molecules of water with the
negative ends of their dipoles directed toward a cation and the pos-
itive ends of their dipoles directed toward an anion.

In electrostatics the self-charging energy is the
energy required to charge a sphere of a given radius a in
a medium of relative permittivity $\varepsilon_r$. The self-charging
energy, $E_{\mathrm{sc}}$, for placing the charge $Ze$ on an ion j of radius
$a_j$ would be

$$ E_{\mathrm{sc}} = \frac{Z^2 e^2}{8\pi\varepsilon_0\,\varepsilon_r\,a_j}\,N_{\mathrm{A}} \tag{5-24} $$


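The standard enthalpy change $\Delta H^{\circ}_{\mathrm{sc}}$ associated with the
electrostatic energy required to move an ion from the
vacuum ($\varepsilon_r = 1$) to water ($\varepsilon_r = 78$) at 25 °C would be the dif-
ference in the two self-charging energies:

$$ \Delta H^{\circ}_{\mathrm{sc}} = \frac{Z^2 e^2}{8\pi\varepsilon_0\,a_j}\left(\frac{1}{\varepsilon_{r,\mathrm{H_2O}}} - 1\right) N_{\mathrm{A}} \tag{5-25} $$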


This standard enthalpy change for a monovalent ion can
be presented as a function of ionic radius (Figure 5-8). It 
can be seen that, for a sphere of unit elementary charge, 
the value of the difference in self-charging energy 
between the vacuum and a medium the relative permit- 
tivity of which is equal to that of water is close to the 
experimentally observed standard enthalpy of hydration 
for a spherical anion (O) of the same radius. 
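
The magnitude of the difference in self-charging energies is easily evaluated. The following is a minimal sketch of Equations 5-24 and 5-25, assuming SI units with the factor 4πε₀ written explicitly and a relative permittivity of 78 for water; the constants are CODATA values, and the function names are illustrative.

```python
import math

E_CHARGE = 1.602176634e-19     # elementary charge, C
N_AVOGADRO = 6.02214076e23     # mol^-1
EPSILON_0 = 8.8541878128e-12   # permittivity of the vacuum, F m^-1
EPSILON_R_WATER = 78.0         # relative permittivity of water at 25 °C


def self_charging_energy(radius_nm, rel_permittivity, charge_number=1):
    """Equation 5-24: self-charging energy (kJ mol^-1) of a sphere."""
    radius_m = radius_nm * 1.0e-9
    joules_per_mole = ((charge_number * E_CHARGE) ** 2 * N_AVOGADRO
                       / (8.0 * math.pi * EPSILON_0 * rel_permittivity * radius_m))
    return joules_per_mole / 1000.0


def hydration_enthalpy_from_self_charging(radius_nm, charge_number=1):
    """Equation 5-25: water minus vacuum self-charging energy (kJ mol^-1)."""
    in_water = self_charging_energy(radius_nm, EPSILON_R_WATER, charge_number)
    in_vacuum = self_charging_energy(radius_nm, 1.0, charge_number)
    return in_water - in_vacuum


if __name__ == "__main__":
    for radius in (0.10, 0.15, 0.20):
        value = hydration_enthalpy_from_self_charging(radius)
        print(f"a = {radius:.2f} nm: {value:.0f} kJ mol^-1")
```

For a monovalent sphere with a radius of 0.1 nm the result is about −686 kJ mol⁻¹, and it scales as the reciprocal of the radius and as the square of the charge number.
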

The observed standard enthalpies of hydration for 
the cations, however, are less than the values expected 
from differences in self-charging energies. The large 
differences between $\Delta H^{\circ}_{\mathrm{sc}}$ and $\Delta H^{\circ}_{\mathrm{hyd}}$ have been
explained as being due either to increases in the “effec- 
tive radius” of the cations or to decreases in the 
“effective dielectric constant around the ion.”** Which 
of these views is more realistic is unknown. 
Nevertheless, self-charging can explain the shapes and 
slopes of the curves connecting the experimental values 
for standard enthalpies of hydration and the majority 
if not all of the absolute value of each. Because 
Equation 5-25 accounts for the majority of the standard 
enthalpy of hydration for simple ions and because it is 
the large bulk relative permittivity of water that causes 
the result to be so large, it follows that it is the bulk 
dielectric of the water, rather than local interactions, 
that is responsible for the large standard enthalpies of 
hydration. 

The bulk relative permittivity of any solvent is a 
measure of the macroscopic response of that solvent to a 
fixed electrostatic charge. Consequently, in the calcula- 
tion of Equation 5-25, a solvent is treated as a uniform 
continuum and no account is taken of the properties of 
its individual molecules. The high relative permittivity of 
water, however, is actually a result of the cooperative 
behavior of the molecules of water in the liquid over a 
significant volume. Because the high relative permittivity 
of water arises from the correlation of the individual 
dipoles of the molecules of water, the necessity to rely on 
that relative permittivity to explain the large enthalpy of 
hydration means that an ion influences the structure of 
the water over a significant distance, not just in its imme- 
diate vicinity. 

The standard enthalpy of hydration for a monova-
lent ion pair, $\Delta H^{\circ}_{\mathrm{hyd}(A^+\cdot B^-)}$, can be estimated from electro-
static theory just as standard enthalpies of hydration
were estimated. It should be equal to the difference
between the sum of the standard enthalpies of charging
the two ions separately and the standard enthalpy of
bringing them to within a certain distance of each other:

$$ \Delta H^{\circ}_{\mathrm{hyd}(A^+\cdot B^-)} = \frac{e^2 N_{\mathrm{A}}}{8\pi\varepsilon_0}\left(\frac{1}{a_{A^+}} + \frac{1}{a_{B^-}} - \frac{2}{d_{\mathrm{IP}}}\right)\left(\frac{1}{\varepsilon_{r,\mathrm{H_2O}}} - 1\right) \tag{5-26} $$

where $d_{\mathrm{IP}}$ is the distance between the monovalent ions
in the ion pair; when the two ions are as close together as
possible, $d_{\mathrm{IP}}$ equals $a_{A^+} + a_{B^-}$. It can be shown that

$$ \frac{0.828}{a_{\mathrm{s}}} \le \frac{1}{a_{A^+}} + \frac{1}{a_{B^-}} - \frac{2}{a_{A^+} + a_{B^-}} \le \frac{1}{a_{\mathrm{s}}} \tag{5-27} $$

where $a_{\mathrm{s}}$ is the radius of the smaller of the two monova-
lent ions. It follows that the standard enthalpy of hydra-
tion for a monovalent ion pair is slightly less than the
standard enthalpy of hydration for the smaller of the two
ions alone.
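
The bounds in Equation 5-27 can be confirmed by a simple numerical scan. The following is a minimal sketch, assuming that the two ions are in contact so that the distance between them is the sum of their radii; the radius of the smaller ion is set to 1 and the ratio of the larger radius to the smaller is varied. The function name and the range of the scan are arbitrary choices made here.

```python
def radius_term(a_cation, a_anion):
    """Bracketed sum of reciprocal radii in Equation 5-26 at contact."""
    return 1.0 / a_cation + 1.0 / a_anion - 2.0 / (a_cation + a_anion)


# Smaller radius fixed at 1; larger radius scanned from 1 to 21 in steps of 0.001.
ratios = [1.0 + 0.001 * i for i in range(20001)]
values = [radius_term(1.0, ratio) for ratio in ratios]

minimum = min(values)
ratio_at_minimum = ratios[values.index(minimum)]
maximum = max(values)

if __name__ == "__main__":
    print(f"minimum = {minimum:.3f} at a ratio of radii of {ratio_at_minimum:.3f}")
    print(f"maximum = {maximum:.3f}")
```

The minimum of 0.828 occurs when the larger radius is 1 + √2 ≈ 2.414 times the smaller, and the upper limit of 1 is reached when the radii are equal and approached again when one ion is much larger than the other.
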

The overall change in standard enthalpy for the for- 
mation of an ion pair in aqueous solution should be 


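$$ \Delta H^{\circ}_{\mathrm{IP}} = \Delta H^{\circ}_{A^+\cdot B^-} + \Delta H^{\circ}_{\mathrm{hyd}(A^+\cdot B^-)} - \Delta H^{\circ}_{\mathrm{hyd}(A^+)} - \Delta H^{\circ}_{\mathrm{hyd}(B^-)} \tag{5-28} $$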


From an examination of Figure 5-8, it becomes clear that 
for monovalent ions this change in standard enthalpy is 
a small difference between several large numbers, and its 
value could be either positive or negative. This conclu- 
sion, that the standard enthalpy change has a small 
value, is supported by estimating the electrostatic energy 
involved in bringing two monovalent ions of opposite 
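charge together in a medium with a uniform relative per-
mittivity equal to that of water:

$$ \Delta H^{\circ}_{\mathrm{IP}} = -\frac{e^2}{4\pi\varepsilon_0\,\varepsilon_{r,\mathrm{H_2O}}\,(a_{A^+} + a_{B^-})}\,N_{\mathrm{A}} \tag{5-29} $$

For $a_{A^+} \ge 0.1$ nm and $a_{B^-} \ge 0.1$ nm, −9 kJ mol⁻¹ < $\Delta H^{\circ}_{\mathrm{IP}}$ <
0 kJ mol⁻¹.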
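
That the thermodynamic cycle and the direct electrostatic estimate agree can be verified numerically. The following is a minimal sketch for monovalent ions in contact, assuming that every term in Equation 5-28 is given by the continuum expressions of Equations 5-23, 5-25, and 5-26 with 4πε₀ written explicitly; it is a check of internal consistency, not a calculation from experimental enthalpies, and the function names and the radii in the example are hypothetical.

```python
import math

E_CHARGE = 1.602176634e-19     # elementary charge, C
N_AVOGADRO = 6.02214076e23     # mol^-1
EPSILON_0 = 8.8541878128e-12   # permittivity of the vacuum, F m^-1
EPS_WATER = 78.0               # relative permittivity of water at 25 °C

# e^2 N_A / (4 pi eps0), converted from J m mol^-1 to kJ nm mol^-1
COULOMB = E_CHARGE ** 2 * N_AVOGADRO / (4.0 * math.pi * EPSILON_0) * 1.0e9 / 1000.0


def dh_gas_pair(a_cat, a_an):
    """Equation 5-23 for charge numbers +1 and -1 (radii in nm)."""
    return -COULOMB / (a_cat + a_an)


def dh_hyd_ion(a):
    """Equation 5-25 for a monovalent ion."""
    return COULOMB / (2.0 * a) * (1.0 / EPS_WATER - 1.0)


def dh_hyd_pair(a_cat, a_an):
    """Equation 5-26 with the ions in contact."""
    bracket = 1.0 / a_cat + 1.0 / a_an - 2.0 / (a_cat + a_an)
    return COULOMB / 2.0 * bracket * (1.0 / EPS_WATER - 1.0)


def dh_ion_pair_cycle(a_cat, a_an):
    """Equation 5-28: sum around the thermodynamic cycle."""
    return (dh_gas_pair(a_cat, a_an) + dh_hyd_pair(a_cat, a_an)
            - dh_hyd_ion(a_cat) - dh_hyd_ion(a_an))


def dh_ion_pair_direct(a_cat, a_an):
    """Equation 5-29: association in a uniform medium like water."""
    return -COULOMB / (EPS_WATER * (a_cat + a_an))


if __name__ == "__main__":
    a_cat, a_an = 0.10, 0.18   # hypothetical radii, nm
    print(f"gas-phase association: {dh_gas_pair(a_cat, a_an):8.1f} kJ mol^-1")
    print(f"cycle (5-28):          {dh_ion_pair_cycle(a_cat, a_an):8.2f} kJ mol^-1")
    print(f"direct (5-29):         {dh_ion_pair_direct(a_cat, a_an):8.2f} kJ mol^-1")
```

The several terms in the cycle are each hundreds of kilojoules mole⁻¹, yet their sum is only a few kilojoules mole⁻¹, which illustrates why the sign of the experimental value cannot be predicted with confidence from Figure 5-8.
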

These changes in standard enthalpy demonstrate quite clearly why an
ion pair sequestered in the middle of a folded polypeptide would be
unstable relative to the separated hydrated ions in solution. The
only reason an ion pair is almost stable in aqueous solution is that
there is considerable standard enthalpy of hydration for the ion
pair itself, $\Delta H^\circ_{\mathrm{hyd(A^+\cdot B^-)}}$. In the
center of a protein, this standard enthalpy of hydration would not
be available and the ion pair would be much less stable. There will
never be sufficient electrostatic energy in the ion pair alone to
overcome the large negative standard enthalpies of hydration that
are lost when the separated ions are removed from water during the
folding of the protein. This fact can be verified by examining
Figure 5-8. The total standard enthalpy of hydration lost is the sum
of the two values for the individual enthalpies of hydration. The
standard enthalpy of association gained is that for the sum of the
two ionic radii. The former is always of a greater magnitude than
the latter.

The standard entropies of hydration, in marked contrast to the
standard enthalpies of hydration, are small. Values of the entropies
of hydration for a series of small monovalent ions of either charge
lie between −67 and +21 J K⁻¹ mol⁻¹ when the two standard states are
chosen as the molten salt at one mole fraction in the ion and the
ideal solution at one mole fraction in the ion. At 298 K, these
standard entropies of hydration would cause the standard free
energies of hydration to differ from enthalpies of hydration by less
than 4%, certainly less than the error in the estimation of
enthalpies of hydration from experimental data.
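The size of the entropic contribution at 298 K can be illustrated with the relation ΔG° = ΔH° − TΔS°. The sketch below only converts the two limiting entropies quoted above into the term TΔS°; the function name is an assumption made for illustration.

```python
def entropy_term_kj_per_mol(delta_s_j_per_k_mol, temperature_k=298.0):
    """Return T * dS in kJ/mol for an entropy change given in J K^-1 mol^-1."""
    return temperature_k * delta_s_j_per_k_mol / 1000.0


# Limiting standard entropies of hydration quoted in the text.
for delta_s in (-67.0, 21.0):
    term = entropy_term_kj_per_mol(delta_s)
    print(f"dS = {delta_s:+.0f} J/(K mol): T*dS = {term:+.1f} kJ/mol")
```

The two terms are about −20 and +6 kJ mol⁻¹, which is why they are treated as small corrections to the much larger standard enthalpies of hydration.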

The small standard entropies of hydration seem at 
first glance to be inconsistent with the formation of a 
region of oriented water around an ion, which is the 
explanation given for the large standard enthalpies of 
hydration. The apparent inconsistency is usually 
explained by noting that the region of oriented molecules 
of water surrounding either an anion or a cation cannot 
merge flawlessly with the hydrogen-bonded lattice of the 
bulk water. Therefore, there must be an outer spherical 
shell of disorder between the inner sphere of order and 
the order of the lattice beyond the influence of the ion. 
The negative standard entropy change of forming the sphere of
oriented water should be canceled by the positive standard entropy
change of forming this outer, disordered transitional shell.

Explanations of the large enthalpies of hydration and the small
entropies of hydration both predict that an ion will affect the
structure of the water well beyond the few molecules in its
immediate vicinity. Direct evidence for such an extended region of
oriented water comes from measurements of the repulsion of
hydration. When two identical surfaces that have dense arrays of
both negative and positive ions spread over them (both, however, in
exactly equal concentration so that each of the two surfaces is
electrostatically neutral) are brought together in water, a
repulsive force between the surfaces is evident. This repulsive
force becomes significant when the two surfaces come within about
2 nm of each other and increases in magnitude exponentially as the
distance is decreased. It has been proposed that this repulsive
force is the resistance of the layers of hydration around the ions
on each surface to their interpenetration. That this repulsion of
the layers of hydration extends out from each surface by at least
1 nm indicates that each ion orders the waters around it over a
significant distance.*

In the case of molecules of protein, the ion pairs that have
received the most attention are those that would form between the
carboxylate ion of an aspartate or a glutamate and the ammonium ion
of a lysine or the guanidinium ion of an arginine. The association
constant in water for the ion pair between an acetate ion and an
ammonium ion is around 0.5 M⁻¹, and that for the ion pair between an
acetate ion and a guanidinium ion is somewhat less than 0.5 M⁻¹.
Consequently, the concentration of either an ammonium or a
guanidinium cation would have to be greater than 2 M for half of the
acetate anion in the solution to be complexed with it. These weak
interactions have free energies of formation of around −3 kJ mol⁻¹
when the concentrations are expressed in units of corrected volume
fraction. They are probably the result of hydrogen bonding between
the anion and cation rather than ionic interactions, because the
complex is stronger for ammonium than guanidinium and there is no
evidence that small monovalent cations and anions that lack donors
and acceptors of hydrogen bonds associate to form ion pairs in
water.
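The statement that 2 M cation is needed to complex half of the acetate follows from a simple 1:1 binding isotherm. The sketch below assumes that the cation is in large excess, so that its free concentration equals its total concentration, and that activity coefficients can be ignored; both are simplifications introduced here.

```python
def fraction_complexed(cation_conc_molar, assoc_const_per_molar=0.5):
    """Fraction of acetate paired with the cation for 1:1 association.

    Assumes the cation is in large excess over acetate (free cation
    concentration ~ total) and ideal behavior of all solutes.
    """
    x = assoc_const_per_molar * cation_conc_molar
    return x / (1.0 + x)


for conc in (0.1, 0.5, 2.0):
    print(f"[cation] = {conc:.1f} M: fraction complexed = {fraction_complexed(conc):.3f}")
```

With an association constant of 0.5 M⁻¹, the fraction complexed reaches one-half only at 2 M cation, and it is below 5% at 0.1 M.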

Several additional observations demonstrate that 
ion pairs between an ammonium ion and a carboxylate 
ion are unstable relative to the separated ions. The 
dielectric increment is the change in the relative permit- 
tivity of a solution with the concentration of an added 
solute. The dielectric increments for a series of 
zwitterionic amino acids containing an ammonium and 
a carboxylate, namely, glycine, 3-aminopropionate, 
4-aminobutyrate, 5-aminopentanoate, and 6-amino- 
hexanoate, have been measured. The values display a 
monotonic increase with the number of methylenes 
between the positively charged ammonium ion and the 
negatively charged carboxylate ion. The values observed 
are in agreement with theoretical calculations of their 
magnitude from a simple model in which the distance 
between the elementary positive charge and the elemen- 
tary negative charge is determined only by random, 
unbiased rotation around the carbon-carbon bonds 
connecting them. Were the formation of an ion pair between an
ammonium cation and a carboxylate anion a favorable interaction in
aqueous solution, this regularity could not have occurred. In
glycine, an intramolecular ion pair cannot form. In
3-aminopropionate and 4-aminobutyrate, excellent intramolecular ion
pairs, forming rings five and six atoms in size, should form even
more readily than a similar intermolecular ion pair. If these
intramolecular ion pairs were able to form, however, the dielectric
increments of these two amino acids should both be less than that of
glycine, yet no anomaly is observed in their dielectric increments
compared to the other compounds within the complete series. The
behavior of the interaction volumes for the same series of amino
acids also shows no evidence of peculiarities that would result from
intramolecular ion pairing.

* The distance of this repulsive force (1 nm) is comparable to the
distance (0.7 nm) calculated for the decrease in the electrostatic
field around a univalent ion in water to a potential energy equal to
kT. If an ion significantly influences the water around it to a
radius of about 1 nm, this region of its influence would contain
about 100 molecules of water.
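The distance of 0.7 nm quoted in the footnote on the repulsion of hydration can be reproduced by asking where the Coulomb energy between two elementary charges in water equals kT. The sketch below is one reading of that footnote; treating the probe as a second elementary charge, and using 298.15 K and a relative permittivity of 78, are assumptions made here.

```python
import math

# Physical constants in SI units (CODATA values).
E_CHARGE = 1.602176634e-19    # elementary charge, C
K_BOLTZMANN = 1.380649e-23    # Boltzmann constant, J K^-1
EPSILON_0 = 8.8541878128e-12  # permittivity of vacuum, F m^-1


def distance_where_coulomb_equals_kt_nm(rel_permittivity=78.0, temperature_k=298.15):
    """Separation, in nm, at which e^2 / (4 pi eps0 eps_r d) equals kT."""
    distance_m = E_CHARGE ** 2 / (
        4.0 * math.pi * EPSILON_0 * rel_permittivity * K_BOLTZMANN * temperature_k
    )
    return distance_m * 1e9


print(f"{distance_where_coulomb_equals_kt_nm():.2f} nm")
```

Under these assumptions the distance comes out at about 0.72 nm, consistent with the value of 0.7 nm given in the footnote.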

There is no steric hindrance to the formation of an ion pair between
the ammonium ion of the lysine in the peptide Nα-acetyl-WLKLL and
its carboxy terminus, and such an intramolecular ion pair forms
readily when the peptide is dissolved in octanol. When the peptide
is dissolved in water, however, no ion pair can be detected.

Although ion pairs do not have net favorable standard free energies
of formation in aqueous solution and do not contribute to the
stability of a folded polypeptide, the electrostatic repulsion of
amino acids of like charge can destabilize a particular structure.
This distinction is illustrated by the effect of ionic strength on
the stability of coiled coils of α helices. A coiled coil of
α helices is a stable structure that forms when two α helices coil
around each other in a supercoil. This supercoil can stabilize the
two α helices sufficiently that they can form in water. Few isolated
α helices are stable in aqueous solution, but a coiled coil is one
way to circumvent this problem. A series of peptides designed to
form coiled coils were synthesized chemically. One of the peptides
($n_{\mathrm{aa}}$ = 30) had glutamates at the positions flanking
its hydrophobic core; the other ($n_{\mathrm{aa}}$ = 30) had
arginines flanking the core. The stability of the heterodimeric
coiled coil formed from a positively charged peptide and a
negatively charged peptide was not affected by changing the ionic
strength, and this result indicated that electrostatic interactions
such as ion pairing were not contributing to the standard free
energy of formation of that coiled coil. The stabilities of
homodimers formed from either two of the positively charged peptides
or two of the negatively charged peptides, however, decreased
significantly as the ionic strength of the solution was lowered, and
this result demonstrates that these complexes were destabilized by
charge repulsion. It is this destabilization of the two homodimers,
not the formation of ion pairs, that accounts for the fact that the
heterodimer forms preferentially.

Proteins are generally dissolved in aqueous solu- 
tions the ionic strengths of which are between 0.1 and 
0.3 M, and the effect of such ionic strengths on ionic 
interactions, although small, should be noted. The activ- 
ity coefficients of electrolytes decline sharply from a 
value of 1 at low concentrations to minimum values, 
depending on the salt, of from 0.05 to 0.8 at concentra- 
tions of about 0.3 M. When activity coefficients of 
solutes are less than 1, it means that the solute is behav- 
ing with a chemical potential less than it would have if it 
were an ideal solute at the same concentration, and its 
tendency to leave the solution is less than it should be. The fact
that, at low concentrations, the activity coefficients of ions are
near 1 means that as long as they are far enough apart their
activities increase in proportion to their concentration as expected
for any solute. As their concentrations become high enough that each
ion begins to experience the presence of the others, the presence of
the others decreases the tendency of that ion to leave the solution.
This decreased tendency arises from the departure of all of the ions
in the solution from a random distribution in such a way that a
region enriched in counterions forms around each individual ion as
expected of an ionic double layer (Equation 1-71). These enriched
counterionic layers around each dissolved ion make each of them more
stable in the aqueous solution than it would be if it were an ideal
solute, and this is what causes the decrease in its activity
coefficient.

This effect of ionic strength makes the formation 
of an ion pair even less likely than it would be in the 
absence of added salt, because its formation would 
involve the diminishment of a considerable fraction of 
the counterionic layer around each separated ion. The 
presence of these ionic layers also makes it more diffi- 
cult to remove an ionic functional group from a solu- 
tion of moderate ionic strength than it would be to 
remove it from pure water. The activity coefficients for most ionic
solutes are between 0.2 and 0.8 at the ionic strengths encountered
in biochemical situations, and these values should lead to changes
in the standard free energies of hydration of between −4 and
−0.4 kJ mol⁻¹, respectively.

Although ion pairs between simple monovalent cations and anions have
positive standard free energies of formation, there are two
situations in which ion pairs are favorable. Ion pairs involving
divalent metal ions often have negative standard free energies of
formation. For example, significant concentrations of the ion pairs
Ca²⁺·SO₄²⁻ and Mg²⁺·SO₄²⁻ are present in aqueous solutions of the
respective salts, and ion pairs between divalent cations such as
Ba²⁺, Ca²⁺, and Mg²⁺ and hydroxide ion in aqueous solution show
appreciable stabilities. There are, however, no divalent side chains
among the 20 natural amino acids. Phosphorylated amino acids, such
as serine phosphate (2-30), are divalent at high pH and can readily
form ion pairs with divalent cations such as Ca²⁺.

The other situation in which ion pairs become 
favorable is encountered when chelation can occur. 
Chelation is the binding of an ion to a molecule, the 
chelating agent. The chelating agent contains two or 
more functional groups of opposite charge to the bound 
ion that can simultaneously associate with it, or it con- 
tains two or more dipoles that simultaneously can be 
favorably directed toward the bound ion, or it contains 
some combination of such charges and dipoles. The paradigm of
chelating agents is N,N,N′,N′-tetracarboxymethyl-1,2-diaminoethane
(ethylenediaminetetraacetate), which can wrap its nitrogens and
carboxylates around a divalent or trivalent metal ion and form an
ion pair of high stability. It has already been mentioned that the
binding of monovalent cations and anions by proteins 
is thought to involve particular binding sites that have 
advantageous dispositions of functional groups, often 
with charge opposite to the charge on the bound ion. 
Chelation, however, assumes a preexisting arrangement 
of two or more charged groups or dipoles that create a 
pocket within which an ion can be held, and this 
arrangement does not exist in an unfolded polypeptide 
or with isolated anions and cations in solution. 
Chelation could be important in forming an interface 
between two already folded polypeptides or binding a 
charged substrate to an already folded enzyme. 


Suggested Reading 


Parsegian, A. (1969) Energy of an ion crossing a low dielectric mem- 
brane: solutions to four relevant electrostatic problems, Nature 
221, 844-846. 


Problem 5-2: Calculate the standard enthalpy changes for the
transfer of a sodium ion ($a_{\mathrm{Na^+}}$ = 0.097 nm) and a
chloride ion ($a_{\mathrm{Cl^-}}$ = 0.181 nm), respectively, from
the gas phase ($\varepsilon_r$ = 1) to water at 25 °C
($\varepsilon_{r,\mathrm{H_2O}}$ = 78 at 25 °C). Calculate the
standard enthalpy changes for the formation of an ion pair between a
sodium ion and a chloride ion in the gas phase and for the transfer
of that ion pair from the gas phase to water. By difference,
calculate the standard enthalpy of formation of an ion pair between
a sodium ion and a chloride ion in water.


Problem 5-3: Consider the series of six compounds H₃N⁺(CH₂)ₙCOO⁻,
where n = 1–6. Within each of these molecules there is a carboxy
group and an amino group, which bear opposite charges at pH 7. The
elementary negative charge on the carboxy group is located between
the two oxygens.


(A) For each compound, construct with molecular models the
conformation of the molecule in which the positively charged
nitrogen is positioned as close as possible to one of the negatively
charged oxygens.


(B) In which of the molecules is it possible to form an ion pair
that juxtaposes NH₃⁺ and O⁻?


The dipole moment (μ) of a particular molecular structure that
contains fixed charges is equal to the product of the magnitude of
the charges, $z_i e$, and the distance, r, that separates them:
μ = $z_i e r$. In each structure you have made, $z_i e$ is the same
but r changes.


(C) Examine the structures you have drawn and rank them in order of
increasing dipole moment. Indicate ranking with the symbols < and =.


The observed dipole moments for these molecules dissolved in water
at pH 7.0 are


n        1     2     3     4     5     6
μ (D)    12    15    18    20    22    24


(D) Explain why the actual dipole moments for these molecules fail
to agree with the theoretical predictions that you made in part C.
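For Problem 5-3, the relation μ = z e r has to be converted into debye before a model can be compared with the observed values. The sketch below shows only that unit conversion; the function name and the example separation of 0.1 nm are illustrative choices, not values taken from the problem.

```python
E_CHARGE = 1.602176634e-19   # elementary charge, C
DEBYE = 3.33564e-30          # one debye expressed in C m


def dipole_moment_debye(separation_nm, charge_number=1):
    """Dipole moment, in debye, of charges +ze and -ze held separation_nm apart."""
    return charge_number * E_CHARGE * separation_nm * 1e-9 / DEBYE


# Hypothetical separation, chosen only to show the scale of the conversion.
print(f"{dipole_moment_debye(0.1):.2f} D for a separation of 0.1 nm")
```

Two elementary charges of opposite sign contribute about 4.8 D for every 0.1 nm of separation, so the moment scales linearly with the distance measured on a model.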


The Hydrogen Bond 


A hydrogen bond is a noncovalent force that arises between an acid,
known as the donor, A–H, and a base, known as the acceptor, :B. The
atoms A and B in the case of proteins are the heteroatoms oxygen,
nitrogen, and sulfur. A hydrogen bond is an intermediate on the
trajectory of an acid–base reaction:

$$\mathrm{A{-}H + {:}B \;\rightleftharpoons\; A{-}H\cdots{:}B \;\rightleftharpoons\; A{:}^{-}\cdots H{-}B^{+} \;\rightleftharpoons\; A{:}^{-} + H{-}B^{+}} \tag{5-30}$$

The two central complexes in Equation 5-30 are each
held together by a hydrogen bond, but the overall reac- 
tion is the transfer of a proton between two lone pairs of 
electrons. An example of this relationship between a 
hydrogen bond and an acid-base reaction is the self-dis- 
sociation of water. In the liquid most of the water mole- 
cules participate in hydrogen bonds, and these hydrogen 
bonds are the intermediate steps in the production of a 
hydroxide anion and a hydronium cation. The anion and 
the cation are produced when the proton in a hydrogen 
bond between two molecules of water moves momentar- 
ily from donor to acceptor and the hydrogen bond 
between the new donor, the hydronium ion, and the new 
acceptor, the hydroxide ion, dissociates to yield the free 
species. 

Manifestations of hydrogen bonding are the alter- 
ations it effects in the physical and chemical properties 
of liquids, gases, and solids. The nonideal behavior of certain
gases can be explained by the existence of hydrogen-bonded oligomers
of the molecules composing the gas. The water dimer (Figure 5-1) is
an example of such a situation; its existence lowers the pressure of
water vapor. Abnormally positive values for the standard
enthalpy of vaporization or abnormally negative values 
for the standard enthalpy of mixing can often be 
explained as the result of either the breaking of hydrogen 
bonds as the molecules depart the liquid or the forma- 
tion of hydrogen bonds as a donor and acceptor are 
mixed, respectively. When an acceptor is added to a solu- 
tion of a donor, the infrared spectrum of the resulting 
mixture often displays a new absorption band, at a lower 
frequency than the absorption of the A-H stretching 
vibration observed with the solution of the donor alone. 
This new absorption increases in magnitude in propor- 
tion to the amount of acceptor added, while the ampli- 
tude of the absorption of the unshifted stretching 
vibration of the A-H bond of the donor decreases in pro- 
portion. The new stretching vibration is assigned to that 
of the A-H covalent bond within a hydrogen bond 
between the donor and the added acceptor. A similar 
observation is made in nuclear magnetic resonance 
spectra of mixtures of donors and acceptors. In this case, 
two separate absorptions are not observed because the 
rates at which the hydrogen bonds are interchanging 
among the molecules in the solution are faster than the 
time resolution of the method, but the chemical shift of 
the proton participating in the A-H bond moves down- 
field until it reaches a maximum value, associated with 
the chemical shift of the proton within the hydrogen 
bond. 

Taken together, these commonly encountered 
observations demonstrate three features of a hydrogen 
bond. First, a hydrogen bond causes two molecules to 
associate with each other and form a complex that pre- 
vents them from changing their relative positions as 
readily as they would otherwise; in other words, it corre- 
lates their movements. Second, there is a release of heat 
associated with the formation of this complex. Third, the 
proton in the A-H covalent bond of the donor experi- 
ences a change in its environment during the formation 
of this complex. The results of both infrared and nuclear magnetic
resonance spectroscopy are consistent with a lengthening of the
covalent bond between A and H concomitant with a movement of the
proton away from the electrons of the σ bond.

The arrangement of the atoms in crystallographic 
molecular models of small molecules that display these 
physical manifestations of hydrogen bonding usually 
displays a pattern that can be assigned to the hydrogen 
bond itself. The positions in the unit cell of the atoms of 
the second and third periods of the periodic table, for 
example, carbon, nitrogen, oxygen, and sulfur, are deter- 
mined by X-ray crystallography, and the positions of the 
protons, often as deuterons, are determined most 
reliably by neutron diffraction. Whereas a proton has little ability
to scatter X-rays, inasmuch as it has no core electrons, a deuteron
scatters neutrons as readily as a carbon or an oxygen; and deuterons
are prominent features in maps of neutron scattering density, as
opposed to hydrogens in maps of electron density. Furthermore, a
proton has a negative scattering amplitude for neutrons while a
deuteron has a positive scattering amplitude. This causes difference
maps of neutron scattering density for deuterated against protonated
molecules to display sharp maxima where the protons are located in
the former.

In crystallographic molecular models of small molecules known to be
hydrogen-bonded, the bond is recognized as an enforced orientation
of the donor and acceptor (Figure 5-10). Associated with this
orientation are certain bond lengths and bond angles. It is these
bond lengths and bond angles that are the most important properties
of a hydrogen bond as far as the structures of proteins are
concerned. The hydrogen bond provides no net standard free energy to
the process of folding a polypeptide, but hydrogen bonds are
responsible for aligning atoms and holding them at precise distances
and constrained angles to each other in the folded structure. The
A–H σ bond of the donor in a hydrogen bond is pointed at the
heteroatom B of the acceptor. The distance, d, between A and B is
always less than it would be if the proton on the donor atom and the
atom acting as the acceptor were simply in van der Waals contact.
For example, in a hydrogen bond of the type O–H···N (Equation 5-30),
the distance between oxygen and nitrogen is 0.28 ± 0.01 nm, while
the distance between carbon and nitrogen in a van der Waals contact
between a C–H and a nitrogen would be 0.35 nm. It is this shortened
distance between donor and acceptor that reflects the bonding. The
bond lengths, d, for most types of sterically unconstrained hydrogen
bonds (Table 5-1) between neutral donors and neutral acceptors lie
between 0.25 and 0.30 nm, but the bond angles are more variable.

In general, the angle a between the axis of the hydrogen bond and
one of the σ covalent bonds to the heteroatom of the donor, A
(Figure 5-10A), will reflect the hybridization of that heteroatom,
while the angle b between the axis of the bond and one of the
σ covalent bonds to the heteroatom of the acceptor, B, although much
more flexible than angle a, will tend to reflect the hybridization
of the lone pair of electrons on atom B.


Table 5-1: Length of Hydrogen Bonds ᵃ

A–H···B     compounds           average bond length ᵇ (nm)
O–H···O     carboxylic acids    0.26 ± 0.01 ᶜ
O–H···O     phenols             0.27 ± 0.01
O–H···O     alcohols            0.27 ± 0.01
O–H···N     all O–H             0.28 ± 0.01
N–H···O     ammoniums           0.29 ± 0.01
N–H···O     amides              0.29 ± 0.01
N–H···O     amines              0.30 ± 0.01
N–H···N     all N–H             0.31 ± 0.01

ᵃ The values in this table are reproduced directly from tables in
ref 46. With the exception of the hydrogen bonds involving ammonium
cations, these are hydrogen bonds between a neutral donor and a
neutral acceptor. ᵇ These are the distances between the heteroatoms,
nitrogens or oxygens. ᶜ These standard deviations may be standard
deviations of actual lengths or standard deviations of the
measurement or both.


Figure 5-10: Relationships defining bond angles for hydrogen bonds.
(A) In a simple hydrogen bond between a donor A–H and an acceptor
:B, the line of centers between heteroatoms A and B creates an axis,
which is the axis of the hydrogen bond. The angle a is the angle
between that axis and a σ covalent bond to the heteroatom A. The
angle b is the angle between that axis and a σ covalent bond to the
atom B. The distance d is the length of the hydrogen bond. (B, C)
For a hydrogen bond between an amido nitrogen and a carbonyl oxygen
or an acyl oxygen, the carbonyl or acyl group defines a plane. The
bond angle b is the angle in the plane between the carbon–oxygen
double bond and the projection of the axis of the hydrogen bond on
the plane. In panel C, the lower dotted line shows the projection of
the axis of the hydrogen bond on the plane. The bond angle c is the
angle between this projection and the axis of the hydrogen bond.
(D) For a hydrogen bond between the σ covalent bond of a
nitrogen–hydrogen donor and a carbonyl oxygen or acyl oxygen, the
bond angle b can vary over a range bounded by the two lone pairs of
electrons on the oxygen if the oxygen is otherwise unoccupied.
(E) In several instances, the axis of the σ bond between the
heteroatom of the donor and the proton lies between two lone pairs
of electrons from two different heteroatoms, and one donor interacts
with two acceptors in a bifurcated hydrogen bond.

The type of hydrogen bond that accounts for the majority of those in
biological macromolecules, both proteins and nucleic acids, is the
hydrogen bond between the sp² lone pair on an acyl oxygen as an
acceptor and the nitrogen–hydrogen bond of an acyl derivative such
as an amide or an amidine as a donor (Figure 5-10B). From the
crystallographic molecular models of 1500 intermolecular hydrogen
bonds between either a carbonyl oxygen or an acyl oxygen and such a
nitrogen–hydrogen bond, the bond lengths and bond angles were
compiled. For the collection of all such hydrogen bonds examined,
the mean nitrogen–oxygen distance, d, was 0.297 nm. If it is assumed
that the acyl or carbonyl carbon and its three σ bonds define a
plane and that the line of centers between the nitrogen and oxygen
of the hydrogen bond defines a line, two angles define the hydrogen
bond: angle b, the angle in the plane between the projection upon
the plane of the line and the carbon–oxygen double bond
(Figure 5-10B); and angle c, the angle that the line of centers
between the nitrogen and oxygen makes with the projection of that
line of centers on the plane of the acyl group (Figure 5-10C).
Angle c determines how far the nitrogen is above or below the plane,
and d sin c is the actual distance the nitrogen is above or below
the plane. In the hydrogen bonds examined, the nitrogen atom was
usually within 0.1 nm (sin c = 0.33) of the plane defined by the
acceptor (Figure 5-11). If the carbonyl oxygen or acyl oxygen
participates as the acceptor in two hydrogen bonds, there is a
strong tendency for angle b to be 120° (Figure 5-11B), the angle
expected from an sp² hybridization of its two lone pairs.
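The definitions of d, angle b, and angle c can be made concrete by computing them from atomic coordinates. The sketch below is one possible implementation of those definitions using only the standard library; the function name, the use of a single substituent on the carbon to fix the plane, and the coordinates (in nm) are all hypothetical and chosen only so that the expected answer is known in advance.

```python
import math


def _sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def _norm(u):
    return math.sqrt(_dot(u, u))


def hydrogen_bond_geometry(carbon, oxygen, substituent, nitrogen):
    """Return (d, angle_b_deg, angle_c_deg, height) for an N-H...O=C bond.

    The plane of the acyl group is taken through the carbon, the oxygen,
    and one other atom bonded to the carbon. Not valid if the nitrogen
    lies exactly on the normal through the oxygen (zero projection).
    """
    normal = _cross(_sub(oxygen, carbon), _sub(substituent, carbon))
    length = _norm(normal)
    unit_normal = tuple(x / length for x in normal)

    axis = _sub(nitrogen, oxygen)            # line of centers, O to N
    d = _norm(axis)
    height = _dot(axis, unit_normal)         # signed distance from plane, d sin c
    projection = tuple(a - height * n for a, n in zip(axis, unit_normal))

    oxygen_to_carbon = _sub(carbon, oxygen)  # direction of the C=O bond seen from O
    cos_b = _dot(projection, oxygen_to_carbon) / (
        _norm(projection) * _norm(oxygen_to_carbon))
    angle_b = math.degrees(math.acos(max(-1.0, min(1.0, cos_b))))
    angle_c = math.degrees(math.asin(max(-1.0, min(1.0, height / d))))
    return d, angle_b, angle_c, height


# Hypothetical coordinates (nm): acyl group in the xy plane, nitrogen placed
# at d = 0.297 nm with b = 120 degrees and c = 10 degrees.
carbon = (0.0, 0.0, 0.0)
oxygen = (0.123, 0.0, 0.0)
substituent = (-0.07, 0.12, 0.0)
c_in = math.radians(10.0)
nitrogen = (oxygen[0] + 0.297 * math.cos(c_in) * 0.5,
            oxygen[1] + 0.297 * math.cos(c_in) * math.sqrt(3.0) / 2.0,
            oxygen[2] + 0.297 * math.sin(c_in))

d, angle_b, angle_c, height = hydrogen_bond_geometry(carbon, oxygen, substituent, nitrogen)
print(f"d = {d:.3f} nm, b = {angle_b:.1f} deg, c = {angle_c:.1f} deg, "
      f"height = {height:.3f} nm")
```

With this convention, b is 180° when the nitrogen lies on the extension of the carbon–oxygen bond and 120° when it lies along the direction of an sp² lone pair, and the height returned is the quantity d sin c discussed above.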

If the carbonyl oxygen or acyl oxygen, however, par- 
ticipates as an acceptor in only one hydrogen bond, 
angle b will still show a slight preference for 120° but the 
angle can also assume other values between 120° and 
180° with almost equal facility (Figure 5-11A). It is as 
though the nitrogen-hydrogen bond can pivot over the 
electron cloud formed by both lone pairs when the other 
one of them is not forming another hydrogen bond 
(Figure 5-10D), and the location it eventually assumes in 
the crystal is determined by forces other than the hydrogen bond
itself. This apparent ability of a nitrogen donor to associate with
two lone pairs on the same atom may be related to its ability to
participate in a bifurcated hydrogen bond in which it associates
with two lone pairs from separate atoms (Figure 5-10E). The fact
that the
nitrogen is not confined strictly to the plane of the acyl 
group (Figure 5-11), although it has a strong preference 
for that plane, demonstrates that the nitrogen-hydrogen 
dipole can also pivot up or down out of the plane about 
a single lone pair (Figure 5-10C) or about two lone pairs 
(Figure 5-11A).

Figure 5-11: Distribution of values for the angle b (Figure 5-10B)
and the sine of angle c (Figure 5-10C) over a population of hydrogen
bonds between nitrogen–hydrogen donors and carbonyl oxygen acceptors
or acyl oxygen acceptors observed in crystallographic molecular
models of small molecules. (A) Hydrogen
bonds involving a carbonyl oxygen or acyl oxygen in which the 
oxygen atom accepts no other hydrogen bonds. (B) Hydrogen 
bonds involving a carbonyl oxygen or acyl oxygen in which the 
oxygen atom accepts one other hydrogen bond. In each of the four 
panels, the number of bonds falling within a range of values is plot- 
ted as the value of the ordinate. In the two left panels, the values on 
the abscissa defining the ranges are values of the sine of angle c 
(sin c). In the two right panels, the values on the abscissa defining 
the ranges are values of the angle b in degrees. Adapted with per- 
mission from ref 47. Copyright 1983 American Chemical Society. 


All of the observations presented in Figure 5-11 are 
for acyl oxygens that are not in carboxylates. In the case 
of the oxygens in carboxylates, such as those on the side 
chains of aspartate and glutamate, the tendency for the 
nitrogen to reside in the plane of the carboxylate is lessened and
the tendency for angle b to assume 120° is increased. Although it
has been proposed that a syn pair of electrons on a carboxylate
(Equation 2-12) should be more basic than an anti pair, no
preference is shown for one over the other in forming hydrogen bonds
in crystallographic molecular models of small molecules or proteins.

The shift in frequency of the infrared absorption for an
oxygen–hydrogen stretching vibration has been used to examine the
effect of stereochemistry on the strength of intramolecular hydrogen
bonds. A series of pyridines substituted at the o-position by
–CH₂OH, –CH₂CH₂OH, or –CH₂CH₂CH₂OH was examined. These functional
groups can form hydrogen-bonded rings with the pyridine nitrogen
that are four, five, or six atoms in size, respectively, when the
proton is not counted. All three compounds display an absorption
that could be assigned to a shifted OH stretching vibration. The
hydrogen bonds increase in strength, as indicated by the shift in
the OH stretching frequency (Δν = 192, 203, and 357 cm⁻¹,
respectively), as the ring becomes larger. The ring with six atoms,
the only one large enough to permit the hydrogen bond to be linear,
displays the largest frequency shift, an observation consistent with
its being the strongest. This result suggests that, in a cyclic
hydrogen-bonded structure, the hydrogen bond should be considered as
a somewhat longer and more flexible covalent bond between the two
heteroatoms, and the proton should not be counted as one of the
atoms in the ring.

There has been some disagreement over the ability 
of sulfur to participate as an acceptor in a hydrogen 
bond because of the poor overlap between its atomic 
orbitals and those of nitrogen or oxygen. In a survey of 
crystallographic structures for a number of compounds in which
nitrogen donors and sulfur acceptors both appear, juxtapositions
were frequently observed and these were consistent with hydrogen
bonds of the type N–H···S. The most telling observation in favor of
the existence of such hydrogen bonds was the fact that the
nitrogen–sulfur distances (0.33–0.35 nm) were shorter than the
distance expected from purely van der Waals contact.

The pairs of electrons in π bonds in a simple olefin or a phenyl
ring are less basic than the σ lone pairs on the acyl oxygen in a
secondary amide (pKa = −0.5) or the oxygen of a molecule of water
(pKa = −1.7). The values of pKa for the conjugate acids of ethene
and propene are −24.3 and −19.3, respectively. The values of pKa for
the conjugate acids in which a carbon in the ring is protonated are
−24.3 for benzene, −16.3 for benzofuran, −10 for
3-hydroxy-5-methyltoluene, −7.8 for 3-hydroxyphenol, −5.8 for
3,5-dihydroxytoluene, and −3.1 for 3,5-dihydroxyphenol. From these
values, the values of pKa for the conjugate acids of a phenylalanyl
side chain and a tyrosyl side chain, in which a carbon in the ring
is protonated, should be around −20 and −13, respectively. The
phenyl ring of a tryptophanyl side chain should have a pKa somewhat
greater than that of a tyrosyl side chain. The differences between
the values of pKa for donor and acceptor in the hydrogen bonds
between two water molecules or between two molecules of
N-methylacetamide, however, are already large, around 17–18 units,
so it would not be surprising if the differences in pKa required to
use one of these aromatic side chains as an acceptor, although even
larger, would still permit the formation of a hydrogen bond.

There are indications that hydrogen bonds can
form between the π clouds of aromatic rings as acceptors
and biologically relevant donors. In the complex
between water and benzene in the gas phase, the water
sits upon the π cloud with the positive end of its dipole
oriented towards the ring and its two hydrogens lie
0.1 nm closer to the plane of the ring than van der Waals
contact should allow. All of these features suggest that
a hydrogen bond has been formed. There are crystallographic
studies of other complexes that also suggest that
hydrogen bonds between a hydroxyl group and the
π electrons of an aromatic ring do form, and theoretical
calculations suggest that a hydrogen bond between
an amido nitrogen-hydrogen and a phenyl ring could be
as much as half as strong as a normal hydrogen bond
between an amido nitrogen-hydrogen and a σ lone pair
of electrons.

Associated with any hydrogen bond are two wells of 
potential energy (Figure 5-12), the well of potential 
energy for a proton within the lone pair of the donor and 
the well of potential energy for a proton within the lone 
pair of the acceptor. As there is only one proton between 
the donor and the acceptor, only one of the two wells is 
occupied at any given instant. When the proton is 
located in a particular well, it is participating in a cova- 
lent bond with the heteroatom to which the well belongs. 
When in that covalent bond the proton cannot have an 
energy less than that of the lowest or first vibrational 
energy level, and it is usually occupying that level. 
Because energy is quantized, the energy of the first vibra- 
tional level is above the bottom of the well of potential 
energy. The energy of the first vibrational level is the 
zero-point energy of the bond to which the well applies.

Figure 5-12: Overlap of wells of potential energy for the covalent
bonds between the proton and the heteroatom of the donor (left
panel) and between the proton and the heteroatom of the acceptor
(right panel) in a hydrogen bond. In a case where the values of pKa
for donor and acceptor are matched, the zero-point energies (thin
horizontal lines) are the same. (Left panel) If the distance between
donor and acceptor is long, the intersection of the two wells of
potential energy is above the zero-point energy and there is a barrier
to transfer of the proton between the wells (arrow pointing to
the left). The proton divides its time between the wells and the two
mean positions it assumes are separated by d_HH, the distance
between the bottoms of the two wells. (Right panel) If the distance
between donor and acceptor is short, the intersection between the
two wells is below the zero-point energy and the barrier to transfer
between the wells (arrow pointing to the left) is no longer effective.
The proton (H) is found halfway between donor and acceptor.


In the separated donor and acceptor, these two wells of 
potential energy are also present and are the wells of 
potential energy associated with protonating the accep- 
tor or protonating the conjugate base of the donor. As the 
donor and acceptor approach each other, these wells of 
potential energy overlap. The point of their intersection 
(Figure 5-12) is the height of the barrier of potential 
energy that must be crossed if the proton is to be trans- 
ferred from donor to acceptor (Equation 5-30). The more 
closely the heteroatom of the donor and the heteroatom 
of the acceptor approach each other, the lower will be 
this barrier. 

The difference in the zero-point energies between
the well for the donor and the well for an oxygen-hydrogen
bond in the hydronium ion is the standard enthalpy
change associated with the pKa of the donor; the difference
between the zero-point energies of the well for the
conjugate acid of the acceptor and the well for the hydronium
ion is the standard enthalpy change associated
with the pKa of the acceptor; and the difference in zero-point
energies of the well for the donor and the well for
the acceptor is the standard enthalpy change associated
with the difference in pKa (ΔpKa) between them. If the
difference in pKa between donor and acceptor is small,
the two wells of potential energy will have about the 
same minimum; or better yet, if the acceptor is the con- 
jugate base of the donor, the two wells of potential 
energy will be mirror images of each other. In such situ- 
ations, as the heteroatoms are brought closer together, 
the barrier between them decreases rapidly until it is 
equal to or less than the zero-point energy (Figure 5-12). 
When this occurs, the two wells become continuous, as 
far as the proton is concerned, even though there are still 
two minima of potential energy. Hydrogen bonds in 
which the distance between the heteroatoms of donor 
and acceptor approaches but does not necessarily reach 
this point at which the barrier vanishes are low-barrier 
hydrogen bonds. 
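
The way the barrier falls below the zero-point energy as the wells approach each other can be illustrated with a deliberately crude numerical sketch. It assumes, purely for illustration, that the two wells are identical harmonic wells (real wells are anharmonic), that the oscillating mass is that of a bare proton, and that the stretching wavenumber of the free bond is 3600 cm⁻¹; the function names and these parameter choices are assumptions and do not come from the text.

```python
import math

# Physical constants (SI units).
H_BAR = 1.054571817e-34      # J s
C_CM_PER_S = 2.99792458e10   # speed of light in cm/s
AVOGADRO = 6.02214076e23     # 1/mol
PROTON_MASS = 1.67262192e-27 # kg (assumption: bare proton as oscillating mass)


def angular_frequency(wavenumber_cm):
    """Angular frequency (rad/s) of a stretch given its wavenumber in cm^-1."""
    return 2.0 * math.pi * C_CM_PER_S * wavenumber_cm


def zero_point_energy(wavenumber_cm):
    """Zero-point energy (kJ/mol) of a harmonic oscillator."""
    return 0.5 * H_BAR * angular_frequency(wavenumber_cm) * AVOGADRO / 1000.0


def barrier_height(well_separation_nm, wavenumber_cm, mass_kg=PROTON_MASS):
    """Height (kJ/mol) at which two identical harmonic wells intersect.

    The wells are centered at -d/2 and +d/2, so they cross at the midpoint
    where each has the energy (1/2) k (d/2)^2 = k d^2 / 8.
    """
    force_constant = mass_kg * angular_frequency(wavenumber_cm) ** 2  # N/m
    d = well_separation_nm * 1.0e-9
    return force_constant * d * d / 8.0 * AVOGADRO / 1000.0


def barrier_is_effective(well_separation_nm, wavenumber_cm):
    """True if the crossing point lies above the zero-point energy."""
    return barrier_height(well_separation_nm, wavenumber_cm) > zero_point_energy(
        wavenumber_cm
    )


if __name__ == "__main__":
    for separation in (0.05, 0.03, 0.02, 0.01):
        print(
            separation,
            round(barrier_height(separation, 3600.0), 1),
            barrier_is_effective(separation, 3600.0),
        )
```

In this model the barrier scales with the square of the distance between the bottoms of the wells, so it drops rapidly as the heteroatoms approach, which is the qualitative behavior described above. The numerical values should not be read as estimates for any real hydrogen bond.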

In a hydrogen bond in which the distance between 
donor and acceptor has become short enough that the 
barrier has vanished, the two wells have become one and 
the proton necessarily occupies a position midway 
between the two heteroatoms. A number of such
hydrogen bonds have been observed by neutron diffraction
in the crystalline state. When the same hydrogen
bond in which the proton is found to be centered in the
crystalline state is formed in solution, however, the
proton is usually not centered because solvation of
the bond is more favorable for the situation in which the 
proton is closer to one of the heteroatoms than to the 
other. It is as if solvation has recreated the barrier, 
probably by biasing the relative energies of the occupied 
and unoccupied wells at a given instant even if they are 
identical when unoccupied. When the proton is trans- 
ferred to the other heteroatom, the change in solvation 
causes the levels of the wells to switch. Because water 
strongly solvates dipoles, a barrierless hydrogen bond in
which the proton is centered between the heteroatoms
and in which the distinction between donor and acceptor
has disappeared rarely if ever exists in water.

The length of the covalent bond A-H between the 
proton and the heteroatom of the donor is longer when 
it is in a hydrogen bond than when it is not. In a series of 
hydrogen bonds of the same type (Figure 5-13), regard- 
less of whether they are intermolecular or intramolecular 
examples of the class, as the distance d_AB between the
two heteroatoms in a hydrogen bond decreases, the
length of the bond between the proton and the heteroatom
of the donor increases from its length when
it is not hydrogen-bonded (horizontal dashed lines) until 
the bond becomes so short that the proton sits halfway 
between donor and acceptor (Figure 5-12). There are 
several physical measurements that register this increase 
in the length of the bond between the proton and the het- 
eroatom of the donor as the bond shortens. 

The movement of the proton away from the het- 
eroatom of the donor that occurs as the hydrogen bond 
is formed decreases the electron density of the covalent 
bond surrounding that proton, and this deshielding 
shifts its peak of absorption downfield in a nuclear magnetic
resonance spectrum. In a series of hydrogen bonds
of the same type, as the hydrogen bond becomes shorter
and the proton moves farther away from the heteroatom
of the donor and becomes even more deshielded, its
chemical shift becomes even larger. An absorption in
nuclear magnetic resonance spectroscopy between 16
and 24 ppm for a proton demonstrates that it is in a low-barrier
hydrogen bond. For example, chemical shifts of
20.5 ppm for the proton between the two oxygens in
hydrogen maleate monoanion (5-2),
of 16.1 ppm for the proton between the two oxygens of
the enol of 2,4-dioxopentane (5-3), and 18.5 ppm for
the proton between the two nitrogens in hydrogen
1,8-diamino-N,N,N′,N′-tetramethylnaphthalene monocation
(5-4), each measured in organic solvents, indicate
that these are low-barrier hydrogen bonds, as do
their lengths (0.241, 0.243-0.251, and 0.258 nm, respectively).

Figure 5-13: Length of the bond between the proton and the
oxygen atom of a donor [d(O-H)] in a hydrogen bond between two
oxygens as a function of the distance between the oxygen atom of
the acceptor and the proton [d(H···O)]. Crystallographic molecular
models of complexes containing either intermolecular or
intramolecular hydrogen bonds between two oxygens were
retrieved from the Cambridge Crystallographic Database. The
types of complexes collected were carboxylic acid-carboxylates
(O), metal oximes (0), inorganic acid salts (x), hydronium
hydroxyls (+), β-diketone enols (m), carboxylic-carboxylics (A),
alcohols (0), and ice Ih (A). The dashed diagonal line in the upper
left-hand corner is drawn for d(O-H) = d(H···O). In the shortest
hydrogen bonds, d(O-H) does equal d(H···O), the proton sits
halfway between donor and acceptor, and donor and acceptor are
indistinguishable. As the hydrogen bond increases beyond a length
of about 0.24 nm, the proton is closer to the more basic oxygen, and
donor and acceptor become distinguishable. The horizontal
dashed lines indicate the range of the values for the length of an
oxygen-hydrogen bond in an isolated, non-hydrogen-bonded
molecule in the gas phase.

When the donor enters into a hydrogen bond and 
its A-H bond becomes longer, the force constant for its 
stretching vibration becomes smaller and the frequency 
at which it absorbs infrared light becomes lower than 
when it is not in a hydrogen bond. Consequently, the 
peak of absorption for the A-H bond of the donor when 
it is in a hydrogen bond appears in the infrared spec- 
trum at a lower frequency than the peak of absorption 
for the free A-H bond of the donor. The existence of these 
two distinct peaks of absorption allows the concentra- 
tions of bonded and unbonded donor to be quantified 
(Problem 5-7). In a series of hydrogen bonds of the same 
type, as the hydrogen bond becomes shorter and the A-H 
bond becomes longer, the stretching frequency of the 
A-H bond decreases. 

The fractionation factor φ is the equilibrium constant
defined by

$$
\phi = \frac{[\mathrm{AD{\cdots}B}]\,[\mathrm{L_2O{\cdots}HOL}]}{[\mathrm{AH{\cdots}B}]\,[\mathrm{L_2O{\cdots}DOL}]} \tag{5-31}
$$

where H is protium, D is deuterium, and L is either protium
or deuterium. AD···B is the hydrogen bond between
the deuterated donor and the acceptor, L2O···HOL is a
hydrogen bond between two molecules of water in which
a proton is within the hydrogen bond, AH···B is the
hydrogen bond between undeuterated donor and acceptor,
and L2O···DOL is a hydrogen bond between two molecules
of water in which a deuteron is within the hydrogen
bond. A fractionation factor of less than 1 indicates
that a proton has a greater preference than a deuteron for
sitting in the hydrogen bond being examined, relative to
the preferences of proton and deuteron for sitting in a
hydrogen bond between two molecules of water. The
fractionation factor scales the relative preferences of
proton and deuteron for any hydrogen bond to their relative
preferences for the reference hydrogen bond
between two water molecules, much as the acid dissociation
constant scales the basicity of any lone pair of electrons
to the basicity of a lone pair of electrons on a
molecule of water. The fractionation factor is measured
by following the concentration of the protonated form of
the hydrogen bond of interest as the mole fraction of H2O
is varied in mixtures of H2O and D2O.
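
The definition in Equation 5-31 and the way it is used in a measurement can be sketched numerically. The sketch below assumes, as a simplification that is not stated in the text, that the ratio of deuterons to protons in the water-water hydrogen bonds equals n/(1 - n), where n is the mole fraction of deuterium in the solvent; the function names are illustrative only.

```python
def fractionation_factor(ad_b, ah_b, water_h, water_d):
    """Fractionation factor of Equation 5-31.

    ad_b, ah_b: concentrations of deuterated and protonated hydrogen bond.
    water_h, water_d: concentrations of water-water hydrogen bonds that
    contain a proton or a deuteron, respectively (same units throughout).
    """
    return (ad_b * water_h) / (ah_b * water_d)


def fraction_protonated(phi, deuterium_mole_fraction):
    """Fraction of the hydrogen bond of interest that contains protium.

    Assumes [L2O...DOL]/[L2O...HOL] = n / (1 - n), so that
    [AD...B]/[AH...B] = phi * n / (1 - n).
    """
    n = deuterium_mole_fraction
    if not 0.0 <= n <= 1.0:
        raise ValueError("mole fraction must lie between 0 and 1")
    return (1.0 - n) / ((1.0 - n) + phi * n)


if __name__ == "__main__":
    # With phi < 1 the proton is enriched in the bond relative to the solvent.
    for n in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(n, round(fraction_protonated(0.84, n), 4))
```

Fitting the observed fraction of protonated hydrogen bond to this expression as n is varied is one way in which a value of φ could be extracted from such a titration.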

In a series of hydrogen bonds of the same type, as
the distance between the heteroatoms of donor and
acceptor decreases, so does the fractionation factor.
This decrease states that as the hydrogen bond becomes
shorter, the proton has a greater and greater preference
for its occupation relative to that of a deuteron. A value of
less than 1 indicates that the hydrogen bond is a short,
low-barrier hydrogen bond. The fractionation factors for
the hydrogen bonds in hydrogen maleate monoanion
(5-2) and hydrogen 1,8-diamino-N,N,N′,N′-tetramethylnaphthalene
monocation (5-4) are 0.84 and 0.90 in
water. The fractionation factor for aqueous FHF⁻,
which contains one of the shortest hydrogen bonds, is
0.60. Fractionation factors for hydrogen bonds in
organic solvents, however, can be as small as 0.4.

The chemical shift, the stretching frequency, and 
the fractionation factor all monitor the length of the 
hydrogen bond. They do not, however, provide any indi- 
cation of its strength. 

The strength of a hydrogen bond is expressed in 
thermodynamic parameters. The standard enthalpy of 
formation, or the heat released when the bond forms, is 
a measure of the electronic strength of the bond. It is 
usually the property that is referred to when the strength 
of the bond is discussed indiscriminately. The standard 
free energy of formation determines the degree to which 
the hydrogen bond will be favored over the unbonded 
reactants. Its magnitude is complicated by the fact that it 
is a function of both the standard enthalpy of formation, 
the electronic term, and the standard entropy of forma- 
tion, the quantitative measure of the total change in dis- 
order occurring during the reaction. The standard 
entropy of formation is usually a negative term because 
order increases when hydrogen bonds are formed. It is 
also affected significantly by changes in solvation. In 
addition, the standard entropy of formation depends on 
the choice of units for concentration because of the 
entropy of mixing. The association equilibrium constant, 
which is usually the quantity that is directly measured, is 
connected directly to the standard free energy of forma- 
tion, not to the standard enthalpy of formation. 


Ordinarily, the values of these thermodynamic
properties are obtained systematically. A method of
measurement, such as infrared spectroscopy, is used to
provide values for the molar concentration of free donor,
[HA], the molar concentration of free acceptor, [B], and
the molar concentration of hydrogen bonds, [B···HA], in
a solution. The total concentrations of donor and acceptor
are systematically varied at a given temperature, and
the experimental association equilibrium constants are
measured for each set of concentrations:

$$
K_{\mathrm{AHB}} = \frac{[\mathrm{B{\cdots}HA}]}{[\mathrm{B}]\,[\mathrm{HA}]} \tag{5-32}
$$


From the association equilibrium constant at a particu- 
lar temperature converted into the proper units 
(Equation 5-13), the standard free energy of formation of 
the hydrogen bond can be calculated (Equation 5-14). 
The variation of the equilibrium constant with temperature
is determined experimentally, and from these observations,
the standard enthalpy of formation of the
hydrogen bond can be calculated from the van't Hoff relationship:

$$
\frac{\partial \ln K_{\mathrm{AHB}}}{\partial (1/T)} = -\frac{\Delta H^{\circ}_{\mathrm{AHB}}}{R} \tag{5-33}
$$

Finally, the standard entropy of formation is calculated
from the experimental results by the relationship


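$$
\Delta S^{\circ}_{\mathrm{AHB}} = \frac{\Delta H^{\circ}_{\mathrm{AHB}} - \Delta G^{\circ}_{\mathrm{AHB}}}{T} \tag{5-34}
$$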
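
The chain of calculations just described, from association constants measured at several temperatures to the standard free energy, enthalpy, and entropy of formation, can be sketched as follows. The sketch assumes that the association constants have already been converted into the proper dimensionless units, that the standard free energy is obtained as -RT ln K, and that the enthalpy is constant over the range of temperature so that ln K is linear in 1/T; the function names are illustrative and the data in the usage example are synthetic.

```python
import math

GAS_CONSTANT = 8.314462618e-3  # kJ mol^-1 K^-1


def standard_free_energy(k_assoc, temperature):
    """Standard free energy of formation (kJ/mol) from a dimensionless K."""
    return -GAS_CONSTANT * temperature * math.log(k_assoc)


def vant_hoff_enthalpy(temperatures, k_values):
    """Standard enthalpy of formation (kJ/mol) from K at several temperatures.

    Fits ln K against 1/T by least squares; the slope equals -dH/R.
    """
    if len(temperatures) != len(k_values) or len(temperatures) < 2:
        raise ValueError("need at least two matched (T, K) pairs")
    xs = [1.0 / t for t in temperatures]
    ys = [math.log(k) for k in k_values]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    denominator = sum((x - x_mean) ** 2 for x in xs)
    slope = numerator / denominator
    return -GAS_CONSTANT * slope


def standard_entropy(delta_h, delta_g, temperature):
    """Standard entropy of formation (kJ mol^-1 K^-1), as in Equation 5-34."""
    return (delta_h - delta_g) / temperature


if __name__ == "__main__":
    # Synthetic data generated from dH = -20 kJ/mol and dS = -0.05 kJ/(mol K).
    temps = [283.15, 298.15, 313.15]
    ks = [math.exp(-(-20.0 + 0.05 * t) / (GAS_CONSTANT * t)) for t in temps]
    dh = vant_hoff_enthalpy(temps, ks)
    dg = standard_free_energy(ks[1], temps[1])
    print(dh, dg, standard_entropy(dh, dg, temps[1]))
```

With real measurements the fit would carry experimental error, and any curvature in the plot of ln K against 1/T would indicate that the assumption of a constant enthalpy does not hold.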


The standard enthalpies of formation for hydrogen
bonds between uncharged donors and acceptors of biological
interest (Table 5-2) lie between -12 and -23 kJ (mol
of bond)⁻¹ when the donor and acceptor are dissolved in
organic solvents such as CCl4 or benzene. In spite of these
favorable standard enthalpies of formation, the equilibrium
constants for the formation of the complexes in a
similar set of hydrogen bonds, disregarding those that
involve two hydrogen bonds, are quite small when
expressed in units of molarity⁻¹ (Table 5-3). When
expressed in units of corrected volume fraction (Equations
5-5 and 5-13) to eliminate entropy of mixing, the values
are somewhat larger. The small magnitude of these values
results from the fact that the negative standard enthalpy 
of formation is canceled to a considerable degree by a neg- 
ative standard entropy of formation because even though 
the correction for volume fraction takes care of the entropy 
change involved in their finding each other, the two mol- 
ecules still must reach the proper relative orientations so 
that the bond can form. Even in the best of circumstances, 
a hydrogen bond is a weak interaction. 

The standard enthalpy of formation of a hydrogen
bond is a function of the difference in pKa between the


Table 5-2: Standard Enthalpies of Formation for a Series
of Biochemically Important Hydrogen Bonds in Organic
Solvents(a)

hydrogen bond (type only;
structural drawings not recoverable)   solvent         ΔH° (kJ mol⁻¹)

two O···HO (dimer)                     CCl4            -45(b)
(not legible)                          (not legible)   -20
(not legible)                          CCl4            -13
O···HO                                 CCl4            -18
N···HO                                 CCl4            -21
(not legible)                          neat            -15
O···HN                                 benzene         -15
O···HO                                 neat            -14

(a) Values are copied directly from tables in ref 46. (b) Value for formation of the two
hydrogen bonds of the dimer.


donor and the acceptor. For example, a reference
donor, p-fluorophenol, was chosen, and a series of
acceptors were used to form hydrogen bonds with this
donor in CCl4 at 25 °C:

$$
\mathrm{B} + \mathrm{HOC_6H_4F} \overset{K_{\mathrm{AHB}}}{\rightleftharpoons} \mathrm{B{\cdots}HOC_6H_4F} \tag{5-35}
$$

The correlation between standard enthalpy of formation
and pKa for these hydrogen bonds was

$$
\Delta H^{\circ}_{\mathrm{AHB}} = (-1.3\ \mathrm{kJ\ mol^{-1}})\,\mathrm{p}K_{\mathrm{a}} + b_i \tag{5-36}
$$

where b_i (kilojoules mole⁻¹) assumes different values for
each category of base examined; for example, b_i has different
values for carbonyl oxygens, pyridines, and primary
amines. Within a particular category, however, the
standard enthalpies of formation are linearly related by
Equation 5-36.* This correlation states that as the
acceptor becomes a stronger base (as the pKa of its conjugate
acid becomes larger), the hydrogen bond
becomes stronger (the standard enthalpy of formation
becomes more negative).

A closely related correlation also exists between the
standard free energy of formation and the pKa for hydrogen
bonds between p-fluorophenol and various bases of
different pKa:

$$
\Delta G^{\circ}_{\mathrm{AHB}} = (-1.3\ \mathrm{kJ\ mol^{-1}})\,\mathrm{p}K_{\mathrm{a}} + c_i \tag{5-37}
$$

where c_i (kilojoules mole⁻¹) again assumes a different
value for each category of bases. The standard enthalpy
of formation and the standard free energy of formation
of a hydrogen bond are also linearly related to the
pKa of the donor in similar experiments in which a
common acceptor and a systematic series of donors
were used (Figure 5-14).
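
The slopes in Equations 5-36 and 5-37 can be restated as a dimensionless Brønsted coefficient by dividing by 2.303RT. The short sketch below performs that conversion for the slope of -1.3 kJ mol⁻¹ at 25 °C and evaluates a linear correlation of the same form; the function names are illustrative, and the intercept used in the usage example is a made-up number, because the text gives no values of b_i or c_i.

```python
import math

GAS_CONSTANT = 8.314462618e-3  # kJ mol^-1 K^-1


def bronsted_coefficient(slope_kj_per_mol, temperature):
    """Dimensionless Bronsted coefficient from a slope in kJ/mol per pKa unit.

    Uses slope = coefficient * 2.303 * R * T, where 2.303 is ln(10).
    """
    return slope_kj_per_mol / (math.log(10.0) * GAS_CONSTANT * temperature)


def linear_correlation(pka, intercept_kj_per_mol, slope_kj_per_mol=-1.3):
    """Value (kJ/mol) predicted by a correlation of the form slope * pKa + b_i."""
    return slope_kj_per_mol * pka + intercept_kj_per_mol


if __name__ == "__main__":
    print(bronsted_coefficient(-1.3, 298.15))
    # Hypothetical intercept of -20 kJ/mol for one category of acceptor.
    for pka in (0.0, 5.0, 10.0):
        print(pka, linear_correlation(pka, -20.0))
```

A coefficient of about -0.23 means that each unit increase in the pKa of the conjugate acid of the acceptor changes the logarithm of the corresponding equilibrium constant by roughly a quarter of a unit in the favorable direction.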

It has been observed that when the infrared stretching
frequencies of the A-H bond in a particular donor are
examined as a function of the values of pKa for the conjugate
acids of a set of acceptors, the stretching frequencies
of the donor, which monitor the strength of the A-H
bond, decrease linearly as the acceptor becomes a
stronger base. This observation is consistent with the
fact that as an unconstrained, intermolecular hydrogen
bond becomes stronger, it also becomes shorter.

If one considers the situation of a hydrogen bond in 
which the acceptor remains the same and a series of 
donors of increasing acidity is examined, the standard 
enthalpy of formation of the hydrogen bond will 
decrease linearly as the pKa of the donor decreases until


* This correlation is an example of the common practice of relating
thermodynamic properties to the acid dissociation constant. When
the correlation is between the respective values of pKa and the logarithms
of a set of equilibrium constants (Figure 5-14), the slope
is a dimensionless number referred to as the Brønsted coefficient.
When the correlation is between the values of pKa and enthalpies
or free energies, the slope of the correlation is the Brønsted coefficient
multiplied by 2.303RT (kilojoules mole⁻¹). When the correlation
is between the values of pKa and entropies, the slope of the
correlation is the Brønsted coefficient multiplied by 2.303R
(kilojoules mole⁻¹ kelvin⁻¹). The units on the slope should be consistent
with the comparison being made.

Table 5-3: Association Constants of Hydrogen Bonds(a)

hydrogen bond (type only;
structural drawings not recoverable)   solvent   temperature (°C)   K_AHB (M⁻¹)   K_AHB (cvf)(b)

two O···HO (dimer)                     benzene   30                 130           1680
two O···HO (dimer)                     benzene   30                 430           3420
O···HO                                 CCl4      25                 0.64          8.1
O···HO                                 CCl4      21                 2.3           19
N···HO                                 CCl4      20                 55            480
O···HO                                 CCl4      25                 1.7           17
O···HN                                 benzene   25                 6.2           60

(a) Values are copied directly from tables in ref 46. (b) Values for the association constant given in the dimensionless units of corrected volume fraction (Equation 5-13).


the pKa of the donor is equivalent to the pKa of the conjugate
acid of the acceptor. If the acidity of the donor is
increased further, the proton will be transferred between
donor and acceptor, the conjugate acid of the former
acceptor becomes the new donor, and the conjugate
base of the former donor becomes the new acceptor. If
the pKa of the former donor is decreased below the pKa of
the conjugate acid of the former acceptor, the standard
enthalpy of formation for the hydrogen bond, because
donor and acceptor have switched roles, will begin to
increase. A corresponding argument could be made for
the situation in which the donor remains the same and a
series of acceptors, the conjugate acids of which increase
in pKa, is examined. From these considerations and the
fact that ΔG°_AHB tracks ΔH°_AHB, it follows that the strength
of the hydrogen bond, as measured by its association
equilibrium constant K_AHB, is determined by the difference
in pKa between the donor and the conjugate acid of
the acceptor (Figure 5-14); the smaller the difference, the
stronger the bond. If this is the case, it must follow that
the strongest possible hydrogen bond in a given series is
the one in which the pKa of the donor is equal to the pKa
of the conjugate acid of the acceptor. The most obvious
examples of such a hydrogen bond are those between a
donor and its conjugate base.

It is important to note, however, that such a symmetric
hydrogen bond, even one between a donor and
its conjugate base, is no stronger than would be predicted
by the relationship between the difference in pKa
and the association constant K_AHB (as indicated by the
points on the ordinate in Figure 5-14). A similar continuous
increase in the strengths of hydrogen bonds has
been observed in the gas phase as the difference in the
proton affinities of the donors and acceptors decreases,
again, however, with no evidence for a discontinuity
when they are matched. It has also been observed that
the downfield shift of the proton in the hydrogen bonds
between 1-methylimidazole and a series of carboxylic
acids passes through a maximum with no obvious discontinuity
as a function of the pKa of the carboxylic
acids.

Figure 5-14: Association constants K_AHB in tetrahydrofuran for
the hydrogen bonds between either 3,4-dinitrophenolate ion (O) or
4-nitrophenolate ion (©) (5-5) and a series of phenols acting as
donors as a function of the difference in the values of pKa (ΔpKa,H2O)
for the donor and the conjugate acid of the acceptor. In tetrahydrofuran,
the absorptions of both the 3,4-dinitrophenolate ion and
the 4-nitrophenolate ion shift from 420 to 388 nm upon formation
of a hydrogen bond. As donor is added to the solution, the
absorbance at 420 nm decreases and that at 388 nm increases with
a clear isosbestic point. The molar concentrations of free donor
([HA]), free acceptor ([B]), and hydrogen bond ([B···HA]) were calculated
from the ratio of the two absorbances (see Problem 5-7).
The logarithms of the association constants (K_AHB) in units of
molarity⁻¹ calculated in this way are plotted as a function of the difference
between the respective values of the pKa (ΔpKa,H2O) as measured
in water. These values of ΔpKa,H2O are expected to be
proportional to those in tetrahydrofuran.

There are at least two ways that the distance 
between a donor and acceptor in a particular type of 
hydrogen bond can be shortened. If the hydrogen bond 
is an intermolecular one, anything that increases its 
strength will cause it to shorten. If the hydrogen bond is 
an intramolecular one, the structure of the molecular
scaffold supporting that hydrogen bond can physically
compress it. The consequences of these two effects are 
quite different and should not be confounded. 

In a simple, unconstrained, intermolecular hydro- 
gen bond, such as the one between 4-nitrophenolate ion 
and 4-nitrophenol (5-5 and Figure 5-14) 


5-5 


the distance between donor and acceptor is established 
by two opposing forces. The short-range, unfavorable 
potential energies of repulsion between the electrons of 
the structure and between the three nuclei in the hydro- 
gen bond push donor and acceptor apart. The longer- 
range, favorable potential energy of attraction between 
the monopole and dipole and the overlap energies of any 
covalent bonding pull donor and acceptor together. As 
with any ionic or covalent bond, it is this tradeoff 
between longer-range attraction and shorter-range 
repulsion that defines the minimum in potential energy 
that sets the length of the bond (Figure 5-15). These
potential energies are unrelated to the wells of potential 
energy confining the proton. 

Strong intermolecular hydrogen bonds such as the
ones in FHF⁻ (d_FF = 0.226 nm) and H2O···HOH2⁺ (d_OO =
0.236 nm) are short hydrogen bonds because the
monopoles and dipoles are large and the repulsive energies
become dominant only at shorter distances. Such
hydrogen bonds are special cases that are irrelevant to
hydrogen bonds in a molecule of protein at neutral pH.
Other hydrogen bonds between acids and their conjugate
bases, however, are more relevant to the acids and bases
found in a protein (Table 5-4). The hydrogen bond
between an acid of a type found in proteins and its conjugate
base is about 0.02-0.03 nm shorter than the hydrogen
bond between the neutral acid and itself or an
equivalent functional group on a different molecule
(Table 5-1) because it is a stronger hydrogen bond (Figure
5-14). It seems reasonable that such strong, short hydrogen
bonds (for example, that between the imidazole of
histidine and the imidazolium of another histidine or
between the carboxylate of a glutamate and the carboxylic
acid of another glutamic acid) should be found in crystallographic
molecular models of proteins, but they are
rarely observed, probably because the neutralization of
the cationic acid or the anionic base required to produce
the acceptor or the donor, respectively, requires more free
energy at neutral pH than would be gained by forming the
stronger hydrogen bond.

Figure 5-15: An estimate of the energy of formation of a hydrogen
bond as a function of the distance between the heteroatom of the
donor and the heteroatom of the acceptor. The calculations were
for the energy of formation of a hydrogen bond between two molecules
of water in the gas phase (Figure 5-1). The variation of that
estimate of the energy as a function of the distance between the
two oxygens is presented.

It is also possible to shorten a hydrogen bond by 
physically compressing it within a covalent framework. A 
number of intramolecular hydrogen bonds are short 
because of such compression. For example, the hydrogen
bond in hydrogen maleate monoanion*
(0.241 nm) is shorter than the unconstrained hydrogen
bond between two hydrogen fumarate monoanions*
(0.247 nm), and that in hydrogen 1,8-diamino-
N,N,N′,N′-tetramethylnaphthalene cation (0.258 nm) is

Table 5-4: Lengths of Hydrogen Bonds between Acids and
Their Conjugate Bases(a)

acid or conjugate base              bond length (nm)

p-nitrophenol                       0.246
8-hydroxyquinoline                  0.243
pentachlorophenol                   0.244
                                    0.248(b)
1-(p-hydroxyphenyl)thianium         0.247
cyclohexylamine                     0.280
1,10-diaminodecane                  0.280(c)
N,N,N-tris(2-aminoethyl)amine       0.280(c)
                                    0.285
N-methylimidazole                   0.265
hydrogen succinate                  0.244(d)

(a) Values presented in this table were gathered during a search of the Cambridge
Structural Database by Dr Hens Borkent at the Catholic University of Nijmegen.
(b) Two different hydrogen bonds in the same unit cell. (c) Hydrogen bond between
two monocations of the diamine or triamine, respectively. (d) Hydrogen bond
between two hydrogen succinates.


* The crystallographic molecular models on which these measure- 
ments are based were gathered from the Cambridge Structural 
Database by Dr Hens Borkent at the Catholic University of 
Nijmegen. 


shorter than the unconstrained hydrogen bond between
a dimethylalkylamine and its dimethylalkylammonium
cation* (0.264 nm). Both of these short intramolecular
hydrogen bonds display downfield chemical shifts
(20.5 and 18.5 ppm) and fractionation factors less than 1
(0.84 and 0.90). In a comparison between intramolecular
and intermolecular hydrogen bonds between carboxylate
anions and the corresponding carboxylic acid or
between enols of β-diketones and the corresponding carbonyl
oxygen, the ranges of lengths of intramolecular
hydrogen bonds (0.239-0.242 and 0.243-0.255 nm,
respectively) were about 0.006 nm shorter than those of
the intermolecular hydrogen bonds (0.244-0.249 and
0.246-0.265 nm, respectively).

In such intramolecular situations where a hydrogen 
bond is constrained by the framework of the molecule to 
be shorter, this compressed hydrogen bond must have a 
less negative standard enthalpy of formation relative to 
an equivalent, intermolecular, uncompressed one. This 
conclusion follows from the fact that repulsive potential 
energy has to be overcome to compress the bond (Figure 
5-15). The energy necessary to compress the bond is pro- 
vided by the covalent framework of the molecule. It fol- 
lows that an intramolecular hydrogen bond that is 
shorter than an equivalent intermolecular hydrogen 
bond must be weaker than that intermolecular hydrogen 
bond even though it has a lower barrier to proton trans- 
fer (Figure 5-12). 
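
The argument that compression must cost energy can be made concrete with a model potential that has a single minimum. The sketch below uses a Morse function as a stand-in for the curve in Figure 5-15; the choice of a Morse function and the well depth, equilibrium distance, and width parameter are all assumptions made for illustration and are not taken from the text or from the calculations behind the figure.

```python
import math


def model_energy(distance_nm, well_depth=20.0, equilibrium_nm=0.28, width=20.0):
    """Energy of formation (kJ/mol) of a model hydrogen bond.

    Morse form: D * ((1 - exp(-a (r - r0)))**2 - 1), which equals -D at the
    unconstrained distance r0 and rises steeply at shorter distances.
    All three parameters are hypothetical.
    """
    displacement = distance_nm - equilibrium_nm
    return well_depth * ((1.0 - math.exp(-width * displacement)) ** 2 - 1.0)


def compression_cost(compressed_nm, equilibrium_nm=0.28, **kwargs):
    """Energy (kJ/mol) that a framework must supply to shorten the bond."""
    relaxed = model_energy(equilibrium_nm, equilibrium_nm=equilibrium_nm, **kwargs)
    squeezed = model_energy(compressed_nm, equilibrium_nm=equilibrium_nm, **kwargs)
    return squeezed - relaxed


if __name__ == "__main__":
    for r in (0.28, 0.27, 0.26, 0.25):
        print(r, round(model_energy(r), 2), round(compression_cost(r), 2))
```

In any potential of this shape, every distance shorter than the unconstrained minimum has a less negative energy of formation, so the cost of compression is positive whatever the particular parameters chosen.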

The measurements available for the free energies of
formation for such intramolecularly shortened hydrogen
bonds confirm this expectation. The hydrogen bond
in hydrogen maleate anion in dimethyl sulfoxide is only
-18 kJ mol⁻¹ more stable than that in neutral maleic acid,
about what one would expect for the difference in standard
enthalpy of formation for two hydrogen bonds the acceptors
of which differ in pKa by 10 units. In water, the difference
is only -2 kJ mol⁻¹. The hydrogen bond between the
carboxylic acid and the carboxylate anion in 5-6


5-6 


shows a chemical shift (18.0 ppm) in its nuclear magnetic 
resonance spectrum that is characteristic of a low-barrier 
hydrogen bond. Yet the standard free energy of its for- 


mation differs from the standard free energy of forma- 
tion for the hydrogen bond in the homologous acid 
amide, in which an NH₂ replaces the OH, by only
-10 kJ mol⁻¹ in benzene (εr = 2.3) and -6 kJ mol⁻¹ in
dichloromethane (εr = 8.9). These differences, if anything,
are less than expected for the differences in free
energies of formation in these two solvents for two 
hydrogen bonds the donors of which differ by so much in 
pKa. The proton in the hydrogen bond in zwitterionic cis- 
urocanic acid 


5-7 


has a downfield shift (18.5 ppm) in nuclear magnetic 
resonance characteristic of a low-barrier hydrogen bond, 
yet the standard free energy of formation of this hydrogen
bond is only 5 kJ mol⁻¹ less than the standard free
energy of formation of the hydrogen bond in its conjugate
base, cis-urocanic acid monoanion, even though
the difference in pKa between donor and acceptor in the
latter hydrogen bond is 13. Consequently, the strength of
hydrogen bond 5-7, in which the values of pKa for the
donor and the conjugate acid of the acceptor are closely 
matched, cannot be unusually high even though it has 
the signature of a low-barrier hydrogen bond. The chem- 
ical shift and fractionation factor for the proton in the 
hydrogen bond in the enol of 2,4-dioxopentane (5-3) are 
those expected of a low-barrier hydrogen bond, but the 
values of the pKa for its donor and the conjugate acid of
its acceptor differ so widely (ΔpKa = 22) that this cannot
be a strong hydrogen bond. 

The common practice of equating an unusually
high value for the pKa of the donor in an intramolecular
hydrogen bond with unusual strength of that hydrogen
bond is mistaken. The unusually high pKa of such a donor
arises from the repulsion between the lone pairs of the
acceptor and the unprotonated donor, which are forced by
the framework of the molecule to reside immediately
adjacent to each other in the conjugate base,
and not from the strength of the hydrogen bond.
The interposition of the proton between the two lone
pairs of electrons that are forced into juxtaposition
relieves the repulsion. It comes as no surprise that the
activation energy for the exchange of the proton in such
a confined location is much higher than that for a proton
in an unconstrained, intermolecular hydrogen bond.

Whether an intermolecular hydrogen bond is short- 
ened by its strength or an intramolecular hydrogen bond 
is shortened by the compression exerted by the molecu- 
lar framework to which it is covalently attached, the same 
increase in the overlap of the wells of potential energy for 
the proton associated with the heteroatoms of donor and 
acceptor (Figure 5-12) and the same lowering of the bar- 
rier occur. Both strong intermolecular hydrogen bonds 
and compressed intramolecular bonds can be low- 
barrier hydrogen bonds because the height of the barrier 
depends on the length of the bond, not its strength. It fol- 
lows from these considerations that comparisons of the 
physical properties of intermolecular hydrogen bonds 
with those of intramolecular hydrogen bonds are mis- 
leading and should be considered guilty until proven 
innocent. This becomes an even greater offense when- 
ever entropy is involved, as will become more apparent 
in the next section. 

A hydrogen bond is mainly an electrostatic attrac- 
tion between donor and acceptor. The A-H bond of the 
donor is a dipole, electronegative on the heteroatom, A, 
and electropositive on the proton. The σ orbital of the
acceptor, ⊙B, is electronegative on the lone pair and
electropositive on the heteroatom, B. These two dipoles 
attract each other electrostatically when oriented in the 
same direction. If the donor is positively charged or the 
acceptor is negatively charged or if both are so charged, 
the electrostatic attraction is increased. 

The hydrogen bond may also have a covalent com- 
ponent and thereby may involve both the overlap of 
atomic orbitals to form molecular orbitals and the delo- 
calization of valence electrons over the three participat- 
ing atoms, A, H, and B. Its covalency would result from 
the mixing of the sp² or sp³ orbital on the heteroatom, A,
of the donor, the 1s orbital of the proton, and the sp² or
sp³ orbital of the lone pair of electrons of the acceptor, B,
to form a molecular orbital system with three molecular 
orbitals (Figure 5-16). The covalent component, how- 
ever, is significantly less important than the electrostatic 
attraction of the dipoles. It has been proposed that as 
much as 10% of the hydrogen bonding in ice Ih could be 
covalent, but this conclusion has been challenged
by alternative evaluations of the data suggesting that
there is no covalency. The hydrogen bond between
two molecules of water, however, is one of the weaker
ones, and stronger, shorter hydrogen bonds may have
more covalent character.

Figure 5-16: Molecular orbitals for a covalent hydrogen bond.
The molecular orbitals are for a symmetric hydrogen bond in
which the donor and the conjugate acid of the acceptor are
equivalent. The covalent molecular orbital system is formed from
two sp² or sp³ orbitals, one from atom A and one from atom B, and
the s orbital on hydrogen. These three atomic orbitals combine to
form the three molecular orbitals (bonding, nonbonding, and
antibonding) shown in the middle of the diagram. The final
molecular orbital system is constructed formally in steps by first
mixing the atomic orbitals of atom A and the hydrogen to form the
two molecular orbitals (bonding and antibonding) of the A-H
covalent bond and then mixing the A-H molecular orbital system
with the atomic orbital on atom B containing the lone pair of
electrons.

To the extent that a hydrogen bond is the electrostatic
attraction between the dipole of the donor and the
dipole of the acceptor, the relative permittivity of
the solvent should affect its strength just as it affects the
strength of an ion pair. The more effectively these dipoles
are solvated when they are separated from each other,
the less advantage will there be in their combination to
form the hydrogen bond. The higher the relative permittivity
of the solvent, the weaker will be the bond. One way
to quantify this effect is to compare the slopes of correlations
between the free energies of formation of a set of
hydrogen bonds and either the values for pKa of the
donor (Figure 5-14) or the values for pKa of the conjugate
acid of the acceptor (Equation 5-37). In water (εr = 78 at
25 °C), the slope for such a correlation for hydrogen
bonds between phenolate ion and a set of ammonium
ions (Figure 5-17) is 0.6 kJ mol⁻¹ (unit of pKa)⁻¹ and that
for a correlation of hydrogen bonds between
ethylenediammonium dication and a series of phenolate ions is
-0.9 kJ mol⁻¹ (unit of pKa)⁻¹. The magnitudes of these
slopes are less than the 3.1 kJ mol⁻¹ (unit of pKa)⁻¹ for the
correlation of hydrogen bonds between 4-nitrophenolate
or 3,4-dinitrophenolate ion (Figure 5-14) and a
series of phenols in tetrahydrofuran (εr = 7.5) or the
-1.3 kJ mol⁻¹ (unit of pKa)⁻¹ for the correlation (Equation
5-37) of hydrogen bonds between fluorophenol and a
diverse set of bases in CCl₄ (εr = 2.2).
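The comparison with an ion pair can be made concrete with a rough continuum estimate. The sketch below evaluates the Coulomb energy of a single +1/-1 ion pair in a uniform dielectric, using the relative permittivities quoted above. The separation of 0.3 nm is an assumed, illustrative distance, and treating the solvent as a uniform dielectric at molecular distances is itself a crude approximation, so only the trend with relative permittivity should be read from the numbers.

```python
import math

# CODATA constants
ELEMENTARY_CHARGE = 1.602176634e-19     # C
VACUUM_PERMITTIVITY = 8.8541878128e-12  # F m^-1
AVOGADRO = 6.02214076e23                # mol^-1


def coulomb_energy_kj_per_mol(distance_nm, relative_permittivity):
    """Coulomb energy of a +1/-1 ion pair in a uniform dielectric (kJ/mol)."""
    r = distance_nm * 1e-9  # convert nm to m
    energy_j = -ELEMENTARY_CHARGE**2 / (
        4.0 * math.pi * VACUUM_PERMITTIVITY * relative_permittivity * r
    )
    return energy_j * AVOGADRO / 1000.0


# Relative permittivities are those quoted in the text;
# the 0.3 nm separation is an assumption made for illustration only.
SOLVENTS = {"CCl4": 2.2, "tetrahydrofuran": 7.5, "water": 78.0}

for name, eps_r in SOLVENTS.items():
    energy = coulomb_energy_kj_per_mol(0.3, eps_r)
    print(f"{name:16s} eps_r = {eps_r:5.1f}  E = {energy:7.1f} kJ/mol")
```

In this simple picture the attraction in water is smaller than that in CCl₄ by the ratio of the two relative permittivities, which is the direction of the trend in the slopes cited above.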

Figure 5-17: The apparent association equilibrium constants
(Kapp,AHB) for a series of hydrogen bonds between the phenolate ion
and a series of aliphatic ammonium ions in aqueous solution as a
function of the acid dissociation constants (Ka,HA) for those
ammonium ions. The associations between the phenolate ion and the
ammonium ions were followed spectrophotometrically by changes
in absorbance at 300 nm as ammonium ion was added to an aqueous
solution of phenolate ion at 2 M ionic strength and 25 °C. The
values of the apparent association equilibrium constants were
divided by the number of protons (p) on the respective ammonium
ion to convert the molar concentration of the free cation to the
molar concentration of donors. The acid dissociation constants
were also statistically corrected by multiplying the observed acid
dissociation constants by the number of lone pairs on the conjugate
base (q) and dividing by the number of protons on the conjugate
acid (p), so that the corrected values are for the molar
concentration of protons on the respective conjugate acid and the
molar concentration of the lone pairs of electrons on the respective
conjugate base. The logarithms of the apparent association
equilibrium constants (in units of molarity⁻¹) are linearly correlated
with the logarithms of the corrected acid dissociation constants (in
units of molarity) by a line with Brønsted coefficient of 0.15. As the
same corrections would be made to both the association constant
and the acid dissociation constant in converting to units of
corrected volume fraction, the slope of the line would be unaffected.
The value of log (Kapp,AHB p⁻¹) calculated by Equation 5-51 for an
acid with a pKa equal to that of water is indicated by a filled square.
The ammonium ions used were (1) hydroxylammonium ion,
(2) piperazine dication, (3) sym-tetramethylethylenediammonium
dication, (4) N,N,N-trimethylethylenediammonium dication,
(5) ethylenediammonium dication, (6) 2-hydroxy-1,3-diaminopropane
dication, (7) 1,3-diaminopropane dication, and
(8) (2-hydroxyethyl)ammonium ion. Adapted with permission
from ref 114. Copyright 1986 American Chemical Society.

To this point, most of the hydrogen bonds that have
been discussed are those formed in aprotic organic solvents
such as carbon tetrachloride or benzene. The situation
changes dramatically when the donor and acceptor
are dissolved in water because of the competition of the
donors and acceptors of the water molecules themselves.
Pauling and Pressman noted that the standard free
energy of formation of a hydrogen bond in water must be
the difference between its own standard free energy of
formation and the free energies of formation of the
hydrogen bonds of its donor and acceptor with water.
The fact that the concentrations of donors and acceptors
in water are both 110 M is a sufficient observation in itself
to lead to the conclusion that the hydrogen bond between
a solute A-H and a solute ⊙B would be unlikely to form.

The intermolecular hydrogen bond between the 
nitrogen-hydrogen bond of an amide and the lone pair 
of electrons on the acyl oxygen of another amide can be 
used again as an example of the majority of the hydrogen 
bonds in biological macromolecules. When N-methyl- 
acetamide is dissolved in carbon tetrachloride or 
dioxane, an absorption appears in the infrared spectra 
of the two solutions that can be assigned to the stretching
vibration of the hydrogen-nitrogen bond in a
hydrogen bond of the structure N-H⊙O=C (5-8).
Hydrogen bond 5-8 is the one supposedly holding
α helices and β structure together in a molecule of protein
and the base pairs together in DNA. From calorimetric
measurements, it can be calculated that the standard free
energy of formation of this hydrogen bond at 25 °C in
CCl₄ is -16 kJ mol⁻¹, when an infinitely dilute solution of
N-methylacetamide is defined as the standard state and
the association equilibrium constant is expressed in units
of corrected volume fraction (Equation 5-13). When
N-methylacetamide is dissolved in water, however, the
infrared absorption arising from hydrogen bond 5-8 can
barely be detected even at a concentration of 12.5 M. From
the small absorption that was observed, the standard free
energy of formation of the hydrogen bond in aqueous
solution was judged to be about +7 kJ mol⁻¹ at 25 °C, again
in units of corrected volume fraction.

A more complete picture of the situation is gained
by estimating the standard free energy of transfer of an
amido group from water to CCl₄ at 25 °C. The standard
free energy of transfer of N-methylacetamide from
water to CCl₄ at 25 °C is +9 kJ mol⁻¹ at infinite dilution in
both phases when units are corrected volume fraction
(Equation 5-18). If this value is corrected for the
hydrophobic effect expressed during the transfer of the
three hydrogen-carbon bonds on the methyl group in
the acetyl group and the three hydrogen-carbon bonds
of the N-methyl group from water to CCl₄, the standard
free energy of transfer for just the unassociated
amido group (-CONH-) should be about +25 kJ mol⁻¹.
One amido group contains one donor and one acceptor
and by itself represents the two participants in the final
hydrogen bond, so its free energy of transfer from water
to CCl₄ should be equal to that for the transfer of one
amido nitrogen-hydrogen and one acyl carbon-oxygen
(indicated in Figure 5-18 by N-H and C=O⊙). The complete
standard free energy diagram for the hydrogen
bond (Figure 5-18) suggests that the standard free
energy of transfer of a hydrogen-bonded amido
nitrogen-hydrogen and acyl carbon-oxygen from water to
CCl₄ at 25 °C is around +2 kJ mol⁻¹, a value that registers
the polarity of the hydrogen bond. The most significant
difference between CCl₄ and H₂O, however, is the high
stability of the separated donors and acceptors in the
H₂O. The large unfavorable standard free energy of transfer
for the amido group from water to CCl₄ reflects the
necessity to break hydrogen bonds between it and the
water before the transfer can occur.
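The value of about +2 kJ mol⁻¹ follows from closing the thermodynamic cycle formed by the three quantities quoted above. A minimal sketch of that bookkeeping is given below; the variable and function names are illustrative, and the three input values are the rounded estimates from the text, all in units of corrected volume fraction.

```python
# Standard free energies at 25 degrees C, kJ/mol (values quoted in the text)
DG_ASSOCIATION_WATER = 7.0     # unbonded -> hydrogen-bonded, in water
DG_ASSOCIATION_CCL4 = -16.0    # unbonded -> hydrogen-bonded, in CCl4
DG_TRANSFER_UNBONDED = 25.0    # unbonded amido group, water -> CCl4


def transfer_of_bonded_pair(dg_assoc_water, dg_assoc_ccl4, dg_transfer_unbonded):
    """Free energy of transfer of the hydrogen-bonded pair, water -> CCl4.

    Path around the cycle:
      bonded (water)  -> unbonded (water):  -dg_assoc_water
      unbonded (water) -> unbonded (CCl4):  +dg_transfer_unbonded
      unbonded (CCl4) -> bonded (CCl4):     +dg_assoc_ccl4
    """
    return -dg_assoc_water + dg_transfer_unbonded + dg_assoc_ccl4


dg_bonded = transfer_of_bonded_pair(
    DG_ASSOCIATION_WATER, DG_ASSOCIATION_CCL4, DG_TRANSFER_UNBONDED
)
print(f"Transfer of hydrogen-bonded pair: {dg_bonded:+.0f} kJ/mol")
```

Because free energy is a state function, the result does not depend on which path around the cycle is taken.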

Figure 5-18: Diagram of standard free energy for a hydrogen bond
between the amido nitrogen (NH) and acyl oxygen (⊙O=C) of
N-methylacetamide. The standard free energies of association
for the hydrogen bond in water (+7 kJ mol⁻¹) and carbon
tetrachloride (-16 kJ mol⁻¹) are positioned in relation to each other
on the diagram by an estimate of the standard free energy of transfer
(+25 kJ mol⁻¹) of an unbonded amido group between the two
solvents. The standard free energy of transfer for the amido group
alone was calculated from the distribution coefficient of
N-methylacetamide between water and carbon tetrachloride,
extrapolated to infinite dilution, with the concentrations of solute in
each solvent expressed in units of corrected volume fraction, and an
estimate of the standard free energy of transfer of its two methyl
groups.

If this is the case, the formation of the hydrogen
bond in water must be written, in analogy to Equation …, as

$$
\begin{array}{ccc}
\mathrm{B{\odot}H_2O + H_2O{\odot}HA} & \rightleftharpoons & \mathrm{B{\odot}HA + H_2O{\odot}H_2O} \\
\updownarrow & & \updownarrow \\
\multicolumn{3}{c}{\mathrm{B{\odot} + HOH + H_2O{\odot} + HA}}
\end{array}
\qquad (5\text{-}40)
$$

where the association constant for any of the complexes,
Kass,XY, is

$$K_{\mathrm{ass,XY}} = \frac{[\mathrm{X{\odot}HY}]}{[\mathrm{X{\odot}}][\mathrm{HY}]} \qquad (5\text{-}41)$$


It is entirely possible that the concentrations of 
unbonded donors and unbonded acceptors in aqueous 
solutions are negligible and that only the upper part of 
Equation 5-40 is thermodynamically relevant. 

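The equilibrium constant for the upper part of
Equation 5-40, KAHW·BW, is defined by

$$K_{\mathrm{AHW\cdot BW}} = \frac{[\mathrm{B{\odot}HA}][\mathrm{H_2O{\odot}H_2O}]}{[\mathrm{B{\odot}H_2O}][\mathrm{H_2O{\odot}HA}]} \qquad (5\text{-}42)$$

where, in the particular case of N-methylacetamide, AH
is the amido nitrogen-hydrogen bond and B⊙ is a lone
pair of electrons on the acyl oxygen. The equilibrium
constant actually observed, Kapp,AHB, is

$$K_{\mathrm{app,AHB}} = \frac{[\mathrm{B{\odot}HA}]}{([\mathrm{B{\odot}H_2O}] + [\mathrm{B{\odot}}])([\mathrm{H_2O{\odot}HA}] + [\mathrm{HA}])} \qquad (5\text{-}43)$$

from which it follows, when the concentrations of unbonded
donors and unbonded acceptors of water are taken to be equal,
that

$$K_{\mathrm{app,AHB}} = \frac{K_{\mathrm{AHW\cdot BW}}}{[\mathrm{H_2O{\odot}H_2O}]\left(1 + \dfrac{(K_{\mathrm{ass,WW}})^{1/2}}{K_{\mathrm{ass,BW}}[\mathrm{H_2O{\odot}H_2O}]^{1/2}}\right)\left(1 + \dfrac{(K_{\mathrm{ass,WW}})^{1/2}}{K_{\mathrm{ass,WA}}[\mathrm{H_2O{\odot}H_2O}]^{1/2}}\right)} \qquad (5\text{-}44)$$

where Kass,BW, Kass,WA, and Kass,WW are the association
constants (Equation 5-41) for B⊙H₂O, H₂O⊙HA, and
H₂O⊙H₂O, respectively. If, as is reasonable,
Kass,BW ≫ (Kass,WW)^(1/2), Kass,WA ≫ (Kass,WW)^(1/2),
and [H₂O⊙H₂O] > 1 M, it follows that

$$K_{\mathrm{app,AHB}} = \frac{K_{\mathrm{AHW\cdot BW}}}{[\mathrm{H_2O{\odot}H_2O}]} \qquad (5\text{-}45)$$

where [H₂O⊙H₂O] is the molar concentration of hydrogen
bonds in the water.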

The difference in pKa between the nitrogen-hydrogen
bond in N-methylacetamide as a donor (pKa = 16)
and the oxygen-hydrogen bond in water as a donor
(pKa = 15.7) should be negligible, as should be the
difference between the lone pair of electrons on the acyl
oxygen of N-methylacetamide (pKa = -0.6) and the lone
pair of electrons on water (pKa = -1.7) as acceptors.
Therefore, the standard enthalpies of formation for the
following four hydrogen bonds should be similar:
N-H⊙O=C, N-H⊙OH₂, HO-H⊙O=C, and HO-H⊙OH₂,
and the standard enthalpy change for the upper part of
Equation 5-40 should be near zero. If the upper part of
Equation 5-40 were isoentropic as well as isoenthalpic,
so that KAHW·BW = 1 (Equation 5-42), then Kapp,AHB would be
equal to [H₂O⊙H₂O]⁻¹. The observed apparent association
constant for the formation of hydrogen bond 5-8 in
aqueous solution when expressed in units of reciprocal
molarity is about (190 M)⁻¹, which is in the range
expected for the reciprocal of the concentration of
hydrogen bonds in pure water, [H₂O⊙H₂O] ≤ 110 M. The
conclusion to be drawn from these considerations is that
a hydrogen bond in aqueous solution will always have a
small apparent association constant and a large, positive
apparent standard free energy of formation because the
concentration of hydrogen bonds between water molecules
in the solution is a hidden and significant term in that
apparent association constant (Equation 5-45).

The hydrogen bond represented by that of 
N-methylacetamide accounts for the majority of those 
found in proteins and nucleic acids, and yet its stan- 
dard free energy of formation is positive by a consider- 
able degree. From this it follows that each hydrogen 
bond of this type in a protein or a nucleic acid is an 
energetic liability rather than an asset. It is possible, 
however, that some other combination of donor and 
acceptor might produce a hydrogen bond strong 
enough to overcome the competition of the water and 
provide a negative standard free energy of formation. In 
assessing this possibility, it would be useful to have an 
equation that could be used to estimate the apparent 


equilibrium constant, Kapp,AHB, for the formation of any
hydrogen bond in aqueous solution. Such an equation
has been derived and has been demonstrated to be
reliable.

Consider the hydrogen bond


B[⊙H-]A
5-13


and focus in turn on the portions within the brackets and
the portions outside the brackets. The portion within the
brackets is the same for all hydrogen bonds. The structures
outside the brackets on either side affect the intrinsic
enthalpy of the hydrogen bond by donating electrons to or
withdrawing electrons from this central structure. The
net result of their action is an intrinsic enthalpy, Hint,AHB,
that is proportional to the product of the two respective
σ constants:

$$H_{\mathrm{int,AHB}} = \gamma_{\mathrm{OH}}\,\sigma_{\mathrm{A}}\,\sigma_{\mathrm{B}} \qquad (5\text{-}46)$$

where γOH is a constant of proportionality. These σ constants
are the same terms used in physical organic chemistry
to provide a quantitative assessment of the ability of
any group to withdraw or donate electrons and cause
changes in standard enthalpy in any similar situation.

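For the upper part of Equation 5-40, when the specific
example of N-methylacetamide is replaced by the
general reaction between A-H and ⊙B, the standard
enthalpy change for the reaction should be

$$\Delta H^{\circ}_{\mathrm{AHW\cdot BW}} = H_{\mathrm{int,AHB}} + H_{\mathrm{int,WW}} - H_{\mathrm{int,AHW}} - H_{\mathrm{int,BW}} \qquad (5\text{-}47)$$

or

$$\Delta H^{\circ}_{\mathrm{AHW\cdot BW}} = \gamma_{\mathrm{OH}}\,(\sigma_{\mathrm{A}} - \sigma_{\mathrm{OH}})(\sigma_{\mathrm{B}} - \sigma_{\mathrm{H_2O}}) \qquad (5\text{-}48)$$

where σOH is the σ constant for OH taking the place of A
in 5-13 and σH₂O is the σ constant for H₂O taking the place
of B in 5-13. As the values of the σ constants are proportional
to the values of pKa for the appropriate acids,

$$\Delta H^{\circ}_{\mathrm{AHW\cdot BW}} = 2.303\,RT\,\tau\,(\mathrm{p}K_{\mathrm{a,HA}} - \mathrm{p}K_{\mathrm{a,HOH}})(\mathrm{p}K_{\mathrm{a,HB}} - \mathrm{p}K_{\mathrm{a,H_3O^+}}) \qquad (5\text{-}49)$$

where τ incorporates the constants of proportionality. If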
it is assumed for the moment that the standard entropy 
change for the upper part of Equation 5-40 is negligible 
and that differences in standard enthalpy are the only 
significant determinants of the relative strengths of the 
hydrogen bonds being considered, then 

$$-\log K_{\mathrm{AHW\cdot BW}} = \tau\,(\mathrm{p}K_{\mathrm{a,HA}} - \mathrm{p}K_{\mathrm{a,HOH}})(\mathrm{p}K_{\mathrm{a,HB}} - \mathrm{p}K_{\mathrm{a,H_3O^+}}) \qquad (5\text{-}50)$$

and

$$\log K_{\mathrm{app,AHB}} = \tau\,(\mathrm{p}K_{\mathrm{a,HA}} - \mathrm{p}K_{\mathrm{a,HOH}})(\mathrm{p}K_{\mathrm{a,H_3O^+}} - \mathrm{p}K_{\mathrm{a,HB}}) - \log\,[\mathrm{H_2O{\odot}H_2O}] \qquad (5\text{-}51)$$

where the values of the measured pKa have been corrected
statistically for the number of protons, p, on the
conjugate acid and the number of lone pairs, q, on the
conjugate base:

$$K_{\mathrm{a}}^{\mathrm{corr}} = \frac{q}{p}\,K_{\mathrm{a}} \qquad (5\text{-}52)$$

As the units of concentration for the acid dissociation
constants are in units of molarity, the units for the
association constant and the concentration of hydrogen
bonds in the water must also be expressed in units of
molarity. Equation 5-51 can be used to estimate the
association equilibrium constant in units of molarity⁻¹, and
hence the standard free energy of formation (Equations
5-13 and 5-14), of any hydrogen bond in aqueous
solution.
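The statistical correction of Equation 5-52 is easily applied in logarithmic form, because multiplying Ka by q/p lowers the pKa by log (q/p). The sketch below is one way to write it; the function name is illustrative, and the example acid (a primary ammonium ion with an observed pKa of 10.6, three protons on the conjugate acid and one lone pair on the conjugate base) is hypothetical.

```python
import math


def corrected_pka(pka_observed, protons_on_acid, lone_pairs_on_base):
    """Statistically corrected pKa.

    K_corr = (q / p) * K_obs, so pKa_corr = pKa_obs - log10(q / p),
    where p is the number of equivalent protons on the conjugate acid
    and q is the number of equivalent lone pairs on the conjugate base.
    """
    p = protons_on_acid
    q = lone_pairs_on_base
    return pka_observed - math.log10(q / p)


# Hypothetical primary ammonium ion: p = 3, q = 1, observed pKa = 10.6
print(f"{corrected_pka(10.6, 3, 1):.2f}")
```

When p exceeds q, as it does for an ammonium ion, the corrected pKa is larger than the observed one; when p equals q, the correction vanishes.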

This relationship was validated by an examination
of the formation of hydrogen bonds between a series
of phenolate anions as acceptors and ammonium
cations as donors at 25 °C,
where the substituents X and R were various
electron-donating and electron-withdrawing groups chosen to
vary the values of pKa for the donor and acceptor. The
logarithms of the association equilibrium constants for the
formation of these hydrogen bonds varied with the pKa of
either the donor (Figure 5-17) or the acceptor as predicted
by Equation 5-51. Extrapolating the relationships to
either pKa,HA = pKa,HOH or pKa,H₃O⁺ = pKa,HB gave the same
value, 2.0, for log [H₂O⊙H₂O]. This numerical value is a
reasonable estimate for the logarithm of the concentration
of hydrogen bonds in liquid water, where [H₂O⊙H₂O]
≤ 110 M, and it is in reasonable agreement with the results
gathered independently with N-methylacetamide.

The value of τ (Equations 5-49 and 5-51) at 25 °C
and 2 M ionic strength was found to be 0.013, from which
it follows that even the strongest possible hydrogen
bond, where the pKa of the donor equals the pKa of the
conjugate acid of the acceptor, would have an association
equilibrium constant in aqueous solution of considerably
less than 1 M⁻¹. For example, the hydrogen bond between
the lone pair of electrons on imidazole (pKa,HB = 6.4) and the
nitrogen-hydrogen bond of the imidazolium cation
(pKa,HA = 6.4) would have an apparent association
equilibrium constant of only 0.040 M⁻¹ at 25 °C. At pH 6.4, a
2 M solution of imidazole would have a concentration of
hydrogen bonds 5-15 equal to only 0.04 M.
In water, the association equilibrium constant for FHF⁻,
thought to be one of the strongest hydrogen bonds, is
only 4 M⁻¹.
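Equation 5-51 is simple enough to evaluate directly. The sketch below does so with τ = 0.013 and log [H₂O⊙H₂O] = 2.0 as quoted above. The reference values of pKa for water as a donor (15.7) and for the conjugate acid of water (-1.7) are the uncorrected values cited earlier in this section and are used here only as an assumption; statistically corrected values would shift the results slightly. The donor and acceptor in the example are hypothetical.

```python
TAU = 0.013               # interaction coefficient, 25 degrees C, 2 M ionic strength
LOG_HBONDS_WATER = 2.0    # log [H2O...H2O] from the extrapolations in the text
PKA_HOH = 15.7            # water as donor (assumed: uncorrected value)
PKA_H3O = -1.7            # conjugate acid of water (assumed: uncorrected value)


def log_k_app(pka_donor, pka_acceptor_conj_acid,
              tau=TAU, log_hbonds_water=LOG_HBONDS_WATER):
    """log of the apparent association constant (M^-1), Equation 5-51."""
    return (tau * (pka_donor - PKA_HOH) * (PKA_H3O - pka_acceptor_conj_acid)
            - log_hbonds_water)


# Hypothetical pair: donor pKa 10.0, conjugate acid of acceptor pKa 5.0
log_k = log_k_app(10.0, 5.0)
print(f"log K_app = {log_k:.2f}, K_app = {10 ** log_k:.3f} M^-1")

# Sensitivity of log K_app to the pKa of the donor when the conjugate
# acid of the acceptor has a pKa of about 10 (roughly that of phenol)
slope = log_k_app(11.0, 10.0) - log_k_app(10.0, 10.0)
print(f"d(log K_app)/d(pKa of donor) = {slope:.3f}")
```

With the acceptor fixed at a pKa near 10, the magnitude of the computed slope is about 0.15, the Brønsted coefficient reported for the phenolate series in Figure 5-17. Because the first term cannot exceed about 1 for any matched pair of pKa values between those of the two water references, the subtraction of 2.0 keeps the apparent association constant below 1 M⁻¹.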

Because the large relative permittivity of water 
weakens the overwhelming electrostatic component of 
the hydrogen bond and because the donors and accep- 
tors of the water molecules themselves compete for 
occupation of any other donors and acceptors in solu- 
tion, any comparisons of hydrogen bonds in organic sol- 
vents with hydrogen bonds in water are misleading and 
should be avoided. It is incumbent on the author to state 
clearly the solvent in which the property of each hydro- 
gen bond being discussed was measured. 

The interaction coefficient τ in Equation 5-49 is a
small number because it is the product of the slope of the
line relating ΔH°AHW·BW to the difference in the values of pKa
between the donor and water (Figure 5-17) and the slope
of the line relating ΔH°AHW·BW to the difference in the values
of pKa between the conjugate acid of the acceptor and the
conjugate acid of water. Both of these slopes are small
because, as a result of its high relative permittivity, water
solvates the separated donors and acceptors so strongly.
Because both slopes are small, their product is even
smaller. Because τ is such a small number (0.013), the
standard enthalpy of formation for any hydrogen bond 
formed in water will be negligible. It follows that a hydro- 
gen bond in aqueous solution cannot be strengthened 
significantly by any alteration of the acid dissociation 
constants of donor and acceptor. 

Consequently, it is the standard entropy of formation
that determines the standard free energy of formation.
It is the entropic effect of the high molar
concentration of the hydrogen bonds between molecules 
of water ([H,O@H,O] in Equation 5-51) that is usually the 
overriding contributor to this standard entropy of forma- 
tion. Equation 5-51 states that the standard free energy 
of formation of a hydrogen bond in an aqueous solution, 
when the standard state has the units of molarity for
concentration, will be increased by about +11 kJ mol⁻¹
because of the high molar concentration of the hydrogen
bonds between molecules of water. There is a way, however,
to decrease its standard free energy of formation by
increasing its standard entropy of formation.
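The figure of about +11 kJ mol⁻¹ is the free energy equivalent of the term log [H₂O⊙H₂O] = 2.0 in Equation 5-51. A short check, assuming 25 °C and the usual conversion ΔG° = 2.303 RT log [H₂O⊙H₂O], is sketched below; the function name is illustrative.

```python
import math

GAS_CONSTANT = 8.314462618e-3   # kJ mol^-1 K^-1


def free_energy_from_log_term(log_term, temperature_k=298.15):
    """Free energy (kJ/mol) equivalent to a term in log10 K: 2.303 RT log."""
    return math.log(10.0) * GAS_CONSTANT * temperature_k * log_term


penalty = free_energy_from_log_term(2.0)
print(f"{penalty:+.1f} kJ/mol")
```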
Suggested Reading 
Problem 5-4: Draw structures that represent all of the
possible hydrogen bonds that can form between the fol- 
lowing pairs of molecules when they are dissolved in a 
nonpolar solvent. Draw the structures with proper geom- 
etry and include all lone pairs of electrons. 


(A) two molecules of N-methylacetamide 

(B) N-methylacetamide and ethanol 

(C) ethanol and water 

(D) N-methylacetamide and water 

(E) urea and N-methylacetamide 

(F) acetic acid and N-methylacetamide 

(G) 4-methylimidazole and N-methylacetamide 
(H) 4-methylimidazole and ethanol 


(I) acetic acid and ethanol 


Problem 5-5: 


(A) Draw the structure of each of the following hydro- 
gen bonds in the most stable geometry (bond angles 
and distances). Include all lone pairs of electrons. 


[Structures of the eight hydrogen bonds, I through VIII]


(B) Write the acid dissociations to which the follow- 
ing apparent values of pK, refer, and correct them 
for number of protons and number of lone pairs. 


PKaı PKaz 
O 
ll 0.6 15.70 
CH3CNH> 
H20 -1.75 15.75 
H3C 
- -7.51 15.10 
wae aai 


(C) Rank the eight hydrogen bonds in order of 
strength. 


Problem 5-6: Draw the structure of only the most stable 
hydrogen bond that forms between side chains of the fol- 
lowing pairs of amino acids. Include all relevant angles 
and distances around the various hydrogen bonds. 


(A) glutamic acid and histidine 
(B) serine and tyrosine 


(C) glutamine and histidine 


Problem 5-7: The panel below is an infrared spectrum of 
propionamide in carbon tetrachloride. The bands
marked M₁ and M₂ are the absorptions of the monomeric
propionamide, and the bands marked P,-P; are absorp- 
tions of hydrogen-bonded species. As the concentration 
of propionamide is increased, within each of these two 
sets [(M,, M,) and (P,-P;)] the amplitudes of the individ- 
ual absorptions remain in constant ratio to each other 
and must be different absorptions of the same species. 


[Infrared spectrum: absorption (%) as a function of frequency
(cm⁻¹) from 3200 to 3500 cm⁻¹]

Below is a table of amplitudes from the infrared spectra
for various total concentrations of propionamide at
298 K in CCl₄.


[propionamide]TOT (M)    A_M      A_P
1.71 × 10⁻³              0.415    0.055
2.15 × 10⁻³              0.509    0.083
2.43 × 10⁻³              0.566    0.102
4.72 × 10⁻³              0.981    0.308
6.90 × 10⁻³              1.32     0.558
10.40 × 10⁻³             1.79     1.020


Recall that, by Beer's law, [monomer] = (ε_M)⁻¹ A_M and
[polymeric species] = (ε_P)⁻¹ A_P. If the hydrogen bonding is
a dimerization, it should be described by the following
equation:

$$2\ \text{propionamide} \rightleftharpoons \text{propionamide}_2 \qquad K_{\mathrm{eq}} = \frac{[\text{propionamide}_2]}{[\text{propionamide}]^2}$$


(A) Show that the data are consistent with a dimer- 
ization. 


(B) Use the data to determine Keq at 298 K in units of
corrected volume fraction. (Hint: [propionamide]TOT =
[propionamide] + 2[propionamide₂].)


Values of Keq were determined at a number of different 
temperatures. 


temp (K) Keq (M⁻¹)
303 35.5 
313 24.6 
329 13.3 


(C) Convert these three equilibrium constants to 
units of corrected volume fraction. 


(D) Using these three numbers and your value for
298 K, calculate ΔH° for the dimerization.


(E) The appearance of the species P,-P; in the 
infrared spectrum can be adequately explained 
only if propionamide has two hydrogen bonds 
that form a cyclic structure. Draw that structure. 


(F) What is ΔH° for each mole of hydrogen bond?


Problem 5-8: Poly[d(AT)-d(AT)] melts at 67°C and 
poly[d(GC)-d(GC)] melts at 102 °C. The usual explanation 
for this observation is that there is one more hydrogen 
bond in a G-C pair than in an A-T pair. Consider, however,
this explanation in terms of Figure 5-18. Assume
that when the heterocyclic bases in single strands of
randomly coiled DNA form a double helix, the interior
o faces of the bases are transferred from water to a com- 
pletely nonpolar environment (Figure 6-48) and that the 
proper tautomers for base pairing are present at the pH 
of the experiment. 


(A) Write complete equations for the formation of an 
A-T pair and a G-C pair as the double helix forms, 
drawing in all hydrogen-bonded waters to all 
donors and acceptors on each side of the equation 
and the hydrogen bonds that form between the 
residual waters. Assume that the donors and 
acceptors of hydrogen bonds accessible to water 
in the major and minor grooves retain their 
hydrogen bonds with H₂O.


(B) Why is poly[d(GC)-d(GC)] more stable than 
poly[d(AT)-d(AT)]? 


Problem 5-9: There are two possible hydrogen bonds 
that can form between the neutral form of phenol, a 
model for tyrosine, and the free base of imidazole, a 
model for histidine. 


(A) Draw the full structures of both partners in both of 
the possible hydrogen bonds with proper 
hybridization on the central atoms and proper 
bond angles around the hydrogen bond. 


(B) Which of the two possible hydrogen bonds is the 
more stable? Why? 


(C) Estimate the standard free energy of formation of 
the more stable hydrogen bond at 25 °C when it is 
formed in aqueous solution if the statistically cor- 
rected values of pK, are 9.65 for phenol and 7.35 
for the conjugate acid of imidazole. 


Intramolecular and Intermolecular Processes: 
Molecularity and Approximation 


Intramolecular chemical reactions often occur at rates 
much faster than equivalent intermolecular reactions, 
and intramolecular associations often occur with associ- 
ation equilibrium constants much larger than those of 
equivalent intermolecular associations. A particularly 
informative series illustrating such effects can be gathered
from among the reactions involving intramolecular
nucleophilic catalysis of the hydrolysis of phenyl
esters by the carboxylate anion. The mechanism for this 
nucleophilic catalysis has been shown to involve the for- 
mation of an intermediate anhydride, which in the 
intramolecular examples such as phenyl succinate would 
be cyclic: 


[Reaction scheme: the carboxylate of phenyl succinate adds to the
ester to give tetrahedral intermediate 5-16, which ejects phenolate
to give the cyclic anhydride]
(5-53)


The tetrahedral intermediate 5-16 that leads to the 
anhydride has both a phenolate and a carboxylate as potential
leaving groups, and because the latter is the better leaving
group, the intermediate should decompose to reactant
much more frequently than to anhydride. Therefore, the 
reaction involves a preequilibrium between the reactant 
and the tetrahedral intermediate. Occasionally the pheno- 
late is ejected from the tetrahedral intermediate in a kinet- 
ically irreversible step. It is the ejection of the phenolate 
that is monitored as the reaction progresses. The first-order 
rate constant, in units of reciprocal seconds, for the appearance
of phenolate would be equal to Keq,TI kg, where Keq,TI is
the equilibrium constant for the formation of the tetrahedral
intermediate. If it is assumed that for all compounds
in the series the rate constant kg, which in all cases is for a
chemically equivalent first-order reaction, has the same
value, then the differences in observed rates result from
differences in Keq,TI, an equilibrium constant.

A comparison of the first-order rate constants of
phenolate release for a series of intramolecular reactions
to the estimated pseudo-first-order rate constant
for the intermolecular reaction between phenyl acetate
and excess acetate anion, when the catalysis by acetate
anion proceeds through acetic anhydride as an intermediate,
indicates how large these intramolecular
increases in an association constant can be (Table 5-5).
The 6 x 10° increase in the association equilibrium con- 
stant for the intramolecular formation of succinic anhy- 
dride,! when compared to the association equilibrium 
constant for the intermolecular formation of acetic anhy- 
dride expressed in units of corrected volume fraction, is 
somewhat greater than the increase seen with phenyl 
succinate (Table 5-5). This fact suggests that the 
increases listed in Table 5-5 are reasonable. 

A situation similar to the intramolecular hydrolysis
of phenyl esters is encountered in the alkaline hydrolyses
of endo-6-hydroxybicyclo[2.2.1]heptane-endo-2-carboxamides,
in which a preequilibrium between reactant and
tetrahedral intermediate precedes the expulsion of the
amine (Equation 5-54).
Table 5-5: Relative Preequilibrium Constants for the Formation of a Tetrahedral Intermediateᵃ


phenyl ester                       relative equilibrium constantᵇ    TΔS°approxᶜ (kJ mol⁻¹)
phenyl acetate + acetate anion     1.0
[structure not shown]              100                               -11
[structure not shown]              2 × 10⁴                           -24
[structure not shown]              1 × 10⁶                           -34
[structure not shown]              4 × 10⁶                           -38


ᵃThe first-order rate constants for the ejection of phenol from the several phenyl monoesters of dicarboxylic acids (last four entries) were determined as a function of pH in the range pH 4-8. From the pH-rate behavior of each of these rate constants, the first-order rate constant for intramolecular nucleophilic catalysis of the respective ejection by the appended carboxylate could be calculated. From these values, the first-order rate constants for intramolecular anhydride formation could be calculated. These rate constants were determined for each phenyl monoester at 25, 30, or 35 °C. The values of these rate constants were adjusted to the same temperature and originally presented relative to the first-order rate constant for intramolecular ejection of phenol from phenyl glutarate. These first-order rate constants were later related to the pseudo-first-order rate constant for the intermolecular formation of acetic anhydride from excess acetate anion and phenyl acetate during the intermolecular nucleophilic catalysis of the hydrolysis of phenyl acetate by acetate anion.
ᵇFirst-order rate constants for the formation of anhydride were originally presented relative to the calculated pseudo-first-order rate constant for the formation of acetic anhydride from phenyl acetate and acetate anion. The latter intermolecular rate constant was in units of molarity⁻¹ second⁻¹. It has been assumed that all of the rate constants for the formation of the anhydrides are directly proportional to the equilibrium constants for the formation of the respective tetrahedral intermediates (Equation 5-53). The resulting units of molarity⁻¹ for the equilibrium constant for the formation of the tetrahedral intermediate from acetate anion and phenyl acetate were converted to units of corrected volume fraction with Equation 5-13. No correction is required for the intramolecular reactions if it is assumed that they involve only negligible changes in molar volume. All equilibrium constants for the formation of the tetrahedral intermediates are presented relative to that for the reaction of acetate anion with phenyl acetate.
ᶜStandard entropy of approximation, calculated by Equation 5-60 with the intermolecular equilibrium constant in units of corrected volume fraction and with the assumptions that all of the rate constants are directly proportional to the equilibrium constants for the formation of the tetravalent intermediate and that ΔΔH° = 0. This entropy of approximation was multiplied by 298 K.


[Reaction scheme: alkaline hydrolysis of an endo-6-hydroxybicyclo[2.2.1]heptane-endo-2-carboxamide, in which the tetrahedral intermediate expels the amine.]

(5-54)

In these reactions the intramolecular increase in association
constant Keq,TI is 1 × 10⁶ relative to that for the formation
of the equivalent tetrahedral intermediate from
hydroxide ion and a bicyclo[2.2.1]heptane-endo-2-carboxamide
when concentrations are expressed in corrected
volume fraction.

Many other examples of intramolecular accelerations
of rates or increases in association equilibrium constants
have been reported, but the accelerations
presented in Table 5-5 are among the largest that have
been noted for each particular size of ring formed during 
each of these intramolecular reactions. In some of the 
other instances, electronic effects are difficult to separate 
from the effects of approximation. For example, in the 
intramolecular reaction 


[Reaction scheme: intramolecular reaction that releases H2O.]

(5-55)


the two reactants are connected electronically by the
π system in addition to being juxtaposed by the σ system.

At one time it was fashionable to refer to these 
accelerations in rate or increases in association as the 
result of an increase in the effective molarity of one of 
the reactants brought about by attaching it covalently to 
the other. As more exaggerated examples of this phenomenon
were reported, however, the unreality of discussing
concentrations of millions of molar became
apparent, and a more reasonable view of the situation
was required.

In all instances in which an intramolecular associa- 
tion, for example, the intramolecular formation of a 
hydrogen bond in a folding polypeptide, is compared to 
an equivalent intermolecular association, the difference 
observed in the two equilibrium constants Keq,intra and
Keq,inter is due in large part to an increase in the change in
standard entropy caused simply by the fact that a uni- 
molecular reaction is being compared to a bimolecular 
reaction. This increase in the standard entropy change 
for the intramolecular association results from the fact 
that the standard entropy of approximation is missing 
from the standard free energy change for the intramolecular
association. The standard entropy of approximation,
ΔS°approx, is the change in standard entropy due
solely to bringing the separate reactants together into the 
same molecule or into the same complex, respectively, 
prior to the beginning of the reaction. It is a negative 
number because the intrinsic entropy of two separate 
reactants relative to the unimolecular product of a reac- 
tion is greater than the intrinsic entropy of one molecule 
containing the two reactants or of one complex into 
which the two reactants have been assembled relative to 
the product of the intramolecular reaction. For example, 
the intrinsic entropy of a free acetate anion and a free 
phenyl acetate is much larger relative to the tetrahedral 
intermediate formed when they associate than is the 
intrinsic entropy of a phenyl succinate anion relative to
the cyclic tetrahedral intermediate in Equation 5-53. 
Because the standard entropy of approximation is missing
from the standard entropy change for the intramolecular
reaction, owing to the fact that approximation has
already been accomplished synthetically or biologically,
the standard entropy change for the intramolecular reaction
is more positive than the standard entropy change
for the intermolecular reaction.

These relationships can be expressed in equations. 
These equations are not intended to reflect actual 
changes in standard entropy, which are often dominated 
by changes in solvation, but to represent the quantitative 
consequences of approximation underlying the 
enhancements in rate or equilibrium constant. The standard
entropy change of the intramolecular reaction,
ΔS°intra, should be related to the standard entropy change
of the intermolecular reaction, ΔS°inter, by

\[ \Delta S^{\circ}_{\mathrm{inter}} = \Delta S^{\circ}_{\mathrm{intra}} + \Delta S^{\circ}_{\mathrm{approx}} \tag{5-56} \]


if the same reaction with the same change in standard
enthalpy is occurring once the reactants have been
approximated. If this were an adequate description of the
situation and the same change in standard enthalpy did
occur in each reaction, then

\[ R \ln \left( \frac{K_{\mathrm{eq,intra}}}{K_{\mathrm{eq,inter}}} \right) = \Delta S^{\circ}_{\mathrm{intra}} - \Delta S^{\circ}_{\mathrm{inter}} = -\Delta S^{\circ}_{\mathrm{approx}} \tag{5-57} \]

Because ΔS°approx < 0, Keq,intra > Keq,inter.
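
As a numerical illustration of Equation 5-57, the following Python sketch converts a standard entropy of approximation into the corresponding ratio of equilibrium constants. The function name and the sample value of -140 J K⁻¹ mol⁻¹ are illustrative assumptions, and equal standard enthalpy changes for the two reactions are assumed, as in the equation itself.

```python
import math

R = 8.314462618  # gas constant, J K^-1 mol^-1


def ratio_from_entropy_of_approximation(delta_s_approx):
    """Return Keq,intra / Keq,inter according to Equation 5-57.

    delta_s_approx: standard entropy of approximation in J K^-1 mol^-1
    (a negative number). Equal standard enthalpy changes are assumed.
    """
    return math.exp(-delta_s_approx / R)


# Illustrative input only (assumed value, not a measurement).
ratio = ratio_from_entropy_of_approximation(-140.0)
print(f"Keq,intra / Keq,inter = {ratio:.1e}")
```

Because the ratio depends exponentially on the entropy, each additional -19 J K⁻¹ mol⁻¹ or so in ΔS°approx raises the ratio by about a factor of 10.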

The magnitude of the standard entropy of approxi- 
mation is determined by the difference between two 
other standard entropy changes, the standard entropy of 
molecularity and the standard entropy of rotational 
restraint. The formation of a unimolecular product
during an intermolecular reaction requires that two or 
more independent molecules become one molecule, and 
this involves a considerable decrease in standard 
entropy. The standard entropy change responsible for 
this decrease, the standard entropy of molecularity, 
ΔS°molec, has a negative value and is a major, unavoidable,
unfavorable term in the change in standard free energy 
in any intermolecular reaction. In an intramolecular 
reaction, however, the decrease in standard entropy due 
to the standard entropy of molecularity does not occur, 
because reactants are already on only one molecule, and 
this has the effect of increasing dramatically the change 
in standard entropy for the intramolecular reaction rela- 
tive to the intermolecular reaction and hence increasing 
its yield of product. There is affiliated with an intramole- 
cular reaction, however, a standard entropy of rotational 
restraint, which, conversely, is irrelevant to an intermolecular
reaction. The standard entropy of rotational
restraint is the decrease in standard entropy that results
from the fact that the formation of the transition state or
product during an intramolecular reaction requires that
a portion of the rotational entropy in the molecule be
eliminated because only a fraction of the accessible rotational
isomers can participate in the reaction productively.
The standard entropy change accompanying this
decrease in the number of rotational isomers, the standard
entropy of rotational restraint, ΔS°rot, has a negative
value and its inclusion causes the standard entropy
change for the reaction to be smaller than it would be if
no rotational freedom were lost during the reaction
because only a productive rotational isomer was present.

The relationships between the standard entropy of
approximation and the standard entropy of molecularity
and standard entropy of rotational restraint are

\[ \Delta S^{\circ}_{\mathrm{approx}} = \Delta S^{\circ}_{\mathrm{molec}} - \Delta S^{\circ}_{\mathrm{rot}} \tag{5-58} \]


The magnitudes of each of these terms can be discussed 
in turn. 

The standard entropy of molecularity is the
decrease in standard entropy that should accompany the
change of an intermolecular reaction to a rigidly oriented
intramolecular reaction. In the specific case of a
bimolecular reaction, the two independent reactants
have six translational and six rotational degrees of freedom,
but the one molecule, formed by the association of
the two others, should have only three translational and
three rotational degrees of freedom. The standard
entropy change associated with the loss of the three
translational and three rotational degrees of freedom
during this inescapable association, calculated for the
situation in which the two reactants and the transition
state or product are dissolved in a solution at 25 °C with
a standard state of 1 M in solutes, has been estimated
to be between -190 and -210 J K⁻¹ mol⁻¹. This estimate
can be compared to the standard entropy change
observed for a simple bimolecular reaction, such as the
dimerization of cyclopentadiene in the liquid phase,
during which the standard entropy change is -130 to
-170 J K⁻¹ mol⁻¹ with the same choice of standard state.
The difference between the calculated standard entropy
change and the observed standard entropy change in the
particular instance of cyclopentadiene can be completely
accounted for by the presence of low-frequency
vibrations in the dimer that could not be present in the
two monomers because of their smaller size. It has been
concluded that -190 to -210 J K⁻¹ mol⁻¹ is an adequate
estimate for ΔS°molec, the maximum decrease in standard
entropy change expected from converting a bimolecular
reaction into a unimolecular reaction, when molarities
are used as units of concentration for standard states. If
corrected volume fractions are used as units of concentration
for standard states and the partial molar volumes
of the solutes are in the range between 40 and 150 mL
mol⁻¹, the range for expected standard entropy of
approximation for a bimolecular reaction would be -165
to -195 J K⁻¹ mol⁻¹ (Equation 5-13).

As the example of the dimerization of cyclopentadiene
illustrates, an intramolecular reaction, because it
usually involves a larger and more flexible molecule than 
any of the reactants in an intermolecular reaction, can 
never realize all of this favorable standard entropy of 
molecularity. Major factors in decreasing the portion of 
the standard entropy of molecularity that an intramolec- 
ular reaction will enjoy are the internal rotations within 
the intramolecular reactant itself. These rotations 
decrease the probability that the necessary juxtaposition 
of reactants will occur. For example, in the case of phenyl 
succinate (Equation 5-53) the dihedral angles around 
three carbon-carbon single bonds must be appropriate if 
the carboxyl oxygen is to be placed adjacent to the acyl 
carbon. It has been estimated from the results of
thermodynamic and kinetic measurements from a
number of intramolecular reactions that the standard
entropy of rotational restraint decreases by about 20 J K⁻¹
mol⁻¹ for every bond that lies between the two atoms participating
directly in the reaction and about which free
rotation can occur.
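
The bookkeeping of Equation 5-58 can be sketched in a few lines of Python. The sketch assumes that the range of -165 to -195 J K⁻¹ mol⁻¹ quoted above for corrected volume fraction serves as ΔS°molec and that each freely rotating bond contributes -20 J K⁻¹ mol⁻¹ to ΔS°rot; the function name and the loop over bond counts are illustrative choices rather than part of the original treatment.

```python
def entropy_of_approximation(delta_s_molec, n_rotatable_bonds,
                             delta_s_per_bond=-20.0):
    """Equation 5-58: dS_approx = dS_molec - dS_rot (J K^-1 mol^-1).

    delta_s_per_bond is the assumed contribution of each freely
    rotating bond to the entropy of rotational restraint.
    """
    delta_s_rot = n_rotatable_bonds * delta_s_per_bond
    return delta_s_molec - delta_s_rot


# Assumed range for dS_molec in units of corrected volume fraction.
for bonds in (0, 1, 2, 3):
    most_negative = entropy_of_approximation(-195.0, bonds)
    least_negative = entropy_of_approximation(-165.0, bonds)
    print(f"{bonds} bonds: {most_negative:.0f} to {least_negative:.0f} J K^-1 mol^-1")
```

Under these assumptions, three rotatable bonds, as in phenyl succinate, leave a ΔS°approx between -135 and -105 J K⁻¹ mol⁻¹.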

When two similar intramolecular associations are
compared, for which it is assumed that differences
between their standard enthalpies of formation are negligible,

\[ \Delta\Delta S^{\circ} = R \ln \left( \frac{K_{\mathrm{eq1}}}{K_{\mathrm{eq2}}} \right) \tag{5-59} \]

where ΔΔS° is the difference between their standard
entropy changes and Keq1 and Keq2 are the respective association
equilibrium constants. The change in relative rate
of phenoxide release in going from phenyl glutarate to 
phenyl succinate (Table 5-5) is 230-fold and in going from 
phenyl succinate to the phenyl ester of the fused ring is 
also 230-fold. In each comparison, one less
carbon-carbon bond around which free rotation is
allowed is found in the more constrained member of the
pair. The changes in rate, presumably reflecting differences
in Keq,TI, are equivalent in each comparison to 45 J
K⁻¹ mol⁻¹. It has been noted, however, that the case of
nucleophilic catalysis of the hydrolysis of phenyl esters
provides the largest standard entropy of rotational
restraint (carbon-carbon bond)⁻¹ yet observed. It has
been proposed that any increase in the apparent entropy
of approximation in excess of 20 J K⁻¹ mol⁻¹ that is accomplished
by freezing the rotation around a carbon-carbon bond
may be due to a decrease in the strain encountered by
the reaction over and above the decrease in the rotational
degrees of freedom. This decrease in strain would be
effected by an unintended improvement in the orienta- 
tion and alignment of the two reactants in the productive 
conformation produced by the changes in the structure 
of the molecule that were required to freeze the rotation. 
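
The conversion of the 230-fold change in rate into an entropy difference follows directly from Equation 5-59. The Python sketch below repeats that arithmetic; the function name is an illustrative assumption, and the rate ratio is taken, as in the text, to reflect a ratio of equilibrium constants with negligible difference in standard enthalpy.

```python
import math

R = 8.314462618  # gas constant, J K^-1 mol^-1


def delta_delta_s(k_ratio):
    """Equation 5-59: difference in standard entropy change, J K^-1 mol^-1."""
    return R * math.log(k_ratio)


per_bond = delta_delta_s(230.0)
print(f"230-fold change: {per_bond:.1f} J K^-1 mol^-1")
print(f"excess over 20 J K^-1 mol^-1: {per_bond - 20.0:.1f} J K^-1 mol^-1")
```

The excess of about 25 J K⁻¹ mol⁻¹ is the portion that the proposal above attributes to relief of strain rather than to loss of rotational freedom.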

With small molecules, the effect of approximation
on an equilibrium constant is usually significant only
when the two central atoms that participate in the association
become involved in a five-membered or six-membered
ring in the product. A four-membered ring is
usually too strained, because of the normal bond angles
of commonly encountered molecules, to provide any
favorable approximation. A seven-membered ring, if
there is free rotation about every bond, has too small a
value of ΔS°rot to exhibit a ΔS°approx small enough to overcome
the strain of the ring and have a noticeable effect
on equilibrium. For example, even in the intramolecular
nucleophilic catalysis of phenyl ester hydrolysis, a series
of reactions unusually prone to intramolecular catalysis,
phenyl adipate would show a rate of phenolate release
due to nucleophilic catalysis only 4-fold greater than that
for the same reaction of phenyl acetate in 1.0 M sodium
acetate. Large, rigid molecules in which the two atoms
that must react are more than six atoms apart yet close
enough to collide have been synthesized, but proteins
and nucleic acids are the ultimate examples.

The magnitude of the actual difference in the 
change in standard entropy between a given intramolec- 
ular reaction and the corresponding intermolecular 
reaction will be less than the magnitude of the standard 
entropy of approximation because vibrational degrees of 
freedom, unavailable to the reactants in the intermolec- 
ular reaction, are available to the necessarily larger reac- 
tant in the intramolecular reaction and because steric 
effects that do not apply to the intermolecular reaction 
are often unavoidable consequences of designing the 
intramolecular reactant. For example, the intramolecular
rates of lactonization for a series of bicyclic γ-hydroxycarboxylic
acids decrease as the strain energies of the
rigid five-membered rings of the tetrahedral intermediate
(see Equation 5-54) increase, even though in each
case the hydroxy group and the acyl carbon are posi- 
tioned rigidly in the same orientation and at about the 
same distance from each other. 

If the magnitude of ΔΔS°, the actual difference
between the standard entropy changes in the reactions,
must be less than the magnitude of ΔS°approx, then

\[ T \Delta S^{\circ}_{\mathrm{approx}} < -RT \ln \left( \frac{K_{\mathrm{eq,intra}}}{K_{\mathrm{eq,inter}}} \right) + \Delta\Delta H^{\circ} \tag{5-60} \]

where Keq,intra and Keq,inter are the intramolecular and
intermolecular association equilibrium constants. If the
differences in the actual standard enthalpies of formation,
ΔΔH°, are known, the estimates of the upper limits
for ΔS°approx can incorporate them. If they are unknown,
they can be assumed to be zero, for the sake of argument,
but such an assumption can be misleading.
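
The upper limits defined by Equation 5-60 are easily computed. The Python sketch below evaluates the right-hand side of the inequality for the 100-fold entry of Table 5-5, and inverts the relation for the listed limit of -38 kJ mol⁻¹. The function names are illustrative assumptions, ΔΔH° is set to zero by default as in the table, and the temperature of 298 K is the one named in the footnote to the table.

```python
import math

R = 8.314462618  # gas constant, J K^-1 mol^-1


def upper_limit_t_delta_s(k_ratio, delta_delta_h=0.0, temperature=298.0):
    """Right-hand side of Equation 5-60 in kJ mol^-1.

    k_ratio is Keq,intra / Keq,inter; delta_delta_h is in kJ mol^-1.
    """
    return -R * temperature * math.log(k_ratio) / 1000.0 + delta_delta_h


def ratio_from_upper_limit(t_delta_s, delta_delta_h=0.0, temperature=298.0):
    """Invert Equation 5-60: the ratio of constants implied by a given limit."""
    return math.exp(-(t_delta_s - delta_delta_h) * 1000.0 / (R * temperature))


print(f"ratio 100  -> limit {upper_limit_t_delta_s(100.0):.1f} kJ mol^-1")
print(f"limit -38  -> ratio {ratio_from_upper_limit(-38.0):.1e}")
```

A nonzero ΔΔH° shifts the limit by exactly that amount, which is why the assumption that it is zero can be misleading.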

In relating the change observed in an equilibrium
constant to the entropy of approximation, the difference
in standard enthalpy change resulting from the chemical
strategy used to accomplish the approximation,
ΔΔH°, complicates the interpretation (Equation 5-60). In
several reactions displaying large increases in the rate of
the reaction or the yield of the product due to covalent
approximation of the reactants, the major effect of the
approximation is on the standard enthalpy change of the
reaction rather than the standard entropy change. For
example, 2,2,3,3-tetramethylsuccinanilide at pH 5 displays
a rate of aniline release 1200 times greater than that
of succinanilide itself. This increase in rate, however,
which is equivalent to a change in the standard free
energy of activation of 18 kJ mol⁻¹, is accompanied by a
change in the standard enthalpy of activation, ΔΔH°‡, of
-25 kJ mol⁻¹. Therefore, in this case the standard entropy
of activation actually decreases as the rate of the reaction
is enhanced by approximation. It is difficult, however, to
interpret such observed changes in the thermodynamic
parameters of activation because they are usually dominated
by solvent effects that mask the underlying effects
of approximation on the rates or equilibrium
constants. In the case of the intramolecular catalysis
manifested in the alkaline hydrolyses of
endo-6-hydroxybicyclo[2.2.1]heptane-endo-2-carboxamides
(Equation 5-54), it has been concluded that the rate enhancement
of 1 × 10⁶ (TΔS°approx < -34 kJ mol⁻¹) "results almost
entirely from the entropy effect", probably because there
is no strain involved in the formation of the additional
five-membered ring of the tetrahedral intermediate.
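
The argument for the tetramethylsuccinanilide can be checked with a short calculation. The Python sketch below converts the 1200-fold increase in rate into a change in standard free energy of activation and subtracts it from the quoted change in standard enthalpy of activation. The temperature of 298 K is an assumption, because the temperature of the measurement is not stated here, and the function name is illustrative.

```python
import math

R = 8.314462618  # gas constant, J K^-1 mol^-1


def activation_terms(rate_ratio, delta_delta_h, temperature=298.0):
    """Return (ddG_act, T*ddS_act) in kJ mol^-1 for a given rate ratio.

    ddG_act = -RT ln(rate_ratio); T*ddS_act = ddH_act - ddG_act.
    The temperature is an assumed value.
    """
    ddg = -R * temperature * math.log(rate_ratio) / 1000.0
    t_dds = delta_delta_h - ddg
    return ddg, t_dds


ddg, t_dds = activation_terms(1200.0, -25.0)
print(f"change in free energy of activation: {ddg:.1f} kJ mol^-1")
print(f"T x change in entropy of activation: {t_dds:.1f} kJ mol^-1")
```

The free energy of activation falls by about 18 kJ mol⁻¹, yet the entropic term is negative, in agreement with the statement that the standard entropy of activation decreases even though the rate increases.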

When the upper limits of TΔS°approx are calculated
from the relative rates between the intramolecular and
bimolecular nucleophilic catalysis of the hydrolysis of
phenyl esters, on the assumption that ΔΔH° is equal to
0, they are all equal to or greater than -38 kJ mol⁻¹
(Table 5-5). The upper limit of TΔS°approx calculated from
the increase in the equilibrium constant for the formation
of the tetrahedral intermediates in the hydrolyses of
endo-6-hydroxybicyclo[2.2.1]heptane-endo-2-carboxamides
is -34 kJ mol⁻¹. If it is assumed that the fused rings
retain two rotational axes, which is a generous assumption,
ΔS°approx (Equation 5-58) should be about -140 J K⁻¹
mol⁻¹ and TΔS°approx about -40 kJ mol⁻¹ when units of corrected
volume fraction are used.

At least three points are illustrated by this exercise. 
First, the largest intramolecular increases in rate or degree 
of association yet measured, with the educational excep- 
tions of the cases involving severe compressive steric 
effects in the transition states or products, are of a magni- 
tude less than that expected simply for the transformation 
of an intermolecular reaction into a fully constrained 
intramolecular reaction. Second, the maximum decrease
in standard free energy of association to be expected when
a bimolecular association such as the formation of a hydrogen
bond is turned into an intramolecular association is
about -55 kJ mol⁻¹, which would produce an increase in
its association equilibrium constant, when the units are
corrected volume fraction, of 5 × 10⁹. Third, the larger the
number of bonds about which rotation can occur between
the two atoms (for example, the heteroatom of the donor
and the acceptor) that must be juxtaposed during a reaction,
the smaller will be the decrease in standard free
energy to be expected when an intermolecular association
becomes an intramolecular association.

With these points in mind, the issue of intramolec- 
ular hydrogen bonds in aqueous solution can be 
addressed. The standard enthalpy change for a hydrogen 
bond forming in water should be quite small, but possi- 
bly of a negative value (Equation 5-49). The competition 
of water molecules for donor and acceptor seems to con- 
tribute an entropic effect the magnitude of which is 
-38 J K⁻¹ mol⁻¹, which is R ln 100, when concentrations of
donor and acceptor are expressed in units of molarity. In 
an intramolecular association, the consequent elimina- 
tion of the standard entropy of approximation should be 
able to compensate for the entropic deficit caused by the 
presence of the water. 

Evidence for the existence of intramolecular hydro- 
gen bonds within solutes dissolved in aqueous solution 
has been reported. The extensive lore surrounding 
involvement of hydrogen bonds in the equilibrium 
acid-base behavior of the monoanions of dicarboxylic 
acids is by and large equivocal, but it has been noted
that decreases in the rates of the reactions of the acidic 
hydrogens in the monoanions of salicylates (5-17) with 
hydroxide ion 


5-17 


suggest that they contain intramolecular hydrogen
bonds the standard free energies of formation of which
are around -15 kJ mol⁻¹. A series of compounds capable
of forming intramolecular hydrogen bonds either
between the pyrrole nitrogen-hydrogen bond on imidazole
as donor (pKa = 15) and a carboxylate as acceptor
(pKa = 5) or between the pyridinyl lone pair on imidazole
(pKa = 7.5) as acceptor and the nitrogen-hydrogen bond
on an ammonium cation as donor (pKa = 10) has been
described (Table 5-6). As the standard entropy of
approximation was decreased by confining the juxtaposed
donor and acceptor more severely, or as the difference
in pKa was decreased, the equilibrium constant
for the intramolecular hydrogen bond


[BOHA] 
[BO + HA] 
increased in magnitude (Table 5-6).


Table 5-6: Intramolecular Hydrogen Bonds in Water 


hydrogen bond    ΔpKaᵃ    Kintra,AHBᵇ


ᵃDifference in pKa between intermolecular equivalents of donor and acceptor.
ᵇValue for Kintra,AHB is for the ratio of the concentration of the hydrogen-bonded
species (see column 1) to the concentration of the same tautomer not hydrogen-bonded.
The equilibrium constants Kintra,AHB for the formation of the noted
intramolecular hydrogen bonds in aqueous solution were estimated from values
of the N1-N3 tautomeric equilibrium constants (Equation 2-31) for the respective
4-substituted imidazoles determined by nitrogen nuclear magnetic resonance. It
was assumed that a difference between the value of the tautomeric equilibrium
constant for a 4-substituted imidazole in which a hydrogen bond can form and
the value of the tautomeric equilibrium constant for a similar compound in which
a hydrogen bond cannot form is due to the formation of the noted hydrogen
bond. It was also assumed that the value of the tautomeric equilibrium constant
for the form of the hydrogen-bonding species in which the bond is not formed is
equal to that of the reference compound and that the observed excess of one of
the two tautomers represents entirely the hydrogen-bonded form.
ᶜEstimated from the difference between the second macroscopic acid dissociation constants
of cis- and trans-urocanic acid and the N1-N3 tautomeric ratios for neutral
cis-urocanic acid.


The question that these results raise is whether or
not the hydrogen bond in a conformation such as an
α helix, a hairpin of β structure, or a β turn can be made
favorable by sufficient standard entropy of approximation.
Certain cyclic hexapeptides are rigid enough to
enforce the conformation of a β turn in which an
intramolecular hydrogen bond is formed between the
acyl oxygen of one of the six amino acids and the amido
nitrogen-hydrogen of the amino acid three positions to
the amino-terminal side of it (Figure 4-16D), and the
conformation of the β turn can be varied by changing the
sequence of the cyclic peptide. In fact, the first tight turn
to be observed was in such a cyclic hexapeptide. An
α helix or β structure, however, cannot be encompassed
so easily.

If it is assumed that either an α helix or a hairpin of
β structure has already been initiated, could the standard
free energy of formation for the next hydrogen bond
(Figure 5-19) during the propagation have a negative
value? In Figure 5-19, the next donor and acceptor in
each structure are marked with asterisks. Because each
of these reactions occurs in aqueous solution, the standard
enthalpy of formation for such a hydrogen bond is
close to 0 (Equation 5-49). The value for the standard free
energy of formation for the hydrogen bond will be determined
in part by the difference between the unfavorable
competition of the water and the favorable elimination
of entropy of approximation that the structures provide.
For the α helix, there are two bonds about which rotation
can occur between donor and acceptor; for the hairpin
of β structure, there are four. It should, however, be
remembered that when the problem is stated in these
terms, the difficulties involved in the initiation of either
of these structures are ignored, and these can be even
more formidable.


Figure 5-19: Intramolecular formation of a hydrogen bond to
elongate an α helix or a β hairpin. (A) To add the next hydrogen
bond in an elongating α helix, the acyl oxygen marked with the
asterisk must combine with the amido nitrogen-hydrogen marked
with the other asterisk. The two bonds about which rotation can
occur between the last acyl group fixed in the α helix and the acyl
oxygen in question have been highlighted with arrows. (B) To add
the next hydrogen bond in an elongating antiparallel β hairpin, the
acyl oxygen marked with the asterisk must combine with the amido
nitrogen-hydrogen marked with the other asterisk.

Recently, several short peptides (8-16 aa in length)
have been shown to form antiparallel β structure in
aqueous solution by folding back on themselves to form
a hairpin. These hairpins of antiparallel β structure
have marginal stability at room temperature and the
pairing across the sheet is as yet unpredictable. Short
stretches of parallel β structure (encompassing three or
four pairs of amino acids) have been observed in aqueous
solution in molecules in which two short peptides
are coupled at their carboxy-terminal ends to two properly
spaced positions in a somewhat rigid molecular template
that assists in properly orienting them. These
results suggest that approximation and the cooperative
formation of the hydrogen bonds between the two
strands in these parallel and antiparallel β structures can
overcome the competition of the water for donors and
acceptors, but only barely.

All short, linear peptides examined so far fail to
form α helices when they are dissolved at room temperature
in water at neutral pH, even if they have the same
amino acid sequence as an α helix in a crystallographic
molecular model. This is almost certainly due to the
difficulty of forming the first few hydrogen bonds
required to initiate the α helix rather than the formation
of the hydrogen bonds that fall in line after it has been
initiated (as in Figure 5-19). When short peptides are
attached to rigid structural templates that provide properly
oriented acceptors for hydrogen bonds and thereby
promote initiation, those peptides display a considerable
fraction of α helix even at room temperature.

It was noted that when a cyanogen bromide fragment
containing the first 12 amino acids of ribonuclease,
KETAAAKFERQHHse (where Hse is homoserine), was
dissolved in 33 mM Na₂SO₄ between pH 4 and 5, an equilibrium
existed between the structureless form of the
peptide and an α-helical form. At 0 °C, 15% of the peptide
was in the α-helical form at equilibrium. From this
original observation, it was eventually discovered
that when the peptide acetyl-AAQAAAAQAAAAQAAY-α-amide
is dissolved at 0 °C in 1.0 M NaCl, about 50% of
it is α-helical and 50% is structureless at equilibrium.
This peptide, however, even with as peculiar a sequence
as it has, exhibits significant α-helical content only at low
temperature. No other simple peptide examined so far
displays a significantly higher amount of α helix at equilibrium.
Considerably higher α-helical content,
however, is displayed even at room temperature by continuous
segments of polyalanine (4-19 alanines in
length) when they are appropriately isolated from the
charged amino acids at the two ends of these hybrid molecules
that are required to dissolve the polyalanine in
water. Polyalanine, however, is considerably different
from the variable and almost unbiased amino acid
sequences of the α helices in proteins. All of the peptides
displaying α-helical structure, because of the marginal
stability of those α helices and the peculiar sequences
required, reemphasize the difficulty of overcoming the
competition of the molecules of water for donors and
acceptors.

The existence of marginally stable synthetic
α helices, albeit at 0 °C, has provided a biochemically
relevant framework on which to examine the advantages
provided by such a structure and its resulting standard
entropy of approximation to the formation of
intramolecular hydrogen bonds. Because the side chains
protrude from an α helix at intervals of 99°, the side chain
four amino acids to the amino terminus of any position
in an α helix lies almost directly (396°) below the side
chain at that position (Figure 4-17). A hydrogen bond
will form between a donor and an acceptor placed
synthetically at the i and i + 4 positions of an α-helical
peptide. For example, in the peptide
acetyl-AAQAAEAQAKAAQAAY-α-amide, a hydrogen bond can
form between the glutamate and the lysine when the two
are held rigidly above and below each other in the
α-helical conformation of the peptide. The free energy of
formation of this hydrogen bond can be assessed by
measuring the differences between the equilibrium
constant at 0 °C for the formation of the α helix of this
peptide and those for the α helices of various controls in
which the hydrogen bond is unable to form.
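The two quantitative steps in this paragraph can be sketched in a few lines. The sketch assumes that the free energy assigned to the side-chain hydrogen bond is simply -RT ln(K/K_control); the function names and the equilibrium constants in the usage lines are hypothetical and are not taken from the measurements cited in the text.

```python
import math

R = 8.314  # gas constant, J K^-1 mol^-1


def angular_offset(n_residues, degrees_per_residue=99.0):
    """Rotation about the helix axis between two side chains that are
    n_residues apart, reduced modulo one full turn (degrees)."""
    return (n_residues * degrees_per_residue) % 360.0


def hydrogen_bond_free_energy(k_with_bond, k_control, temp_k=273.15):
    """Free energy (kJ/mol) attributed to the i, i + 4 hydrogen bond,
    taken here as -RT ln(K / K_control) at the stated temperature.
    This relation is an assumption of the sketch."""
    return -R * temp_k * math.log(k_with_bond / k_control) / 1000.0


# Four residues at 99 degrees each make 396 degrees, that is, one full
# turn plus 36 degrees, so the two side chains are nearly aligned.
print(angular_offset(4))

# Hypothetical helix-coil equilibrium constants at 0 degrees C.
print(round(hydrogen_bond_free_energy(1.6, 1.0), 2))
```

With these made-up constants the result is about -1 kJ/mol, which is the order of magnitude discussed in the text; the real values depend on the measured equilibrium constants.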

A series of such measurements has been made
(Table 5-7). None of these free energies of formation is
remarkable, again presumably because the standard
entropy of approximation in such a situation barely
overcomes the competition for donors and acceptors from
the water. An important point that should be reiterated is
that these small negative free energies of formation do
not state that the hydrogen bond is a net contributor to
the stability of the α helix; in fact, each of the α helices
that contains hydrogen bonds is less stable than an
α helix in which the donor and acceptor are replaced by
alanine. Rather, it is the formation of the α helix itself
that provides the standard entropy of approximation
necessary to make the standard free energy of formation
of each of these intramolecular hydrogen bonds less
than 0.


Table 5-7: Standard Free Energies of Formation of
Hydrogen Bonds within an α Helixᵃ

donor/acceptor                          standard free energy
                                        of formation (kJ mol⁻¹)
lysinium cation/glutamic acid           -1.1
lysinium cation/glutamate anion         -1.4
lysinium cation/aspartic acid           -0.9
lysinium cation/aspartate anion         -1.1
histidinium cation/glutamic acid        -0.6
histidinium cation/glutamate anion      -1.1
histidinium cation/aspartic acid        -2.4
histidinium cation/aspartate anion      -3.1
glutamine/aspartic acid                 -1.8
glutamine/aspartate anion               -4.1

ᵃ Standard free energy of formation for a hydrogen bond between the i and i + 4
positions of an α-helical peptide.

The difference in standard free energy of formation
between the hydrogen bond of a lysinium cation and a
glutamate anion and the hydrogen bond of a lysinium
cation and a glutamic acid is -0.3 kJ mol⁻¹; that between
the hydrogen bond of a lysinium cation and an aspartate
anion and the hydrogen bond of a lysinium cation and an
aspartic acid is -0.2 kJ mol⁻¹; that between the hydrogen
bond of a histidinium cation and a glutamate anion and
the hydrogen bond of a histidinium cation and a
glutamic acid is -0.5 kJ mol⁻¹; and that between the
hydrogen bond of a histidinium cation and an aspartate anion
and the hydrogen bond of a histidinium cation and an
aspartic acid is -0.7 kJ mol⁻¹ (Table 5-7). Because in each
case the carboxylate is more basic than the carboxylic
acid, and hence a better acceptor, Equation 5-51 predicts
that these differences should be -3.5, -3.5, -6.5, and
-6.5 kJ mol⁻¹, respectively. The fact that the actual
differences are significantly less negative than the expected
differences is further evidence for the fact that an ion
pair is unstable in aqueous solution relative to the
separated ions, because the conversion of the
monocationic hydrogen bond into an ion pair is actually
destabilizing. This conclusion is reinforced by the fact that the
difference in standard free energy of formation between
the hydrogen bond of a glutamine and aspartic acid and
the hydrogen bond of a glutamine and aspartate anion,
neither of which is an ion pair, is -2.3 kJ mol⁻¹, even
though glutamine is a much weaker acid than either
lysinium cation or histidinium cation (Table 2-2).
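The comparison made in this paragraph is simple arithmetic on the entries of Table 5-7, and it can be laid out as a short script. The dictionary keys and function name are hypothetical labels chosen for this sketch; the predicted differences are the ones quoted in the text from Equation 5-51, which is not reproduced here. The glutamine/aspartate entry is entered as -4.1 kJ/mol, the sign implied by the -2.3 kJ/mol difference quoted above.

```python
# Standard free energies of formation (kJ/mol) read from Table 5-7.
# Keys are (donor, acceptor) labels invented for this sketch.
TABLE_5_7 = {
    ("lysinium", "glutamic acid"): -1.1,
    ("lysinium", "glutamate"): -1.4,
    ("lysinium", "aspartic acid"): -0.9,
    ("lysinium", "aspartate"): -1.1,
    ("histidinium", "glutamic acid"): -0.6,
    ("histidinium", "glutamate"): -1.1,
    ("histidinium", "aspartic acid"): -2.4,
    ("histidinium", "aspartate"): -3.1,
    ("glutamine", "aspartic acid"): -1.8,
    ("glutamine", "aspartate"): -4.1,  # sign inferred from the -2.3 difference
}

# Differences predicted from Equation 5-51, as quoted in the text (kJ/mol).
PREDICTED = {
    ("lysinium", "glutamate"): -3.5,
    ("lysinium", "aspartate"): -3.5,
    ("histidinium", "glutamate"): -6.5,
    ("histidinium", "aspartate"): -6.5,
}

ACID_FORM = {"glutamate": "glutamic acid", "aspartate": "aspartic acid"}


def anion_minus_acid(donor, anion):
    """Observed change in free energy of formation when the acceptor
    is ionized (carboxylic acid -> carboxylate), in kJ/mol."""
    acid = ACID_FORM[anion]
    return round(TABLE_5_7[(donor, anion)] - TABLE_5_7[(donor, acid)], 1)


for (donor, anion), expected in PREDICTED.items():
    observed = anion_minus_acid(donor, anion)
    print(f"{donor}/{anion}: observed {observed}, predicted {expected}")

# The neutral donor, for which no ion pair is formed.
print("glutamine/aspartate:", anion_minus_acid("glutamine", "aspartate"))
```

The observed differences for the cationic donors come out several-fold smaller in magnitude than the predicted ones, whereas the neutral glutamine donor shows the largest change, which is the pattern the paragraph uses as its argument.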

One could imagine that the standard base pairs
between adenine and uracil or between guanine and
cytosine might form in water because the formation of
the second hydrogen bond or the second and third
hydrogen bonds in the respective complexes would be aided by
standard entropy of approximation gained by the
formation of the first. Such a cooperative enhancement of
hydrogen bond strength has been observed in complexes
between glutaric acid and a cyclic tetraresorcinol in
CHCl₃. In these complexes the two carboxylic groups of
the diacid form hydrogen bonds with the phenolic
hydroxyls of resorcinols on opposite sides of the ring.
The advantage, however, to the formation of the second
hydrogen bond, a cooperative but intermolecular
situation, was only -10 kJ mol⁻¹, and the hydrogen-bonded
complex could be observed only in aprotic solvents such
as CHCl₃. In a similar fashion, standard and nonstandard
base pairs between two nucleic acid bases will form in
organic solvents or within micelles, but they do not form
in water. In fact, the normal bases in DNA itself can be
replaced by isosteres that cannot form any hydrogen
bonds, and the complementary base pairs form as
efficiently from steric complementarity as from base
pairing.

The hydrogen bonds in DNA contribute to the 
specificity of the pairing of its bases, but in a negative 
sense. For example, in order to compensate partially for 
the strongly unfavorable act of removing the two donors 
and the one acceptor of guanine and the two acceptors 
and one donor of cytosine from water, the three hydro- 
gen bonds of the base pair must be formed in a double 
helix. When only the thymine in the base pair with ade- 
nine is replaced by a 2,4-difluoro-5-methylphenyl group 
so that the two hydrogen bonds cannot form, the base 
pair is less stable by about 7 kJ mol⁻¹.

There are, however, situations in which hydrogen
bonds to nucleic acid bases can be sufficiently assisted
by standard entropy of approximation so as to be
marginally stable in aqueous solution. The complex between
9-ethyladenine and a specifically designed host
has an association constant of 28 M⁻¹ at 27 °C in water at
pH 6 and an ionic strength of 0.05 M, but it was estimated
that each hydrogen bond in the complex contributed
only -0.8 kJ mol⁻¹ to its stability even though the
standard entropy of approximation must be significant. It
is also possible to form in aqueous solution a
hydrogen-bonded dimer of the deoxydinucleotide
5'-phosphodeoxyguanylyl(3'-5')deoxycytidine (pdG-dC). A
hydrogen-bonded dimer of this self-complementary
dinucleotide forms with an association equilibrium
constant of 8 M⁻¹ at 2 °C and pH 7.5. If it has the common
base pairing, six hydrogen bonds hold the two halves
together. The rings of the bases are rigid, and once two of
the hydrogen bonds in each base pair form, the standard
free energy of formation of the third must incorporate
the undiluted standard entropy of molecularity.
Furthermore, the nucleotide bases on the two monomers
are probably already stacked one on top of the other,
an association that at least brings all six donors and
acceptors to the same side of the monomer if not in
proper alignment. This stacking of the bases results from
the hydrophobic effect. It must contribute significantly
to this association by roughly aligning the bases before
the dimerization occurs. All of these observations
demonstrate that the hydrogen bonds between the base
pairs of a double-stranded nucleic acid do not contribute
significantly to its stability.
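To put the two association constants quoted above on a common energetic scale, they can be converted to standard free energies with ΔG° = -RT ln K. The sketch below assumes a 1 M standard state, which is a simplification; the text elsewhere prefers units of corrected volume fraction, so these numbers are for orientation only. The function name is hypothetical.

```python
import math

R = 8.314  # gas constant, J K^-1 mol^-1


def association_free_energy(k_assoc, temp_c):
    """Standard free energy of association (kJ/mol) from an association
    constant in M^-1, assuming a 1 M standard state."""
    temp_k = temp_c + 273.15
    return -R * temp_k * math.log(k_assoc) / 1000.0


# Host/9-ethyladenine complex: K = 28 M^-1 at 27 degrees C.
host = association_free_energy(28.0, 27.0)

# Dimer of pdG-dC: K = 8 M^-1 at 2 degrees C.
dimer = association_free_energy(8.0, 2.0)

print(f"host complex: {host:.2f} kJ/mol")
print(f"pdG-dC dimer: {dimer:.2f} kJ/mol")
print(f"pdG-dC dimer, naive share per hydrogen bond: {dimer / 6:.2f} kJ/mol")
```

Dividing the dimer value by six is only a naive bookkeeping device, because stacking also contributes to the association, but it shows that less than 1 kJ/mol per hydrogen bond is available even in this favorable case.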


Suggested Reading 


Page, M.I., & Jencks, W.P. (1971) Entropic contributions to rate
accelerations in enzymic and intramolecular reactions and the
chelate effect, Proc. Natl. Acad. Sci. U.S.A. 68, 1678-1683.


Problem 5-10: 


(A) Draw the structure of a hydrogen bond between 
the propionate anion and a proton on N3 of neu- 
tral 4-methylimidazole (see Equation 2-31 for 
numbering). Include all lone pairs of electrons in 
your drawing. 


(B) Use Equation 5-51 and a value of 0.013 for τ to
estimate the apparent equilibrium constant in units
of molarity⁻¹ for the formation in water of a
hydrogen bond between one of the lone pairs on a
propionate anion (pKa = 4.88) and a proton on N3 of
4-methylimidazole (microscopic pKa = 15.1).


(C) The equilibrium constant for the formation of the
intramolecular hydrogen bond in the
3-(3H-imidazol-4-yl)propionate anion in which carbons 2
and 3 have been locked by a cyclopentene is 1.5
(Table 5-6). Using the estimated value of the
apparent equilibrium constant for the
intermolecular formation of the same type of hydrogen
bond, from section B, calculate an upper limit for
the value of ΔS°approx for the conversion of the
intermolecular reaction into the intramolecular
reaction. Convert the apparent equilibrium
constant for the intermolecular formation of the
hydrogen bond into units of corrected volume
fraction (Equation 5-13) before performing this
calculation. Assume that ΔΔH° is 0.


Problem 5-11: Why might it be that the peptide with the
amino acid sequence SEEEEKKKKEEEEKKKKF displays
35% α helix at pH 8.3 and 4 °C?


The Hydrophobic Effect 


The hydrophobic effect is exemplified by the fact “that oil
and water are hostile” and do not mix. The reason is
that water is more stable when the oil is not dissolved in
it than when the oil is. This failure of oil and water to mix
is only the most extreme manifestation of the tendency
of liquid water to expel solutes that are not ions and that
do not have significant numbers of donors and acceptors
of hydrogen bonds. An ionic solute is held in water by
large, negative standard enthalpies of hydration. Solutes
that have donors and acceptors of hydrogen bonds are
held in water by the hydrogen bonds they form with it.
Solutes that neither are ions nor have donors and
acceptors of hydrogen bonds are expelled from liquid water.
This expulsion is the hydrophobic effect.

The nature of the hydrophobic effect has been
succinctly described by G.S. Hartley:


The antipathy of the paraffin chain for water is, however, 
frequently misunderstood. There is no question of actual 
repulsion between individual water molecules and paraffin 
chains, nor is there any very strong attraction of paraffin 
chains for one another. There is, however, a very strong 
attraction of water molecules for one another in comparison 
with which the paraffin-paraffin or paraffin-water attrac- 
tions are very slight. 


Aside from the overuse of “very”, it is clear from this
description that the term hydrophobic is misleading, if
its etymology is examined closely. The oil does not
dislike the water. In fact, measurements of interfacial
energies suggest that the oil prefers the water to itself.
Rather, water ejects the oil because water molecules
have a greater liking for other water molecules.

The hydrophobic effect upon a hydrophobic
solute A can be represented formally by the transfer of
the solute from water to another solvent j.


$$\mathrm{A(H_2O)} \rightleftharpoons \mathrm{A(solvent}\ j\mathrm{)} \qquad \text{(5-62)}$$


It has been proposed that the hydrophobic effect can
also be represented by the formation of a macroscopic
interface between an immiscible phase of hydrocarbon
and water. There are, however, significant differences
between the physical properties of such a macroscopic
interface and those of the microscopic layer of hydration
surrounding an isolated molecule of hydrocarbon
dissolved in water. Because, upon the folding of a
molecule of protein, individual side chains of the amino acids
are transferred from the water to the interior of the folded
structure, the transfer of a molecule of solute from water
to another phase (Equation 5-62) seems to be the more
appropriate process to examine for the present purposes.

In the case of the failure of oil and water to mix,
solute A in Equation 5-62 is a molecule of oil and solvent j
is the liquid oil itself. Because, in general, pure phases of
the different solutes A in a comparison may in themselves
have unique peculiarities, the transfer of a solute from
water to a nonpolar solvent should be studied in a
systematic fashion by choosing a common solvent for all of
the transfers. The hydrophobic effect can be
quantified by measuring a standard free energy for this
transfer (Equation 5-18). The standard free energy of transfer
of solute A between water and solvent j, ΔG°(A,H₂O→j), is the
change in standard free energy that results only from the
change in solvation of solute A between the other solvent
and the water experienced when solute A, dissolved in
water at standard state, is transferred from the water into
that other solvent at standard state.

As expected from everyday experience, the stan- 
dard free energies of transfer of hydrophobic solutes 
between water and an organic solvent such as benzene, 
carbon tetrachloride, or the liquid solute itself are nega- 
tive (Table 5-8). It is this negative change in standard free 
energy that produces the hydrophobic effect. The 
hydrophobic effect is the only noncovalent force in aque- 
ous solution that proceeds with a net negative change in 
standard free energy, and it is thought to provide all of 
the driving force for the folding of polypeptides, the asso- 
ciation of ligands with a protein, and the formation of 
interfaces between subunits in oligomeric proteins. 
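The entries of Table 5-8 can be checked for internal consistency, because at a fixed temperature the three tabulated quantities must satisfy ΔG° = ΔH° - TΔS°. The sketch below does this for every row. The values are read from the table as printed, with the enthalpy for benzene taken as -2.4 and the free energy for ethanol taken as -5.3, the signs that satisfy the relation; the assignment of rows 3 to 5 to ethane is likewise inferred from the layout. Variable and function names are hypothetical.

```python
# (solute, solvent, dG, dH, TdS), all in kJ/mol at 25 degrees C,
# as read from Table 5-8.
ROWS = [
    ("methane", "benzene", -11.0, 11.7, 22.7),
    ("methane", "CCl4", -12.1, 10.5, 22.6),
    ("ethane", "benzene", -17.6, 9.2, 26.8),
    ("ethane", "CCl4", -17.1, 7.1, 24.2),
    ("ethane", "ethane", -18.2, 10.4, 28.6),
    ("propane", "propane", -24.5, 7.5, 32.0),
    ("butane", "butane", -29.4, 4.2, 33.6),
    ("benzene", "benzene", -24.2, -2.4, 21.8),
    ("toluene", "toluene", -29.0, -2.7, 26.3),
    ("ethanol", "ethanol", -5.3, 10.1, 15.4),
    ("1-propanol", "1-propanol", -10.3, 10.1, 20.4),
    ("2-propanol", "2-propanol", -8.7, 13.0, 21.7),
    ("1-butanol", "1-butanol", -15.3, 9.3, 24.6),
    ("1-pentanol", "1-pentanol", -20.7, 7.8, 28.5),
]


def is_consistent(row, tol=0.05):
    """True if the tabulated dG equals dH - TdS within the rounding
    of the table (0.1 kJ/mol)."""
    _, _, dg, dh, tds = row
    return abs((dh - tds) - dg) <= tol


for solute, solvent, dg, dh, tds in ROWS:
    print(f"{solute:>10} -> {solvent:<10} dH - TdS = {dh - tds:6.1f}"
          f"   tabulated dG = {dg:6.1f}")

print("all rows consistent:", all(is_consistent(r) for r in ROWS))
```

Every row has a negative free energy of transfer, and in every row the entropic term TΔS° exceeds ΔH°, which is the numerical basis for the statements that follow in the text.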

The explicit reason for the negative standard free
energies of transfer at physiological temperatures is that
the standard entropies of transfer, multiplied by the
temperature, are larger than the standard enthalpies of
transfer (Table 5-8). At 25 °C, the
standard enthalpy of transfer, which in most cases is
positive and thus unfavorable, is overcome by a much larger
positive and thus favorable standard entropy of transfer.
This peculiarity has led to the maxim that the
hydrophobic effect is entropy-driven, but this is a misleading view.
Edsall and Scatchard noted that the incremental
standard entropies of solution for -CH₂- groups in water had
anomalously large negative values, but they also pointed
out that because of the large changes in standard molal
heat capacity, these incremental standard entropies of
solution would become less and less significant as the
temperature was raised (Figure 5-20). As the
temperature increases, the standard enthalpy of transfer becomes
more and more exothermic. At high enough
temperatures the standard entropy of transfer passes through
zero and becomes negative. As a result, at
intermediate temperatures the reaction changes from an
entropically driven process to an enthalpically driven process,
and at high temperatures the standard entropy of
transfer is actually unfavorable even though the transfer itself
remains favorable because the standard free energy of
transfer does not vary significantly with temperature.
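The crossover described in this paragraph follows from the standard relations for a process with a constant change in heat capacity, ΔH°(T) = ΔH°(T₀) + ΔC°p (T - T₀) and ΔS°(T) = ΔS°(T₀) + ΔC°p ln(T/T₀). The sketch below applies them to the values for propane in Table 5-8. The heat capacity change of -290 J K⁻¹ mol⁻¹ is an illustrative assumption for this sketch, as is its independence of temperature (the simpler of the two assumptions discussed in the legend to Figure 5-20); the function name is hypothetical.

```python
import math

T0 = 298.15  # reference temperature (K) at which Table 5-8 is reported


def transfer_functions(temp_k, dh0, tds0, dcp):
    """Return (dG, dH, TdS) in kJ/mol at temp_k for a transfer with
    enthalpy dh0 and entropic term tds0 (kJ/mol) at T0 and a constant
    heat capacity change dcp (J K^-1 mol^-1)."""
    dh = dh0 + dcp * (temp_k - T0) / 1000.0
    ds = tds0 / T0 + dcp * math.log(temp_k / T0) / 1000.0  # kJ K^-1 mol^-1
    tds = temp_k * ds
    return dh - tds, dh, tds


# Propane from water to liquid propane: dH = +7.5, TdS = +32.0 kJ/mol
# at 25 degrees C (Table 5-8); dCp = -290 J/K/mol is assumed here.
for temp in (280.0, T0, 350.0, 400.0, 450.0):
    dg, dh, tds = transfer_functions(temp, 7.5, 32.0, -290.0)
    print(f"T = {temp:6.1f} K  dG = {dg:7.2f}  dH = {dh:7.2f}  TdS = {tds:7.2f}")
```

Under these assumptions the enthalpy changes sign a few tens of degrees above room temperature and the entropic term changes sign well above 400 K, while the free energy of transfer stays negative throughout, which is the qualitative behavior the paragraph describes.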

This behavior illustrates the fact, first noted by Edsall,
that the most characteristic feature of the hydrophobic
effect is not its change in standard entropy but its change
in standard heat capacity. The anomalously large,
positive incremental change in standard molal heat capacity
of solution for solutes in water remains the most reliable
signature of the hydrophobic effect.

The changes in the standard thermodynamic state
functions, such as standard entropy, standard enthalpy,
and standard heat capacity, that are associated with the
hydrophobic effect (Table 5-8) have been assigned to
changes in the thermodynamic properties of the water
surrounding the solute as it leaves the aqueous phase and
changes in the thermodynamic properties of the
nonpolar solvent as it enters; in other words, to differences in
the solvation of the solute by the two solvents. The reason
that these changes of the thermodynamic state functions
are assigned to the respective solvents rather than the
solutes is that the solutes in most cases are too small, as
in the case of methane, ethane, or propane, or too rigid,
as in the case of benzene, to account internally for the
significant changes that are observed.


Table 5-8: Thermodynamic Properties of Transfer from Water to Another Solventᵃ

solute        solvent j     ΔG°(H₂O→j)    ΔH°(H₂O→j)    TΔS°(H₂O→j)
                            (kJ mol⁻¹)    (kJ mol⁻¹)    (kJ mol⁻¹)
methane       benzene       -11.0         +11.7         +22.7
methane       CCl₄          -12.1         +10.5         +22.6
ethane        benzene       -17.6         +9.2          +26.8
ethane        CCl₄          -17.1         +7.1          +24.2
ethane        ethane        -18.2         +10.4         +28.6
propane       propane       -24.5         +7.5          +32.0
butane        butane        -29.4         +4.2          +33.6
benzene       benzene       -24.2         -2.4          +21.8
toluene       toluene       -29.0         -2.7          +26.3
ethanol       ethanol       -5.3          +10.1         +15.4
1-propanol    1-propanol    -10.3         +10.1         +20.4
2-propanol    2-propanol    -8.7          +13.0         +21.7
1-butanol     1-butanol     -15.3         +9.3          +24.6
1-pentanol    1-pentanol    -20.7         +7.8          +28.5

ΔC°p(H₂O→j) (J K⁻¹ mol⁻¹), in the order printed (the assignment of these
values to rows is not recoverable here): -250; -280, -290, -290, -430,
-450, -150, -220, -220, -280, -350

ᵃ Values were calculated from the thermodynamic behavior of the partition coefficients (Equation 5-17). Original published values for the
partition coefficients were in units of mole fraction at infinite dilution. These units were converted to units of molarity by dividing them by the
appropriate molar volumes of the respective solvents. The partition coefficients in units of molarity were then converted to free energies of transfer and
entropies of transfer for units of corrected volume fraction (Equation 5-18). All values are for a temperature of 25 °C.


[Figure 5-20: two panels, benzene (left) and pentane (right); ordinate, standard state functions of transfer (kJ mol⁻¹); abscissa, temperature (K).]

Figure 5-20: Dependence on temperature (Kelvin) of the changes in standard free energy (ΔG°), standard enthalpy (ΔH°), and standard
entropy multiplied by temperature (TΔS°) for the transfer of benzene between water and benzene (left panel) and the transfer of pentane
between water and pentane (right panel) (ref 181). The activities are expressed in units of mole fraction rather than as corrected volume fraction as
in Table 5-8; and the three state functions, free energy, enthalpy, and entropy, are expressed in units of kilojoules mole⁻¹. The lines were
calculated from the values of the thermodynamic state functions measured in the range from 0 to 40 °C and either the assumption that the
observed behavior of ΔC°p can be fit by an analytic function of temperature and that analytic function can be extrapolated beyond the range
of measurement (solid lines) or the assumption that ΔC°p is independent of temperature and has a value that is the mean of the observed
values (dashed lines). The values of ΔC°p over the range of measurements seem to decrease somewhat with increasing temperature and this
deviation is the basis for considering adjusted values of ΔC°p for changes in temperature. The trends are the same regardless of the
assumption. Reprinted with permission from ref 181. Copyright 1988 Academic Press.

There are several observations which suggest that a
more rigid hydrogen-bonded lattice, similar to the
lattice in ice Ih, surrounds hydrophobic solutes when they
are dissolved in water. Macroscopic solids known as
clathrates form spontaneously when hydrocarbons are
mixed with pure water at proper molar ratios. Clathrates
are solid, crystalline hydrates that are composed of
isolated individual molecules of hydrocarbon encased in
rigid hydrogen-bonded networks of water molecules. The
thermodynamic parameters associated with these solids
are of a magnitude sufficient to lead to the conclusion that
similar rigid networks of water molecules should also
surround hydrophobic solutes when they are present at
dilute concentration. Whether or not clathrates are of
relevance to the hydrophobic effect, their very existence
has subconsciously influenced our views of the process.
The unusually large changes in standard heat capacity
(Table 5-8) associated with the hydrophobic effect are
believed to result from an increase in the order of the
water surrounding the solute. It should be recalled that
it is the gradual melting of the hydrogen-bonded lattice
that is supposed to be responsible for the anomalously
high heat capacity of liquid water itself, and increasing
the amount or the degree of structure of this lattice should
produce even greater capacity for melting and, hence,
greater heat capacities. The partial molar volume of a
hydrophobic solute in water is usually about 13 cm³ mol⁻¹
less than that of the same solute in other solvents, and
this difference has been thought to result from the
efficient packing of the solute within a cage of
hydrogen-bonded waters that resembles the networks in clathrates.
It has already been noted, however, that ice Ih, and
presumably liquid water, contain a large amount of vacant
space that a solute could occupy (Figure 5-2), and this
occupation by itself could explain the smaller values for
partial molar volume. The increase in the dielectric
relaxation time and the anomalously large increase in the
viscosity observed when a hydrophobic solute is added
to water and the expansion of the structure of water
around hydrophobic solutes detected by neutron
scattering, however, also indicate that a more rigid and
structured shell of water forms around the solute than the
water in the bulk solvent.

If it is the case, however, that the water surrounding
a hydrophobic solute is more structured and held within
a more rigid hydrogen-bonded lattice than water in the
bulk phase, there should be compensatory
thermodynamic changes associated with this increase in
structure. Specifically, the standard enthalpy of the
aqueous solution of hydrocarbon should be less than the
standard enthalpy of liquid water because the hydrogen
bonds in this more structured cage should be stronger. If
a noncovalent chemical transformation occurs in any
solution, the change in standard entropy observed is
usually compensated by a change in standard enthalpy.
This observation can be stated mathematically as


$$\Delta H^{\circ}(\alpha) = \Delta H^{\circ}_{c} + T_{c}\,\Delta S^{\circ}(\alpha) \qquad \text{(5-63)}$$


where α refers to any noncovalent process and ΔH°_c and
T_c are parameters peculiar to that process. Many
noncovalent processes occurring in water satisfy this
relationship with T_c = 280 ± 10 K.*
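The practical consequence of Equation 5-63 can be seen by substituting it into ΔG° = ΔH° - TΔS°, which gives ΔG°(α) = ΔH°_c + (T_c - T)ΔS°(α). The sketch below evaluates this expression. The symbol names for the two parameters (written here as dh_c and t_c), the function name, and the numerical entropy changes are assumptions made for illustration; only the linear form of the relationship and the value of about 280 K come from the text.

```python
def compensated_free_energy(ds, temp_k, dh_c=0.0, t_c=280.0):
    """Free energy change (kJ/mol) for a process that obeys linear
    enthalpy-entropy compensation, dH = dh_c + t_c * dS.

    ds is the standard entropy change in kJ K^-1 mol^-1; dh_c (kJ/mol)
    and t_c (K) are the two parameters of the compensation relation.
    """
    dh = dh_c + t_c * ds          # enthalpy implied by the compensation
    return dh - temp_k * ds       # dG = dH - T dS = dh_c + (t_c - T) dS


# Hypothetical entropy changes of increasing size, evaluated at 25 C.
for ds_joule in (10.0, 50.0, 100.0):
    ds = ds_joule / 1000.0
    tds = 298.15 * ds
    dg = compensated_free_energy(ds, 298.15)
    print(f"dS = {ds_joule:5.0f} J/K/mol  TdS = {tds:6.2f} kJ/mol  "
          f"dG = {dg:6.2f} kJ/mol")
```

An entropic term of about 30 kJ/mol leaves less than 2 kJ/mol in the free energy when the temperature is within 20 K of the compensation temperature, which illustrates why compensated changes in enthalpy and entropy have little effect on the position of an equilibrium.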

In the particular case of the hydrophobic effect, it
has been noted that the decrease in standard entropy
associated with the formation of a rigid cage of hydration
as the solute enters water is not accompanied (Table 5-8)
by the decrease in standard enthalpy (Equation 5-63) to
be expected from such compensation. The argument is
that the missing enthalpy in this reaction is the enthalpy
that was required to crack open the lattice of the liquid
water to form a cavity for the hydrophobic solute. Because
standard entropy and standard enthalpy should
compensate almost completely in the formation of the more
rigid shell of hydration and have little effect on the
overall reaction because of this cancellation, it is actually this
positive enthalpy of opening the lattice to form a cavity
or, conversely, the negative enthalpy realized upon
collapsing the cavity that produces the hydrophobic effect.

The positive enthalpy required to open the lattice 
could result from the fact that some of the hydrogen 
bonds of the fluid lattice within liquid water must be 
broken irretrievably when a cavity is formed for a 
hydrophobic solute. The empty donors and acceptors of 
such broken hydrogen bonds can be observed in molec- 
ular dynamics simulations of aqueous solutions of 
hydrophobic solutes. Water, however, is presumably
adept at rearranging around small nonpolar solutes to 
form cages like those formed in the clathrates and in the 
process retaining as many hydrogen bonds as there 
would be in the absence of those solutes, and other sim- 
ulations indicate that it is only when all of the dimen- 
sions of a nonpolar solute are more than twice the 
diameter of a molecule of water that significant numbers 
of hydrogen bonds are lost upon the formation of the 
cavity. Nevertheless, in this view, the magnitude of the
expulsion of hydrophobic solutes from aqueous solution 
should depend on the number of hydrogen bonds that 
must be broken to form the cavity. 

If this picture of the driving force propelling the
hydrophobic effect were correct, then it would follow
that only the size of the cavity should determine the
magnitude of the hydrophobic effect. It is the case that
the larger the solute, the greater the hydrophobic effect
(Table 5-8, Figure 5-21), but there is a peculiar
aspect to this relationship.


* This value for T_c means that values of changes in standard
enthalpy or changes in standard entropy for most processes
occurring entirely in aqueous solution are monotonously uninformative
because they register mainly compensatory changes in the
structure of the solvent.

The only feature of a molecule that determines the 
magnitude of the hydrophobic effect exerted upon it is 
the number of hydrogen-carbon bonds that it
contains. This conclusion follows from the following facts.

First, the change in standard heat capacity upon
dissolving a molecule in water, which is the fundamental
thermodynamic manifestation of the hydrophobic
effect, correlates with high precision (r > 0.985) to the
number of hydrogen-carbon bonds that it contains in 15
different sets containing among themselves a total of 120
molecules. Furthermore, the slopes of each of the 15
correlations are all the same
[30 ± 2 J K⁻¹ (mol hydrogen-carbon bond)⁻¹].

Second, it has long been noted that the
standard free energies of transfer for linear acyclic alkanes
(the lines connecting the symbols O in Figure 5-21) from
water to any solvent (hexadecane in the case of Figure
5-21) correlate with the number of hydrogen-carbon
bonds they contain. The high precision of this
correlation (r > 0.9999 for the data calculated with units of
corrected volume fraction, the lower line) is one of the most
remarkable facts concerning the hydrophobic effect.
This high precision seems to belie explanations of the
hydrophobic effect based on the dimensions of the cavity
formed by the solute because these dimensions should
depend on the particular length and conformational
flexibility of the particular alkane.

Third, the standard free energies of transfer from 
water to hexadecane for branched alkanes, cyclic
alkanes, alkenes, alkadienes, cyclic alkenes, cyclic
alkadienes, cyclic alkatrienes, arenes, and alkynes all fall
close to the line governing the behavior of linear alkanes 
(lower solid line in Figure 5-21) when they are plotted as 
a function only of the number of hydrogen-carbon 
bonds that the molecules contain. The data fall even 
closer to the line for linear alkanes when molarities are 
used as units (upper lines in Figure 5-21).* The same 


* The scatter in the data for the free energies of transfer plotted in 
Figure 5-21 when they are calculated from activities with units of 
corrected volume fraction (lower line) is more pronounced than 
when they are calculated from activities with units of molarity 
(upper line). This may be due to the fact that, in the former case, 
the partial molar volumes of the solutes have a large effect on the 
final values for free energies of transfer, yet these partial molar vol- 
umes are from estimates rather than direct measurements. It does 
seem, however, that the standard free energies of transfer for most 
of the other classes deviate systematically from the line correlating 
the standard free energies of transfer based on corrected volume 
fraction for linear alkanes. If this is a real effect, not one due only to 
the fact that the partial molar volumes used are inaccurate, then a 
phenyl ring is worth about 1.4 hydrogen-carbon bonds during 
transfer from water to hexadecane. 
Figure 5-21: Standard free energy of transfer from water to
hexadecane as a function of the number of hydrogen-carbon bonds in
a hydrocarbon. It was found that when the values for the partition 
coefficients for transfer from water to hexadecane for the linear 
alkanes listed by Abraham et al. were treated as the quotients of
the molarities of the hydrocarbons in the two phases and inserted
into Equation 5-18, standard free energies of transfer resulted that
were almost identical to those for the transfer of the same
hydrocarbons from water to their own liquid calculated with Equation
5-18 from the solubilities tabulated by McAuliffe. Consequently,
it was assumed that the units on the dimensionless partition
coefficients listed by Abraham et al. were molarity molarity⁻¹. These
partition coefficients were either inserted into Equation 5-18 
directly to obtain standard free energies of transfer for activities in 
units of corrected volume fraction (lower set of data) or inserted 
into Equation 5-18 modified so that it did not include the expo- 
nential term within the argument of the natural logarithm to obtain 
standard free energies of transfer for activities in units of molarity 
(upper set of data). The partial molar volumes for hydrocarbons in 
hexadecane were calculated with Equation 5-9, and those in water 
with the algorithms of Taube. The sets of hydrocarbons
included (in descending order on the graph for each number of 
hydrogen-carbon bonds) branched acyclic alkanes (©; n = 18), 
linear acyclic alkanes (O; n = 10), cyclic alkanes (x; n = 6), acyclic 
monoenes (+; n = 13), acyclic dienes (A; n = 3), primary alkynes 
(V; n=6), cyclic monoenes (Q; n = 2), alkyl arenes containing only 
one phenyl ring (0; n = 17), and alkenyl arenes containing only one 
phenyl ring (x; n = 2). Consequently there are 78 independent data 
points in each set. The acidic hydrogens on the primary alkynes 
were not counted as hydrogen-carbon bonds. The solid lines 
drawn are fit to only the respective data for the linear acyclic alka- 
nes (O) in each set. The dotted line and the line of short dashes were 
fit to the respective points for nine representative acyclic alkanes, 
nine representative acyclic monoenes, the nine acyclic dienes and 
alkynes, nine representative alkyl arenes, and the two alkenyl 
arenes in each set of data, the upper set calculated with units of 
molarity and the lower set calculated with units of corrected 
volume fraction. Each of these lines was forced to pass through the 
origin. All of the representatives chosen contained between 4 and 
16 carbons, and within each class the representatives chosen 
spanned the largest possible range of lengths. The line of long 
dashes was fit to all of the data for the alkyl arenes in the data set 
calculated with units of corrected volume fraction. 


results are observed for the standard free energies of 
transfer from water to the pure liquid calculated from 
the solubilities of a similar set of hydrocarbons in 
water.

Fourth, both the upper and lower solid lines in 
Figure 5-21 intersect the ordinate so close to the origin 
that the hydrophobic effect represented by these stan- 
dard free energies of transfer is for all practical purposes 
directly proportional to the number of hydrogen-carbon 
bonds a molecule contains. The upper dotted line in 
Figure 5-21 is a line forced to pass through the origin that 
was fit to a representative set of data for acyclic alkanes, 
acyclic monoenes, acyclic dienes and alkynes, alkyl 
arenes, and alkenyl arenes, and this fit is statistically 
indistinguishable from a line fit to the same representa- 
tive set but not forced to pass through the origin (r=0.992 
and r = 0.992, respectively). The line of short dashes in 
Figure 5-21 is a line forced to pass through the origin that 
was fit to a representative set of data for acyclic alkanes, 
acyclic monoenes, acyclic dienes and alkynes, alkyl 
arenes, and alkenyl arenes, and the fit is almost statisti- 
cally indistinguishable from a line fit to the same set of 
representative data but not forced to pass through the 
origin (r= 0.973 and r= 0.977, respectively). 

The mean of the slopes of the four linear fits of the 
data spanning the largest range of hydrogen-carbon 
bonds, those for linear acyclic alkanes (lower solid line 
in Figure 5-21), branched acyclic alkanes, acyclic 
alkenes, and alkyl arenes (lower line of long dashes in 
Figure 5-21), for the standard free energies of transfer 
from water to hexadecane, based on units of corrected 
volume fraction, is -2.80 ± 0.08 kJ (mol hydrogen-carbon 
bond)⁻¹, which agrees with the value given by 
Pace. 

The significant advantage of the fact that the mag- 
nitude of the hydrophobic effect exerted upon a mole- 
cule is determined only by its content of 
hydrogen-carbon bonds is that the change in standard 
free energy of transfer [-2.8 kJ (mol hydrogen-carbon 
bond)⁻¹] or change in standard heat capacity of transfer 
[30 J K⁻¹ (mol hydrogen-carbon bond)⁻¹] due to the 
hydrophobic effect for a structural transformation can be 
estimated simply by counting the number of hydrogen-carbon 
bonds removed from or inserted into water 
during that transformation. 
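
The counting rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function and constant names are hypothetical, the per-bond free energy is the -2.8 kJ (mol hydrogen-carbon bond)⁻¹ quoted above, and the heat capacity increment is treated as a magnitude because its sign convention is not stated in this passage.

```python
# Per-bond increments quoted in the text (names are hypothetical).
DG_PER_HC_BOND_KJ = -2.8    # kJ per mol of hydrogen-carbon bonds removed from water
DCP_PER_HC_BOND_J_K = 30.0  # J/K per mol of bonds; magnitude only, sign not assumed


def hydrophobic_free_energy(n_bonds_removed_from_water: int) -> float:
    """Estimate the change in standard free energy (kJ/mol) due to the hydrophobic effect.

    Pass a positive count for hydrogen-carbon bonds removed from water and a
    negative count for bonds inserted into water.
    """
    return DG_PER_HC_BOND_KJ * n_bonds_removed_from_water


def heat_capacity_magnitude(n_bonds: int) -> float:
    """Magnitude of the change in standard heat capacity (J/K/mol) for n bonds."""
    return DCP_PER_HC_BOND_J_K * abs(n_bonds)


if __name__ == "__main__":
    # Illustrative case: an isobutyl group, -CH2CH(CH3)2, has 9 hydrogen-carbon bonds.
    n = 9
    print(f"estimated free energy change: {hydrophobic_free_energy(n):.1f} kJ/mol")
    print(f"heat capacity magnitude: {heat_capacity_magnitude(n):.0f} J/K/mol")
```

The estimate is linear in the number of bonds, so the contribution of a larger structural change is simply the sum of the contributions of its parts.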

The hydrophobic effect can be dissected more 
finely if the process of transfer itself is dissected further. 
The most reliable way to measure the transfer of a 
volatile solute between water and another solvent is to 
place a vessel of water and a vessel of that other solvent 
containing the solute into a sealed chamber and allow 
equilibration to occur. In this way, the solute dissolved in 
the water and the solute dissolved in the other solvent 
both come into equilibrium with the vapor of solute that 
fills the sealed chamber. In practice, this experiment dis- 
sects the transfer into the following thermodynamic 
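cycle:¹⁸¹,¹⁹⁴,¹⁹⁵ 

$$
\begin{array}{ccc}
 & \mathrm{A\ (gas)} & \\
{\scriptstyle \Delta G^{\circ}_{\mathrm{g}\to\mathrm{H_2O}}} \swarrow & & \searrow {\scriptstyle \Delta G^{\circ}_{\mathrm{g}\to\mathrm{solvent}_i}} \\
\mathrm{A\ (H_2O)} & \xrightarrow{\ \Delta G^{\circ}_{\mathrm{H_2O}\to\mathrm{solvent}_i}\ } & \mathrm{A\ (solvent}_i\mathrm{)}
\end{array}
\qquad (5\text{-}64)
$$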


Ideally, the two standard free energies of transfer 
between the gas phase and the two liquid phases have 
the usual function of measuring the two separate stan- 
dard free energies of solvation. 

The partition coefficients for the transfer of a 
number of alkanes between the gas phase and various 
solvents have been collected. The standard free energies 
of transfer calculated from these partition coefficients 
can be presented graphically by plotting the 
standard free energy of transfer for a given solute into a 
given solvent as a function of the standard free energy of 
transfer for that solute into benzene, which can be used 
arbitrarily as a reference solvent (Figure 5-22). The 
solvents hexane, cyclohexane, carbon tetrachloride, 
toluene, phenyl bromide, and phenyl iodide all yield 
standard free energies of transfer between those of 
benzene and decane; the solvents phenyl chloride, 
N-methylpyrrole, 1-octanol, 1-butanol, 1-propanol, and 
ethanol all give standard free energies of transfer inter- 
mediate between those of benzene and methanol; 
and the solvents acetonitrile, propylene carbonate, and 
nitromethane all display free energies of transfer 
between those of methanol and dimethyl sulfoxide. 

The solvation of most hydrophobic solutes by most 
solvents proceeds with a negative change in standard 
free energy, and all of these solvents show clear 
decreases in standard free energy of solvation as the size 
of the solute is increased (Figure 5-22). This is to be 
expected because the van der Waals forces that arise as 
the solute is surrounded by solvent have negative stan- 
dard enthalpies of formation and increase in magnitude 
as the size of the solute increases. Water, however, is the 
clear exception among the solvents examined because it 
displays positive standard free energies of solvation for 
all of the hydrocarbons and these standard free energies 
of solvation increase as the size of the solute increases. 
With the exception of ethylene glycol, which is also 
strongly hydrogen-bonded, the nonaqueous solvent 
showing the least negative standard free energy of trans- 
fer is dimethyl sulfoxide, which has been included in 
Figure 5-22. Even dimethyl sulfoxide, however, fails to 
demonstrate the extreme behavior of water. The differ- 
ence in behavior between water and all of the other sol- 
vents is the hydrophobic effect: the exclusion of 
hydrophobic solutes from aqueous solution. 

Water must participate in van der Waals interac- 
tions with hydrophobic solutes just as the other solvents 
do. Nevertheless, the unfavorable hydrophobic effect 
must be greater in magnitude than these favorable van 
der Waals forces and more than overcomes them. As the 
polarities of the solvents increase, the slopes of the lines 
in Figure 5-22 become less negative. This effect might be 
explained by noting that as the solvents become more 
polar, more standard free energy is required to form a 
cavity within them, and this change is deducted from the 
favorable standard free energy of interaction between 
solute and solvent. In this view, water would simply be 
the extreme example of the difficulty in forming a cavity. 

Figure 5-22: Standard free energy of transfer (kilojoules mole⁻¹) 
for a given alkane from the gas phase into a given solvent, plotted 
as a function of the standard free energy of transfer for that alkane 
from the gas phase into benzene. The values tabulated for partition 
coefficients, originally in units of atmosphere for the vapor and 
mole fraction for the solute dissolved in a solvent, were converted 
to moles liter⁻¹ for the vapor and corrected volume fraction for the 
solute in a solvent (Equation 5-5) by use of estimates of the partial 
molar volumes for the hydrocarbons in water or Equation 5-9 
for the partial molar volumes of the hydrocarbons in the other solvents. 
The standard free energies of transfer from the gas phase 
into the various solvents were then calculated (Equation 5-21). The 
solvents chosen for display were decane (O), benzene (0), 
methanol (©), dimethyl sulfoxide (x), and water (+). The alkanes 
chosen were methane, ethane, propane, n-butane, isobutane, 
n-pentane, neopentane, n-hexane, n-heptane, and n-octane. 
Benzene was chosen as the reference solvent because more values 
for standard free energy of transfer into benzene were available and 
benzene has behavior similar to decane. 

The solutes chosen for the free energies of transfer 
from the gas phase to the various solvents in Figure 5-22 
were all acyclic alkanes. Further insight into the 
hydrophobic effect is gained when the free energies of 
transfer from the gas phase into hexadecane and from 
the gas phase into water are plotted for linear alkanes, 
branched alkanes, cyclic alkanes, alkenes, alkadienes, 
cyclic alkenes, alkynes, cyclic alkadienes, saturated 
arenes, and unsaturated arenes (Figure 5-23)¹⁷⁵,¹⁹⁶ as a 
function of the number of their hydrogen-carbon bonds. 
The difference between each pair of lines, the one for 
transfer from gas to hexadecane and the one for transfer 
from gas to water, is the line for the standard free energy 
of transfer from water to hexadecane for the respective 
class of compounds (Figure 5-21). As the degree of unsat- 
uration increases, each of these pairs of lines in Figure 
5-23 is found at a lower level on the graph [about 
-2.2 kJ mol⁻¹ (level of unsaturation)⁻¹ for water 
and -2.4 kJ mol⁻¹ (level of unsaturation)⁻¹ for hexadecane].* 
It is the difference between these two values for the 
incremental free energies of transfer that causes each of 
the lines for the standard free energies of transfer for the 
other classes of hydrocarbons presented in Figure 5-21 
(lower set of data) to fall progressively below the line for 
the linear alkanes. 

With the notable exception of cyclization, when any 
two hydrocarbons are compared that have the same 
number of hydrogen-carbon bonds, the more unsatu- 
rated one will be the larger one, and the larger one should 
display stronger van der Waals interactions with the 
hexadecane. To the extent that the incremental 
decreases in standard free energies of solvation by hexa- 
decane for hydrocarbons with the same number of 
hydrogen-carbon bonds but different levels of unsatura- 
tion (offsets of the lower set of lines in Figure 5-23) rep- 
resent increases in van der Waals interactions, the 
similar incremental decreases observed for transfer of 
the same hydrocarbons into water suggest that water 
also participates in similar van der Waals interactions. 

The standard enthalpies of transfer from the gas 
phase to water for the linear alkanes can be described by 
the relationship 

$$
\Delta H^{\circ}_{\mathrm{g}\to\mathrm{H_2O}} = -17\ \mathrm{kJ\ mol^{-1}} - 1.7\ \mathrm{kJ\ (mol\ hydrogen\text{-}carbon\ bond)^{-1}} \qquad (5\text{-}65)
$$


This inclusive, exothermic standard enthalpy of transfer 
must arise from the establishment of van der Waals inter- 
actions between water and the alkane during its entry. If 
so, this also provides evidence that water does partici- 
pate in van der Waals interactions with hydrocarbon. 
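
As a small numerical illustration of Equation 5-65, the sketch below evaluates the standard enthalpy of transfer from gas to water for a few linear alkanes. The function names are hypothetical, and the code assumes that the second term of the relationship is multiplied by the number of hydrogen-carbon bonds in the alkane, which is 2n + 2 for a linear alkane with n carbons.

```python
def hc_bonds_linear_alkane(n_carbons: int) -> int:
    """Number of hydrogen-carbon bonds in a linear alkane CnH(2n+2)."""
    return 2 * n_carbons + 2


def enthalpy_gas_to_water(n_hc_bonds: int) -> float:
    """Standard enthalpy of transfer (kJ/mol) from gas to water, after Equation 5-65.

    Assumes -17 kJ/mol plus -1.7 kJ/mol for each hydrogen-carbon bond.
    """
    return -17.0 - 1.7 * n_hc_bonds


if __name__ == "__main__":
    for name, n_carbons in [("methane", 1), ("n-butane", 4), ("n-octane", 8)]:
        bonds = hc_bonds_linear_alkane(n_carbons)
        print(f"{name}: {bonds} bonds, {enthalpy_gas_to_water(bonds):.1f} kJ/mol")
```

Every value is negative and becomes more negative as the alkane lengthens, which is the exothermic trend the paragraph attributes to van der Waals interactions with water.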

In Figure 5-23, the lines for the transfer of the linear 
alkanes from the gas phase into water and from the gas 
phase into hexadecane intersect on the ordinate at +4 kJ 
mol⁻¹, in agreement with the intersection of all of the 
lines in Figure 5-22 at +5 kJ mol⁻¹. In Figure 5-23, this 
intersection corresponds to the standard free energy of 
transfer of a linear alkane with no hydrogen-carbon 
bonds from the gas phase to either condensed phase, 
which is the transfer of nothing. It is reassuring that the 
transfer of nothing from water to hexadecane proceeds 
with no change in standard free energy, but the fact that 
the standard free energy of transfer of nothing from the 
gas phase to either of these condensed phases is +4 kJ 
mol⁻¹ suggests that, not surprisingly, there remains some 
difference in standard free energy between the gas phase 
and either condensed phase unaccounted for by the 
choices of standard state. If, as has been stated, an 
additional term equal to RT (+2.5 kJ mol⁻¹) must be 
added to all of the standard free energies of transfer, this 
correction would only increase this difference. 


* The offsets of the lines for aqueous solutions in Figure 5-23 suggest 
that it is only the degree of unsaturation that is of consequence 
in the solvation experienced by hydrocarbons in water, but 
the offsets of the lines for solutions in hexadecane suggest that, 
unlike in water, where a ring is equivalent to only one degree of 
unsaturation, the solvation of a ring in hexadecane is equivalent to 
the solvation of two degrees of unsaturation. 

As the amount of unsaturation in the sets of hydrocarbons 
increases, the standard free energies of transfer 
at the respective intersections decrease in value. Each of 
these latter intersections corresponds to the free energy 
of transfer of the unsaturated hydrocarbon in that set 
with no hydrogen-carbon bonds, which would be the 
particular carbon-carbon double bonds alone. These 
intersections decrease monotonically in value as the 
number of carbon-carbon double bonds increases 
because van der Waals interactions are realized when 
each of these carbon-carbon double bonds is transferred 
from the gas phase to either water or hexadecane. 

Figure 5-23: Standard free energies of transfer from the gas phase 
to water (upper set of lines) and from the gas phase to hexadecane 
(lower set of lines) as a function of the number of hydrogen-carbon 
bonds in a hydrocarbon. It was found that when the values for the 
partition coefficients for transfer from hexadecane to the gas phase 
calculated with Equation 1-10 from the mobility of linear alkanes 
on gas-liquid chromatography with a stationary phase of hexadecane 
were treated as the quotients of the molarities of the hydrocarbons 
in the two phases and inserted into Equation 5-21, 
standard free energies of transfer resulted that were almost identical 
to those for the transfer of the same hydrocarbons from the gas 
phase to their own liquid, calculated with Equation 5-21 from the 
data of Hine and Mookerjee. Consequently, it was assumed that 
the units on the dimensionless partition coefficients listed by 
Abraham were molarity molarity⁻¹, and they were inserted as such 
into Equation 5-21 directly to obtain standard free energies of 
transfer from the gas phase to hexadecane for activities in units of 
corrected volume fraction. The standard free energy of transfer for 
each hydrocarbon from the gas phase to water was obtained by 
summing its standard free energy of transfer from the gas phase to 
hexadecane and the standard free energy of transfer from hexadecane 
to water (Figure 5-21, lower set of data). The partial molar volumes 
for hydrocarbons in hexadecane were calculated as in Figure 
5-21. The classes of hydrocarbons included, in descending order at 
each numerical value for hydrogen-carbon bonds, were branched 
acyclic alkanes (©), linear acyclic alkanes (©), acyclic monoenes 
(+), cyclic alkanes (x), acyclic dienes (A), primary alkynes (V), 
cyclic monoenes (©), cyclic dienes (x), alkyl arenes containing only 
one phenyl ring (0), alkenyl arenes containing only one phenyl ring 
(x), and diphenylmethane (A). The acidic hydrogens on the primary 
alkynes were not counted as hydrogen-carbon bonds for 
transfer to water. The lines drawn are fit to the respective data for 
the linear acyclic alkanes (solid line), the acyclic monoenes (long 
dashes), the primary alkynes (intermediate dashes), and the alkyl 
arenes (short dashes). These four sets contained the largest spreads 
in the number of hydrogen-carbon bonds. 

The slope of the line in Figure 5-23 correlating the 
standard free energies of transfer from the gas phase to 
water for the linear alkanes is +1.45 kJ (mol hydrogen-carbon 
bond)⁻¹, that for branched alkanes is +1.48 kJ 
(mol hydrogen-carbon bond)⁻¹, and that for arenes is 
+1.50 kJ (mol hydrogen-carbon bond)⁻¹. These values, 
which are for hydrocarbons related to the side chains of 
the amino acids, define the magnitude of the active 
exclusion of hydrogen-carbon bonds from water. 
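
The way the lines for the linear alkanes in Figures 5-21 and 5-23 fit together can be sketched numerically. The code below is only an illustration: the names are hypothetical, the intercept of +4 kJ mol⁻¹ and the slope of +1.45 kJ (mol hydrogen-carbon bond)⁻¹ are the values quoted in the text, the slope of -2.80 for transfer from water to hexadecane is the mean of four fits and is used here as an approximation for the linear alkanes, and the slope for transfer from gas to hexadecane is derived by closing the thermodynamic cycle rather than taken from the text.

```python
# Values quoted in the text for linear alkanes (kJ/mol, corrected volume fraction).
INTERCEPT_GAS_TO_LIQUID = 4.0      # common intercept of the two lines in Figure 5-23
SLOPE_GAS_TO_WATER = 1.45          # per hydrogen-carbon bond
SLOPE_WATER_TO_HEXADECANE = -2.80  # per bond; mean of four fits, an approximation here

# Derived by closing the cycle; this number is not quoted in the text.
SLOPE_GAS_TO_HEXADECANE = SLOPE_GAS_TO_WATER + SLOPE_WATER_TO_HEXADECANE


def dg_gas_to_water(n_hc_bonds: int) -> float:
    """Standard free energy of transfer from gas to water (kJ/mol)."""
    return INTERCEPT_GAS_TO_LIQUID + SLOPE_GAS_TO_WATER * n_hc_bonds


def dg_gas_to_hexadecane(n_hc_bonds: int) -> float:
    """Standard free energy of transfer from gas to hexadecane (kJ/mol)."""
    return INTERCEPT_GAS_TO_LIQUID + SLOPE_GAS_TO_HEXADECANE * n_hc_bonds


def dg_water_to_hexadecane(n_hc_bonds: int) -> float:
    """Difference of the two gas-phase legs of the cycle (kJ/mol)."""
    return dg_gas_to_hexadecane(n_hc_bonds) - dg_gas_to_water(n_hc_bonds)


if __name__ == "__main__":
    n = 14  # n-hexane has 14 hydrogen-carbon bonds
    print(f"gas -> water:        {dg_gas_to_water(n):+.1f} kJ/mol")
    print(f"gas -> hexadecane:   {dg_gas_to_hexadecane(n):+.1f} kJ/mol")
    print(f"water -> hexadecane: {dg_water_to_hexadecane(n):+.1f} kJ/mol")
```

Because the two gas-phase lines share an intercept, it cancels in the difference, and the transfer from water to hexadecane is directly proportional to the number of hydrogen-carbon bonds.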

It is the separate solvations dissected in Figure 
5-23, the one accomplished by hexadecane and the one 
accomplished by water, that together further illustrate 
the unique contribution of hydrogen-carbon bonds to 
the hydrophobic effect. When any two hydrocarbons 
are compared that have the same number of hydrogen- 
carbon bonds, the more unsaturated one will be the 
one with the larger surface area. As the degree of 
unsaturation and hence the surface area increases at a 
constant number of hydrogen-carbon bonds, the stan- 
dard free energy of solvation exerted by the hexadecane 
becomes more negative. As the degree of unsaturation 
and hence the surface area increases at a constant 
number of hydrogen-carbon bonds, the standard free 
energy of solvation exerted by the water becomes more 
negative. As the number of hydrogen-carbon bonds 
and hence the surface area increases at a constant 
degree of unsaturation, the standard free energy of sol- 
vation exerted by the hexadecane becomes more nega- 
tive. In distinct contrast to these three trends, however, 
as the number of hydrogen-carbon bonds increases at 
a constant degree of unsaturation, the standard free 
energy of solvation exerted by the water becomes more 
positive. It is only the water that responds to an 
increase in the surface area of the solute by rejecting it 
more and more strongly but only when that increase in 
the surface area is accomplished by adding hydrogen- 
carbon bonds. 

When the standard free energies of transfer from 
the gas phase to water for an even larger set of organic 
solutes than those displayed in Figure 5-23 are examined, 
small differences in the standard free energy of 
transfer (mole hydrogen-carbon bond)⁻¹ for different 
types of hydrogen-carbon bond become apparent, and 
these differences have been quantified by defining a 
value for the contribution of each type to the overall 
standard free energy. For pairs of molecules otherwise 
identical except that one contains -CH₂CH₂- and the 
other -CH(CH₃)-, the free energies of transfer from gas 
to water differ by +0.4 kJ mol⁻¹ (15%), and for pairs of 
molecules otherwise identical except that one contains 
-CH₂CH₂CH₂- and the other contains -C(CH₃)₂-, the 
standard free energies of transfer from gas to water 
differ by +0.6 kJ mol⁻¹ (15%). These differences might be 
taken as evidence that branched alkanes are more 
hydrophobic than unbranched alkanes except for the 
fact that when standard free energies of transfer from 
water to hexadecane (Figure 5-21) are examined instead 
of those from gas to water, branched alkanes (© in 
Figure 5-21) are less hydrophobic than unbranched. 
These opposite conclusions bring into focus the unfor- 
tunate fact that currently there are two ways of defining 
the hydrophobic effect: one stressing only solvation by 
water, and the other, transfer from water to hydro- 
carbon. 

There are two significant contributions to the 
hydrophobic effect (Figure 5-23): the active exclusion of 
hydrogen-carbon bonds from water and the solvation of 
those hydrogen-carbon bonds by the new surroundings 
in which they find themselves. If a hydrogen-carbon 
bond is transferred from water to the gas phase, the new 


surroundings, by definition, do not solvate it; only the 
former contribution is expressed, and the hydrophobic 
effect is only the exclusion of the hydrogen-carbon 
bonds from water. If, however, the hydrogen-carbon 
bond is transferred to a condensed phase, such as hexa- 
decane, half of the magnitude of the hydrophobic effect 
is its solvation by the new solvent. The more recent habit 
of equating the hydrophobic effect only with the transfer 
from gas to water avoids dealing with this half of 
the hydrophobic effect as it was defined traditionally. 
Both views, however, the one emphasizing solvation 
and the other transfer, persist. 

It is hard to argue that transfer from water to gas is 
relevant to biochemical events such as the folding of a 
protein or the association of a substrate or inhibitor with 
an enzyme. In such instances the hydrogen-carbon 
bonds are transferred from water into a condensed phase 
that bears no resemblance to the gas phase. But the con- 
densed phase in such situations does not resemble hexa- 
decane either; rather, it is the interior of the irregular 
solid that is the native protein itself. There is no reason to 
assume that the interior of a molecule of protein behaves 
as if it were an isotropic solvent. 

These considerations bring the argument back to 
the van der Waals forces between the hydrogen-carbon 
bond and its new surroundings. Regardless of whether or 
not water engages in van der Waals interactions with a 
hydrogen-carbon bond that are of the same magnitude 
as those of the new surroundings, the results in Figure 
5-23 suggest that water behaves as though it does not 
engage in van der Waals interactions with a hydrogen- 
carbon bond at all. Once the hydrogen-carbon bond has 
been expelled from water, the standard free energy of its 
transfer is significantly affected by the standard free 
energy of the van der Waals forces between it and its new 
surroundings. As a result, and ironically, it is these van 
der Waals forces that determine much of the magnitude 
of the hydrophobic effect in any particular circumstance, 
and it is the magnitude of the van der Waals force felt by 
a hydrophobic functional group in the interior of the 
folded molecule of protein that significantly affects the 
strength of the hydrophobic effect it is able to exert 
during the folding of the polypeptide. 

When a polypeptide is present in water in its 
unfolded state, the hydrophobic hydrogen-carbon 
bonds scattered along its length are unstable relative to 
any state in which they are in contact with each other and 
out of contact with water. This is the hydrophobic effect 
that drives the process of folding. Because ions and 
donors and acceptors of hydrogen bonds are more stable 
in the hydrated state than in any state in which they are 
isolated from water, even if they are fully joined in ion 
pairs and hydrogen bonds, they cannot provide net 
favorable standard free energy to the process of folding. 
The hydrophobic effect is the only noncovalent force that 
provides net favorable standard free energy to drive the 
folding of a polypeptide. 
Problem 5-12: The anomalously high isopiestic heat 
capacity $C_p$ of liquid water is presumably due to the fact 
that as the liquid is heated, a certain amount of its hydrogen-bonded 
structure is lost. This extra heat capacity, 
beyond that calculated from vibrations and translations 
of the molecules, is the configurational heat capacity 
($C_{\mathrm{conf}}$). For water, $C_{\mathrm{conf}}$ = 30 J K⁻¹ mol⁻¹ ($T$ = 273-373 K). 


(A) Calculate $\Delta S^{\circ}_{\mathrm{conf}}$ and $\Delta H^{\circ}_{\mathrm{conf}}$, the configurational 
standard entropy and standard enthalpy changes, that 
are associated with heating water from 293 to 
313 K. 


It is difficult to say with certainty what fraction of 
the hydrogen bonding is lost over this temperature 
range, but these numbers give a rough estimate of the 
ratio of the changes of standard enthalpy and standard 
entropy (AH/AS) to be expected when the structure of 
water increases or decreases in the liquid. Consider the 
following phase transfer reaction: 


C,H, (H,0) => C,H, (pure phase) 


(B) If all the change in standard entropy (Table 5-8) is 
due to a decrease in water structure as the more 
structured regions that surrounded the alkane 
melt, what standard enthalpy change must 
accompany this decrease in water structure? 


(C) What is the contribution in percent to the change 
in standard free energy (AG°) in the above reac- 
tion that results from the melting of structured 
water around the alkane? 


Problem 5-13: Consider the following transfer reaction: 


R(CH₂)ₙOH (H₂O) ⇌ R(CH₂)ₙOH (pure) 


This chemical equation describes the transfer of an alco- 
hol from water to a pure phase. As such it describes the 
tendency of the alcohol to remove itself from water, a 
hydrophobic effect. For any substance A under any circumstances 
there is associated an intrinsic standard free 
energy, or chemical potential ($\mu$): 

$$
\mu_{\mathrm{A}} = \mu^{\circ}_{\mathrm{A}} + RT \ln\left\{ \phi_{\mathrm{A}} \exp\left[ 1 - \left( \bar{V}_{\mathrm{A}} / \bar{V}_{\mathrm{solvent}} \right) \right] \right\}
$$

where $\mu_{\mathrm{A}}$ is the chemical potential of solute A under the 
experimental circumstances, $\mu^{\circ}_{\mathrm{A}}$ is the chemical potential 
of solute A at standard state and unit concentration, 
and $\phi_{\mathrm{A}}$ and $\bar{V}$ are defined by Equations 5-6 and 5-8. The 
free energy change for the transfer reaction is 


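$$
\Delta G_{\mathrm{alc,H_2O}\to\mathrm{alc}} = \mu_{\mathrm{alc}} - \mu_{\mathrm{alc,H_2O}} = \mu^{\circ}_{\mathrm{alc}} - \mu^{\circ}_{\mathrm{alc,H_2O}} - RT \ln\left\{ \phi_{\mathrm{alc,H_2O}} \exp\left[ 1 - \left( \bar{V}_{\mathrm{alc}} / \bar{V}_{\mathrm{H_2O}} \right) \right] \right\}
$$

where $\mu_{\mathrm{alc}}$ is the chemical potential of pure alcohol and 
$\mu_{\mathrm{alc,H_2O}}$ is the chemical potential of the alcohol dissolved 
in water at a concentration of $\phi_{\mathrm{alc,H_2O}}$. At equilibrium, 
when alcohol saturates the water phase, $\Delta G = 0$ and 

$$
\mu^{\circ}_{\mathrm{alc}} - \mu^{\circ}_{\mathrm{alc,H_2O}} = \Delta G^{\circ}_{\mathrm{alc,H_2O}\to\mathrm{alc}} = RT \ln\left\{ \phi_{\mathrm{alc,H_2O,sat}} \exp\left[ 1 - \left( \bar{V}_{\mathrm{alc}} / \bar{V}_{\mathrm{H_2O}} \right) \right] \right\}
$$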


It is easy to measure the concentration of alcohol in 
moles liter⁻¹ at saturation when H₂O and pure alcohol are 
shaken in a two-phase system (separatory funnel) and 
the aqueous phase is clarified and removed. The following 
results were obtained at 25 °C: 


alcohol        [alcohol]sat (mol L⁻¹) 
n-butanol      0.97 
n-pentanol     0.25 
n-hexanol      0.059 
n-heptanol     0.0146 
n-octanol      0.0038 


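(A) Change these numbers into 
$\phi_{\mathrm{alc,H_2O,sat}} \exp[1 - (\bar{V}_{\mathrm{alc}} / \bar{V}_{\mathrm{H_2O}})]$. 


(B) Calculate $\Delta G^{\circ}_{\mathrm{alc,H_2O}\to\mathrm{alc}}$ for each case, and plot 
$\Delta G^{\circ}_{\mathrm{alc,H_2O}\to\mathrm{alc}}$ as a function of the hydrogen-carbon 
bonds ($n$). 


(C) Determine the slope of your plot, $\Delta G^{\circ}_{\mathrm{HC,H_2O}\to\mathrm{alc}}$: 
$\Delta G^{\circ}_{\mathrm{alc,H_2O}\to\mathrm{alc}} = \Delta G^{\circ}_{\mathrm{OH}} + n\,\Delta G^{\circ}_{\mathrm{HC,H_2O}\to\mathrm{alc}}$ 


(D) The term $\Delta G^{\circ}_{\mathrm{HC,H_2O}\to\mathrm{alc}}$ is the standard free energy 
change associated with the transfer of a hydrogen-carbon 
bond from H₂O to pure aliphatic alcohol. Is 
this transfer favored or unfavored? 


(E) The term $\Delta G^{\circ}_{\mathrm{OH}}$ is the standard free energy change 
due to the fact that the molecule you are using is 
an alcohol. Is the transfer of the hydroxyl group 
favored or unfavored? 


(F) Extrapolate to determine $\Delta G^{\circ}_{\mathrm{alc,H_2O}\to\mathrm{alc}}$ for 
n-propanol. 


(G) Determine $\Delta S^{\circ}_{\mathrm{HC,H_2O}\to\mathrm{alc}}$ in the equation 
$\Delta S^{\circ}_{\mathrm{alc,H_2O}\to\mathrm{alc}} = \Delta S^{\circ}_{\mathrm{OH}} + n\,\Delta S^{\circ}_{\mathrm{HC,H_2O}\to\mathrm{alc}}$ and 
$\Delta H^{\circ}_{\mathrm{HC,H_2O}\to\mathrm{alc}}$ in the equation $\Delta H^{\circ}_{\mathrm{alc,H_2O}\to\mathrm{alc}} = 
\Delta H^{\circ}_{\mathrm{OH}} + n\,\Delta H^{\circ}_{\mathrm{HC,H_2O}\to\mathrm{alc}}$ from the values in 
Table 5-8. 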


(H) Is AH° or AS° the major contributor to the 
hydrophobic effect on a hydrogen-carbon bond 
at 25 °C? 


Problem 5-14: Calculate the standard enthalpy of 
transfer and the standard entropy of transfer of n-butane 
from water to liquid n-butane at 50, 70 and 90 °C. Recall 
that 


$$
d\Delta H^{\circ} = \Delta C^{\circ}_{p}\, dT
$$

and 

$$
d\Delta S^{\circ} = \frac{\Delta C^{\circ}_{p}}{T}\, dT
$$

and assume that all partial volumes and $\Delta C^{\circ}_{p}$ are invariant 
with temperature over the range 20-100 °C. 
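
As a reminder of where these two differential relations lead, they can be integrated directly once the change in standard heat capacity is taken to be independent of temperature, as the problem assumes. The reference temperature $T_1$ and the target temperature $T_2$ are symbols introduced here for illustration, with both temperatures in kelvins:

```latex
\Delta H^{\circ}(T_2) = \Delta H^{\circ}(T_1) + \Delta C^{\circ}_{p}\,(T_2 - T_1)
\qquad
\Delta S^{\circ}(T_2) = \Delta S^{\circ}(T_1) + \Delta C^{\circ}_{p}\,\ln\!\left(\frac{T_2}{T_1}\right)
```

The enthalpy therefore changes linearly with temperature, whereas the entropy changes with the logarithm of the ratio of the absolute temperatures.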


Problem 5-15: The partition coefficients of N-methylin- 
dole and 3-methylindole 


[Structures of N-methylindole and 3-methylindole] 


between water and cyclohexane were examined to 
investigate the effect of the hydrogen-bond donor in 
tryptophan on its partition between water and the anhy- 
drous interior of a protein. The coefficients for partition 
from water to cyclohexane in the following table are 
expressed in units of molarity and have been extrapo- 
lated to infinite dilution. 


                   partition coefficient (M M⁻¹) 
temperature (K)    3-methylindole    N-methylindole 
288                19                300 
298                19                290 
308                20                270 
318                20                260 
328                19                230 


because 
di Van 8 Lë 
PA,H,0 Vu Kon, 
l (Al, VA, H,O Van SR 
pl = = = 
(al, Veoh, Vio Ree, 


if (nol Vio)» (c/o and (Vaso! Vaca) do 
not change with temperature 


In (Ak, 
dln Kp [Alno 
ƏT! Jp 

or 


P 


(A) Calculate the standard enthalpy of transfer 
for each compound from the behavior of its 
partition coefficient as a function of the tempera- 
ture. 


(B) If a hydrogen bond were lost every time a 
3-methylindole was transferred from water to 
cyclohexane and no hydrogen bond were lost 
every time N-methylindole was transferred from 
water to cyclohexane, which standard enthalpy of 
transfer should be more negative? 


(C) When a molecule of 3-methylindole leaves water 
for cyclohexane, what is the net change in hydro- 
gen bonding for the entire system? 


(D) Why isn’t the standard enthalpy change for the 
transfer of 3-methylindole more negative than 
that for the transfer of N-methylindole? 
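
One way to extract a standard enthalpy of transfer from the temperature dependence of a partition coefficient is a linear fit of ln K against 1/T, whose slope is -ΔH°/R by the van't Hoff relation. The sketch below is an illustration only: the function name is hypothetical, it assumes that the volume ratios are independent of temperature so that partition coefficients in units of molarity can be used directly, and the data in the usage example are synthetic rather than taken from the table above.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ K^-1 mol^-1


def vant_hoff_enthalpy(temps_k, partition_coeffs):
    """Standard enthalpy of transfer (kJ/mol) from a least-squares fit of ln K vs 1/T.

    The slope of the fitted line equals -dH/R, so dH = -R * slope.
    """
    x = [1.0 / t for t in temps_k]
    y = [math.log(k) for k in partition_coeffs]
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return -R_KJ * slope


if __name__ == "__main__":
    # Synthetic data generated from an assumed enthalpy of -10 kJ/mol.
    assumed_dh = -10.0
    temps = [288.0, 298.0, 308.0, 318.0, 328.0]
    ks = [math.exp(-assumed_dh / (R_KJ * t) + 1.0) for t in temps]
    print(f"recovered enthalpy: {vant_hoff_enthalpy(temps, ks):.2f} kJ/mol")
```

A partition coefficient that falls as the temperature rises gives a negative enthalpy of transfer, and one that is nearly constant gives an enthalpy close to zero.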


Problem 5-16: The N-acetyl-α-amides of a series of 
amino acids were synthesized, and the partition of 
each of them between 1-octanol and water was 
assayed. The measured distribution coefficients for the 
reaction 


N-acetyl-α-amide of amino acid (H₂O) ⇌ 
N-acetyl-α-amide of amino acid (1-octanol) 


in units of [(moles of solute) (liter of 1-octanol)⁻¹] [(moles of 
solute) (liter of water)⁻¹]⁻¹ at 20 °C are presented in the 
following table. 


N-acetylamide of    distribution coefficient 
amino acid          [mol of solute (L of 1-octanol)⁻¹] / [mol of solute (L of water)⁻¹] 
isoleucine          0.93 
leucine             0.75 
methionine          0.25 
valine              0.24 
alanine             0.030 
threonine           0.027 
serine              0.0135 
glutamine           0.0089 
asparagine          0.0038 

(A) Estimate the partial molar volumes of the 
N-acetylamides in water by using the algorithm of 
Traube. 


Because partial molar volumes in 1-octanol are unavailable, 
assume that they are equal to those in water + 
13 cm³ mol⁻¹. In all cases, the final concentrations of the 
model compounds in each phase were less than 10 mM. 


(B) Calculate the standard free energy of transfer in 
kilojoules mole⁻¹ for each of these model compounds 
from water into 1-octanol, $\Delta G^{\circ}_{\mathrm{A,H_2O}\to\mathrm{octanol}}$. 


(C) Plot $\Delta G^{\circ}_{\mathrm{A,H_2O}\to\mathrm{octanol}}$ against the number of 
hydrogen-carbon bonds in each N-acetylamide. 


(D) Draw a line across your plot with a slope of -2.8 kJ 
(mol hydrogen-carbon bond)⁻¹ that passes 
through the four points for leucine, isoleucine, 
valine, and alanine on your plot. Why did you do 
this? 


(E) Why don’t the points for glutamine and 
asparagine lie 52 kJ mol⁻¹ above the line and the 
points for serine and threonine lie 26 kJ mol⁻¹ 
above the line? 


Hydropathy 


As ionic interactions, hydrogen bonds, and the 
hydrophobic effect are considered in turn, it becomes 
clear that the functional groups participating in each of 
these processes—ionic groups, donors and acceptors of 
hydrogen bonds, and hydrogen-carbon bonds—experi- 
ence strong favorable and unfavorable interactions with 
water. Ionic solutes display large negative standard 
enthalpies of hydration that dominate their behavior in 
water. When these ions are withdrawn from water, large 
investments of free energy must be made to strip the 
shells of hydration from them. Solutes with donors and 
acceptors form hydrogen bonds with water molecules 
that have significant negative free energies of formation 
because of the high molar concentration of the water in 
the solution. When donors and acceptors of hydrogen 
bonds are withdrawn from water, significant invest- 
ments must be made to counter these free energies. 
Hydrophobic solutes leave aqueous solution with a pref- 
erence for almost any other condensed or uncondensed 
phase because when they are dissolved in water, they 
cannot form any net favorable interactions with it. When 
they are withdrawn from water into another phase, sig- 
nificant favorable changes in free energy are realized. 
Each of these particular outcomes arises from the respec- 
tive changes in the structure of water that accompany the 
transfer of the functional groups between water and a 
nonaqueous phase. 

Viewed in this light, few solutes elicit indifferent 
responses from water. Solutes are either hydrophilic, 
demonstrating a compatibility with water, or hydrophobic, 
demonstrating an incompatibility with water. These 
strong responses together are hydropathy. Hydropathy 
is the continuous spectrum from compatibility to incom- 
patibility with water, at one end of which is hydrophilic- 
ity, and at the other, hydrophobicity. 

It was suggested by Hine and Mookerjee'” that the 
hydropathy of a solute could be judged from its standard 
free energy of transfer between water and the gas. 
Assembling values from the tables of Hine and 
Mookerjee!” and providing several previously unmea- 
sured values, Wolfenden, Andersson, Cullis, and 
Southgate’? have tabulated the standard free energies 
of transfer between water and the gas for model com- 
pounds of the side chains of the amino acids in which the 
α carbon has been replaced by hydrogen (Table 5-9). 
These values reflect the magnitudes of the standard free 
energies realized when the various functional groups 
present in the amino acids are removed from water at 
pH 7. As previously noted, the hydrocarbons among the 
side chains are expelled spontaneously from water with 
standard free energies of transfer between about -5 and 
-20 kJ mol⁻¹. 

The hydroxyl groups on ethanol and methanol 
increase their respective standard free energies of trans- 
fer from water to the gas phase by +27 kJ mol⁻¹ relative to 
alkanes of the same number of hydrogen-carbon bonds. 
In part, these unfavorable incremental standard free 
energies of transfer arise from the fact that a net of one 
hydrogen bond is lost to the system when a hydroxyl 
group is removed from water into the gas phase. 
Methanethiol, however, has a standard free energy of 
transfer only +10 kJ mol⁻¹ greater than an alkane with the 
same number of hydrogen-carbon bonds, while ethyl 
methyl sulfide has a standard free energy of transfer 
+12 kJ mol⁻¹ greater than an alkane with the same 
number of hydrogen-carbon bonds. A comparison of 
these two values with those for ethanol and methanol 
suggests that it is the sulfur that is hydrophilic, not the 
potential hydrogen-bond donor on the methanethiol. 

Propionamide and acetamide have standard free 
energies of transfer +44 kJ mol⁻¹ greater than alkanes of 
Table 5-9: Standard Free Energies of Transfer of Model Compounds for the Amino Acids between Water and the Gas Phase at 25 °C and pH 7ᵃ 

amino acid      model compound           ΔG°tr,H₂O→g (kJ mol⁻¹) 
L               isobutane                -18 
I               butane                   -18 
V               propane                  -15 
A               methane                  -10 
F               toluene                   -8 
M               ethyl methyl sulfide      -3 
C               methanethiol              +1 
W               3-methylindole           +10 
Y               4-cresol                 +14 
T               ethanol                  +15 
S               methanol                 +18 
K               n-butylamineᵇ            +28 
Q               propionamide             +32 
N               acetamide                +35 
E               propionic acidᵇ          +35 
H               4-methylimidazoleᵇ       +36 
D               acetic acidᵇ             +40 
R               methylguanidineᵇ         +73 
peptide bond    N-methylacetamide        +34 

ᵃThe values for the standard free energies of transfer from water to the gas phase for the various model compounds were obtained from several tables.!*°!%*°? They were 
usually presented as the transfer of the compound between the standard state of the real gas at infinitely low partial pressure with concentration expressed in atmospheres 
and the standard state of the infinitely dilute solution with concentration expressed in molarity. The units of concentration were changed to moles liter⁻¹ for the gas and 
corrected volume fraction (Equation 5-5) for the solution. The partial molar volumes of the solutes” were calculated” for each solute from the formulas developed by 
Traube.” The standard free energies of transfer from water to the gas phase were calculated with Equation 5-21. ᵇValues for the pKa of the various amino acids in a polypep- 
tide (Table 2-2) were used to correct the standard free energies of transfer of the neutral compounds’ for the standard free energy of neutralization required at pH 7 
(Equation 5-66). 


the same number of hydrogen-carbon bonds. An argu- 
ment can be made that these large positive standard free 
energies of transfer for the primary amides arise simply 
because each of them has two acceptors and two donors 
so that a net loss to the system of two hydrogen bonds 
occurs upon their transfer to the gas phase. 
Consequently, their standard free energies of transfer 
relative to alkanes of the same number of 
hydrogen-carbon bonds should be about twice those of 
ethanol and methanol, which they are. The standard free 
energies of transfer for acetamide or propionamide are 
about +12 kJ mol⁻¹ greater than those for neutral acetic 
acid or neutral propionic acid, respectively. The transfer 
of the carboxylic acid on either of these two acids 
involves the net loss of only one hydrogen bond to the 
solution. Although all of these explanations seem rea- 
sonable, they leave unexplained the fact that the stan- 
dard free energy of transfer for N-methylacetamide, the 
model for the peptide bond, is actually greater than that 
for propionamide, which has the same number of hydro- 
gen-carbon bonds, even though a net loss of only one 
hydrogen bond occurs when the former is removed from 
water. It was suggested after the fact that in the case of 
the amides the acceptors may be more important than 
the donors,” and subsequently spectroscopic evidence 
consistent with this suggestion was reported, demon- 
strating that each of the two acceptors on the acyl oxygen 
of N-methylacetamide interacts with water with about 
twice the standard enthalpy as that for the interaction of 
the single donor.!!%?0%206 

The transfers of the heterocyclic side chains are 
dominated by their donors and acceptors of hydrogen 
bonds. 3-Methylindole and 4-cresol have standard free 
energies of transfer +18 and +21 kJ mol⁻¹, respectively, 
greater than an arene with the same number of hydro- 
gen-carbon bonds. These increments arise from the 


presence of the pyrrole donor and the hydroxyl, respec- 
tively, that form hydrogen bonds with the donors and 
acceptors of water that must be broken during transfer. 
The standard free energy of transfer for neutral 
4-methylimidazole is +39 kJ mol⁻¹ greater than that of an 
arene with the same number of hydrogen-carbon bonds, 
in part because its acceptor (pKa = 7.5) is much stronger 
than that of either 4-cresol (pKa = 10.2) or 3-methylindole 
(pKa = -2). 

In the case of the amino acids that are charged at 
pH 7, such as glutamic acid, aspartic acid, histidine, 
lysine, and arginine, the tabulated partition coefficients 
for transfer of the model compounds between water and 
the gas are for acidic solutions or basic solutions in which 
those model compounds are dissolved entirely as the 
neutral conjugate acids or neutral conjugate bases, 
respectively. At pH 7 only a fraction of the actual amino 
acid will be present as the neutral species, and this will 
decrease the value of the partition coefficient for transfer 
from water to the gas. This decrease in the partition coef- 
ficient can be incorporated into the standard free energy 
of transfer with the formula?” 


\[ \Delta G^{\circ}_{A_{TOT},\,H_2O \rightarrow g} = \Delta G^{\circ}_{A_U,\,H_2O \rightarrow g} - RT \ln \alpha_U \qquad (5\text{-}66) \] 


where $A_U$ refers to the un-ionized form of the model com- 
pound, $A_{TOT}$ is the sum of the neutral form and the 
charged form, and $\alpha_U$ is the fraction that is un-ionized at 
pH 7. The values for $\alpha_U$ were calculated with the values of 
pKa for the amino acids in a polypeptide (Table 2-2) rather 
than with the values of pKa for the model compounds 
themselves. The fraction of un-ionized species, $\alpha_U$, varies 
from 0.8 for histidine to 10° for arginine. Even the stan- 
dard free energy necessary to neutralize N-methylguani- 
dinium and transfer the neutral compound into the gas 


(+73 kJ mol⁻¹) is still less than the standard free energy that 
would be required to transfer it into the gas as a cation 
(Figure 5-8). Therefore, all of the tabulated values should 
refer to the most likely reactions, namely, neutralization 
followed by transfer. For example, the only reasonable 
reaction for the transfer of n-butylammonium ion, dis- 
solved in H₂O at pH 7, to the gas would be 


CH₃CH₂CH₂CH₂NH₃⁺ (aq) ⇌ H⁺ (aq) + CH₃CH₂CH₂CH₂NH₂ (aq) ⇌ CH₃CH₂CH₂CH₂NH₂ (g) 
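
The correction in Equation 5-66 can be illustrated numerically. The following is a minimal sketch, not the procedure used to build Table 5-9: it assumes that the un-ionized fraction follows the usual acid-base equilibrium expression, and the pKa of 6.4 used for the imidazole of histidine is an assumed illustrative value, not one taken from Table 2-2. The function names are hypothetical.

```python
import math

R_KJ = 8.314e-3   # gas constant in kJ mol^-1 K^-1
T_K = 298.15      # 25 °C in kelvin

def fraction_unionized(pka, ph=7.0, neutral_form="base"):
    """Fraction of a solute present as its neutral species at the given pH.

    neutral_form="base": the neutral species is the conjugate base
                         (imidazole, amine, guanidine).
    neutral_form="acid": the neutral species is the conjugate acid
                         (carboxylic acid).
    """
    if neutral_form == "base":
        return 1.0 / (1.0 + 10.0 ** (pka - ph))
    if neutral_form == "acid":
        return 1.0 / (1.0 + 10.0 ** (ph - pka))
    raise ValueError("neutral_form must be 'base' or 'acid'")

def neutralization_term(alpha_u, temperature=T_K):
    """The term -RT ln(alpha_U) of Equation 5-66, in kJ mol^-1."""
    return -R_KJ * temperature * math.log(alpha_u)

# Assumed pKa of 6.4 for a histidine side chain (illustrative only).
alpha_his = fraction_unionized(6.4, neutral_form="base")
print(f"alpha_U = {alpha_his:.2f}, "
      f"correction = {neutralization_term(alpha_his):+.2f} kJ/mol")
```

With this assumed pKa the un-ionized fraction comes out close to the 0.8 quoted for histidine, so the correction is small; for a side chain with a pKa many units from 7 the same term grows by about 5.7 kJ mol⁻¹ for each additional unit.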


One of the most striking, and perhaps informative, 
facts about the charged amino acids in a polypeptide is 
that all of them can be neutralized in simple acid-base 
reactions. This is a common property of almost all 
organic cations and anions found in biological systems, 
with the tetraalkylammonium cation among the notable 
exceptions. Nevertheless, it could be argued that this 
latter functional group, which does appear in biological 
settings, has been purposely excluded from among the 
amino acids by natural selection because it cannot be 
neutralized. When charged amino acid side chains are 
transferred to a region with a low relative permittivity, 
such as the gas phase, they will enter as the uncharged 
conjugate acids or conjugate bases. The standard free 
energy required to neutralize the charges of propionic 
acid, acetic acid, and n-butylamine and then remove 
their donors and acceptors for hydrogen bonds from 
water is greater by about +45 to +48 kJ mol⁻¹ than the 
standard free energy of transfer for an alkane with the 
same number of hydrogen-carbon bonds. 

The point emphasized by these experimental 
results is that water has a high affinity for many of the 
side chains of the amino acids, and a good deal of stan- 
dard free energy must be spent whenever they are 
removed from it. This is the complement of the fact that 
the only favorable standard free energy gained from 
removing side chains of the amino acids from an aque- 
ous solution arises from the removal of their hydro- 
gen-carbon bonds from contact with water. At the 
beginning, every amino acid within a newly synthesized, 
unfolded polypeptide is completely solvated by water. 
During the folding of the polypeptide, the formation of 
an interface between subunits, or the binding of a ligand, 
there is a net transfer of side chains of individual amino 
acids, segments of polypeptide backbone, and small 
solutes from water to the interior of a native protein. In 
analogy with Equation 5-64, the first half of the reaction 
can be represented by the standard free energies of 
transfer between water and the gas (Table 5-9). Those 
solutes removed from water, however, are transferred 
into a new environment, the interior of the protein. The 
standard free energies of transfer into this new environ- 
ment are the second half of the reaction, but the standard 
free energies of transfer into the interior of a protein from 
the gas cannot be predicted with any certainty because 
the interior of a protein is a heterogeneous solid. 

Presumably, noncovalent forces with negative stan- 
dard free energies of formation arise as the amino acids 
and the polypeptide backbone are packed into the inte- 
rior. The standard free energy of transfer for the removal 
of hydrogen-carbon bonds from the aqueous phase is a 
negative change in standard free energy, but a small one 
compared to the favorable hydration of the polar side 
chains (Table 5-9). These hydrogen-carbon bonds, how- 
ever, are being transferred into a new environment that 
should not have the aversion displayed by liquid water 
but should respond with favorable van der Waals forces, 
perhaps resembling in their magnitude the favorable 
negative standard free energies of solvation for nonpolar 
solutes displayed by all solvents other than water (Figure 
5-22). The standard free energies of transfer for the 
removal of the hydrogen-bond donors and acceptors 
from the aqueous phase, especially those of the polypep- 
tide backbone itself, are large positive standard free ener- 
gies of transfer (Table 5-9). In the interior of the protein, 
however, these donors and acceptors participate in 
hydrogen bonds the standard free energies of formation 
of which are even more negative than they would be in 
solution because standard entropies of approximation 
are often not required for the associations (Figure 5-19). 
Standard entropies of approximation are at a minimum 
whenever a set of hydrogen bonds forms cooperatively as 
in an α helix or β structure. 

The interior of a properly folded molecule of pro- 
tein is tightly packed in a defined conformation. It thus 
resembles closely the interior of a solid rather than a 
liquid, and the van der Waals forces arising when amino 
acids or segments of polypeptide backbone are trans- 
ferred into it are those that would arise in a solid rather 
than in a liquid. This is a significant difference because in 
solids dipoles and polarizable regions remain fixed in 
their relative orientations rather than being averaged 
over all orientations as they are in a liquid. Even more 
troublesome, when the question of reproducing the 
behavior of this solid is considered, is the fact that the 
interior of a protein is not a systematic solid, such as a 
crystalline or microcrystalline mineral or an amorphous 
glass. Although it is a highly integrated system, shaped by 
natural selection for the performance of a definite func- 
tion, little uniformity can be found in the interior of a 
particular molecule of protein. Because it is the product 
of evolution by natural selection, the interior has all of 
the haphazard character of an acre of woodland. It is 
unlikely that any solvent could reproduce its properties. 

Nevertheless, it has been proposed that the stan- 
dard free energy of transfer of an amino acid, a segment 
of polypeptide, or a small solute from water to the inte- 
rior of a protein should be similar to the standard free 
energy of transfer for a model compound of that amino 
acid or segment of polypeptide between water and a sol- 
vent the properties of which resemble those of the inte- 
rior of a protein. Scales of hydropathy for the side chains 
of the amino acids based on this proposal have been pre- 
sented. They differ in the personal preferences of their 
proponents for the type of model compounds and the 
particular solvent chosen as the basis of the scale. 

The first of these was the scale of hydrophobicity 
proposed by Nozaki and Tanford,”” which, as its name 
implies, was confined to only one end of the spectrum. It 
relied on the solubilities of the zwitterionic amino acids 
in ethanol that had been previously tabulated by Cohn 
and Edsall.” By subtracting the standard free energy of 
transfer for glycine between water and ethanol from the 
standard free energies of transfer for hydrophobic amino 
acids between water and ethanol, they estimated the 
standard free energies of transfer for the side chains 
alone between water and ethanol. The implication in for- 
mulating a scale of this type is that the interior of a pro- 
tein resembled ethanol in its interaction with 
hydrophobic amino acids, and this may not be far from 
the truth because all nonaqueous solvents display simi- 
lar standard free energies of solvation for hydrophobic 
solutes (Figure 5-22). 

Since this first scale was proposed, at least 35 others 
have appeared,'® and they have usually been expanded 
spectra including all 20 of the amino acids, hydrophilic as 
well as hydrophobic. Most have been based on standard 
free energies of transfer. The original description of the 
hydrophobic effect was based on observations of abnor- 
mal decreases in the surface tension of water that result 
when hydrophobic solutes display a preference for the 
surface of an aqueous solution rather than its interior,” 
and a scale of hydropathy based on the change in the 
surface tension of an aqueous solution with the change 
in concentration of the different amino acids has been 
presented.”” The scale of hydrophobicity based on the 
solubilities of the amino acids in ethanol has been 
expanded to include uncharged, hydrophilic amino 
acids.”® The standard free energies of transfer of the 
model compounds for the amino acids between water 
and the gas (Table 5-9) have also been used to create 
scales of hydropathies.'’**'**” The standard free ener- 
gies of transfer of various solutes between water and 
1-octanol have been proposed as the parameters for a 
general scale for the hydrophobic effect.*!! The standard 
free energies of transfer of the N-acetyl-α-amides of each 
of the amino acids (Figure 2-1) between water and 
1-octanol have been determined, and they have been 
used to construct a scale of hydropathies.*” It has been 
proposed that N-cyclohexyl-2-pyrrolidone would be a 
better solvent to use as reference for standard free ener- 
gies of transfer into the interior of a protein, and standard 
free energies of transfer for the amino acids between 


water and N-cyclohexyl-2-pyrrolidone have been meas- 
ured and used to construct a scale of hydropathies.””” 

In competition with these scales of hydropathy based 
on standard free energies of transfer are scales derived 
from the locations of the various amino acid side chains 
in crystallographic molecular models of native proteins. 
The logic in this case is that the purpose of all of these 
scales is to estimate contributions due to changes in sol- 
vation during the folding of a polypeptide and the degree 
with which particular amino acids are buried in the inte- 
rior or exposed to the solvent should directly indicate how 
hydrophobic or hydrophilic, respectively, they are. In these 
computations, the surface area of each amino acid that is 
accessible to water” in a set of crystallographic molec- 
ular models is individually determined. These individual 
accessibilities to water are then grouped by amino acid, 
and average accessibilities for each amino acid are calcu- 
lated. The uncertainty in these calculations is in the cal- 
culation of these averages, and the three scales of 
hydropathy based on the accessible surface area in molec- 
ular models of folded polypeptides”'°*!” are not equiva- 
lent, even though they are based on similar raw data. 
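
The grouping and averaging step described above can be sketched as follows. This is only an illustration: the accessible surface areas in the list are hypothetical numbers, not measurements from any crystallographic molecular model, and a simple arithmetic mean is assumed, whereas the published scales differ precisely in how they compute these averages.

```python
from collections import defaultdict

def mean_accessibility(observations):
    """Average accessible surface area (nm^2) for each type of amino acid.

    observations: iterable of (one-letter code, accessible area in nm^2),
    one entry for every amino acid in every molecular model examined.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for amino_acid, area in observations:
        totals[amino_acid] += area
        counts[amino_acid] += 1
    return {aa: totals[aa] / counts[aa] for aa in totals}

# Hypothetical accessibilities, for illustration only.
observations = [
    ("L", 0.10), ("L", 0.30),
    ("K", 1.10), ("K", 0.90),
    ("S", 0.50),
]

for amino_acid, mean_area in sorted(mean_accessibility(observations).items()):
    print(f"{amino_acid}: {mean_area:.2f} nm^2")
```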

Finally, there are the scales of hydropathies for the 
amino acids that are based on mixtures of the pure scales 
discussed so far. In one case, a scale based on the acces- 
sible surface area of amino acids in crystallographic 
models was modified by a theoretical calculation of the 
standard free energy required to break hydrogen bonds 
and neutralize charge. In another,” the standard free ener- 
gies of transfer between water and the gas and a tabulation 
of accessible surface areas were combined with personal 
preference to produce a scale of hydropathies. In a third 
case,” a consensus scale of hydropathies was inferred 
from two scales based on standard free energy of transfer 
and three scales based on accessible surface area in folded 
polypeptides. In a fourth case,” a correlation between 
accessible surface area and the hydrophobic effect, the 
standard free energy required to neutralize charged amino 
acids (Equation 5-66), and semi-empirical estimates of the 
standard free energy for withdrawing each individual 
hydrogen-bond donor and acceptor from water were com- 
bined to obtain a scale of estimated standard free energies 
of transfer for each of the amino acids, when located in an 
α helix, from water to a phase of hydrocarbon. 

At low resolution all of these scales are similar to 
each other. The amino acids the side chains of which are 
alkanes, namely, leucine, isoleucine, and valine, are the 
most hydrophobic amino acids; the charged amino acids 
the pKa of which is farthest from pH 7, arginine, lysine, 
glutamate, and aspartate, are the most hydrophilic; and 
neutral but polar amino acids such as serine and threo- 
nine reside in the middle; but the details of the ranking 
and the relative magnitudes of the parameters are dra- 
matically different. At the moment, each of these 
attempts to estimate the standard free energy of transfer 
for each of the amino acids between water and the inte- 
rior of a protein has its particular proponents, some 


more forceful than others, and there is no unambiguous 
way to choose among them or assess whether any of 
them is realistic or unrealistic. 

The usual criterion for the reliability of each scale is 
to demonstrate either that it correlates with the distribu- 
tion of amino acids between the surface and the interior 
of a protein, if it is based on standard free energy of trans- 
fer (Figure 5-24),'%°°*?!® or that it correlates with stan- 
dard free energies of transfer, if it is based on the 
distribution of amino acids between the surface and the 
interior.”'° None of these correlations suggest that any 
one of the scales is more realistic than any of the others. 
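
The conversion of a distribution between surface and interior into an apparent standard free energy of transfer, as used for the vertical axis of Figure 5-24, can be sketched as follows. The sketch assumes that the partition ratio (number buried divided by number accessible) is treated as an equilibrium constant, so that the apparent standard free energy is -RT ln(ratio); the counts in the example are hypothetical.

```python
import math

R_KJ = 8.314e-3   # gas constant in kJ mol^-1 K^-1

def apparent_transfer_free_energy(n_buried, n_accessible, temperature=298.15):
    """Apparent standard free energy of transfer, water to interior (kJ mol^-1).

    Assumes the partition ratio n_buried / n_accessible behaves as an
    equilibrium constant for the transfer.
    """
    partition_ratio = n_buried / n_accessible
    return -R_KJ * temperature * math.log(partition_ratio)

# Hypothetical counts for two types of amino acid.
print(f"mostly buried:     {apparent_transfer_free_energy(300, 100):+.1f} kJ/mol")
print(f"mostly accessible: {apparent_transfer_free_energy(20, 380):+.1f} kJ/mol")
```

Because the logarithm compresses the ratio, even an amino acid that is almost never buried acquires an apparent standard free energy of only a few kilojoules per mole on such a scale.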
Problem 5-17: Consider a saturated solution of the 
solute A in solvent j. In this case, the solution of A is in 
equilibrium with solid A and 

\[ \mu_{A,\mathrm{sat},j} = \mu_{A,\mathrm{solid}} \] 

\[ \mu^{\circ}_{A,j} + RT \ln \{ \phi_{A,\mathrm{sat},j} \exp[1 - (\bar{V}_A / \bar{V}_j)] \} = \mu_{A,\mathrm{solid}} \] 


[Plot: ΔG°tr,H₂O→interior (kJ mol⁻¹) on the vertical axis; ΔG°tr,H₂O→octanol (kJ mol⁻¹), from 15 to -5, on the horizontal axis] 

Figure 5-24: Correlation between the standard free energies of 
transfer of N-acetyl-α-amides of the amino acids between octanol 
and water with the degree to which the amino acids are buried in 
the interior of a molecule of protein.?” The partition coefficients 
for the distribution of the N-acetyl-α-amides of the 20 amino acids 
between water at pH 7 and octanol at room temperature were 
measured (concentrations in molarity) and standard free energies 
of transfer, ΔG°tr,H₂O→octanol, were calculated. Each of the 5220 amino 
acids in the crystallographic molecular models of 22 proteins was 
identified as either buried (less than 0.2 nm² of accessible surface 
area) or accessible to water (greater than 0.2 nm² of accessible sur- 
face area).”'° For each type of amino acid a partition ratio [number 
buried (number accessible)⁻¹] was calculated, and from this parti- 
tion ratio, a standard free energy of transfer, ΔG°tr,H₂O→interior, was 
calculated. Adapted with permission from ref 202. Copyright 1983 
Elsevier. 
If one wishes to compare two different solvents and their 
effects on A 

\[ \mu^{\circ}_{A,1} + RT \ln \{ \phi_{A,\mathrm{sat},1} \exp[1 - (\bar{V}_A / \bar{V}_1)] \} = \mu_{A,\mathrm{solid}} \] 

\[ \mu^{\circ}_{A,2} + RT \ln \{ \phi_{A,\mathrm{sat},2} \exp[1 - (\bar{V}_A / \bar{V}_2)] \} = \mu_{A,\mathrm{solid}} \] 

\[ \Delta G^{\circ}_{A,1 \rightarrow 2} = RT \ln \left\{ \frac{[A]_{\mathrm{sat},1}}{[A]_{\mathrm{sat},2}} \exp\left[ \frac{\bar{V}_A}{\bar{V}_2} - \frac{\bar{V}_A}{\bar{V}_1} \right] \right\} \] 

where $\mu^{\circ}_{A,1}$ is the chemical potential of A in solvent 1 at a 
concentration of 1 corrected volume fraction, $\phi_{A,\mathrm{sat},2}$ is the 
concentration of A in a saturated solution in solvent 2 in 
units of volume fraction, and $\Delta G^{\circ}_{A,1 \rightarrow 2}$ is the standard free 
energy change when A is transferred from solvent 1 to 
solvent 2. The quantity $\Delta G^{\circ}_{A,1 \rightarrow 2}$ is a measure of the 
change in standard free energy for the following type of 
reaction: 

amino acid (H₂O) ⇌ amino acid (EtOH) 

where ethanol serves as a model for the interior of a pro- 
tein. 
The following data 27" are for 25 °C 


                concn at saturation 
                [g (100 g of solvent)⁻¹]     partial molar volume 
amino acid      H₂O         EtOH            (mL mol⁻¹) 
glycine         25.16       0.00382         43.5 
leucine         2.17        0.0196          108 


(A) Calculate $\phi_{A,\mathrm{sat},j} \exp[1 - (\bar{V}_A / \bar{V}_j)]$ for these four sit- 
uations. Assume $\bar{V}_{A,H_2O} = \bar{V}_{A,EtOH}$. 


(B) Calculate $\Delta G^{\circ}_{A,H_2O \rightarrow EtOH}$ for glycine and leucine. 


(C) Use the value you have for glycine to subtract 
away the contribution of ⁻OOCCH₂NH₃⁺ to the 
solubility of leucine. The remainder is an estimate 
of the standard free energy of transfer of the 
leucine side chain from H₂O to ethanol. 


(D) Draw the structure of the glutamine side chain 
and divide it into hydrophobic or hydrogen- 
bonding regions. Label each region on your draw- 
ing and indicate all hydrogen-bond donors and 
acceptors with D or A, respectively. 


(E) Estimate the ΔG°glutamine,H₂O→ethanol contributed 
only by the hydrogen-carbon bonds of the side 
chain. 


(F) The solubility of glutamine (concentration at sat- 
uration) in water is 4.6 g (100 g of H₂O)⁻¹, the sol- 
ubility of glutamine in ethanol is 4.59 x 10“ g 
(100 g of ethanol)⁻¹, and the partial molar volume 
of glutamine in water is 96.5 mL mol⁻¹. Calculate 
ΔG°transfer,H₂O→ethanol of the glutamine side chain. 


(G) Estimate ΔG°transfer,H₂O→ethanol for the -CONH₂ func- 
tional group of glutamine. 


Problem 5-18: Consider the following table: 


A   0.25     G   0.16     P  -0.07 
R  -1.76     H  -0.40     S  -0.26 
N  -0.64     I   0.73     T  -0.18 
D  -0.72     L   0.53     W   0.37 
C   0.04     K  -1.10     Y   0.02 
E  -0.62     M   0.26     V   0.54 
Q  -0.69     F   0.61 

(A) What are the letters and what is the intention of 
assigning these numbers to these letters? 

(B) What common property is shared by the letters 
with the positive numbers? 

(C) What common property is shared by the letters 
with the negative numbers? 

(D) Divide the letters with the positive numbers into 
two groups based on differences in chemical 
properties. Why are the numbers in one of these 
groups less positive than the numbers in the 
other? 

(E) On what types of measurements could the num- 
bers assigned to the letters be based? 

(F) Draw the interactions with water that are one of 
the reasons that R has a value of -1.76. There are 
two reasons that R has such a low value: the inter- 
actions you have just drawn and another of its 
properties. What are these two reasons? 
Chapter 6 


Atomic Details 


It is within the crystallographic molecular model of a 
protein that the consequences of the noncovalent forces 
just described can be viewed. As the polypeptide folded 
to form the actual native structure of the protein that is 
represented by the model, a significant fraction of its 
backbone had to be withdrawn from water. Secondary 
structure formed because it offered an efficient way to 
maintain the total number of hydrogen bonds in the 
solution. As the polypeptide folded, the donors and 
acceptors of hydrogen bonds in the side chains of the 
amino acids, as well as any ionized side chains, 
remained, by and large, on the surface of the structure 
to maintain their hydration. The networks of hydrogen- 
bonded water molecules hydrating the surface of the 
folded structure are prominent features of the crystallo- 
graphic molecular model. The core of the model is 
formed mostly from hydrophobic amino acids, the 
exclusion of which from the water drove the folding 
process. 

As remarkable as these energetic consequences 
are in a crystallographic molecular model, it has 
become clear upon close inspection that they do not, 
except indirectly, produce the final structure. The most 
important determinant of the final native structure is 
the steric effect. In retrospect, this should not be so 
surprising. Steric effects are always the most over- 
whelming among the different forces influencing the 
outcome of a chemical reaction. They are rarely dis- 
cussed at great length because they are so easy to 
understand. No two fragments of matter may occupy 
the same place at the same time. The folding of a 
polypeptide, however, is a steric nightmare. Not only 
must all of the functional groups fit together in a con- 
fined space without overlapping, but all of the func- 
tional groups are connected together by the 
polypeptide. Although the outcome of any one of these 
games of packing atoms cannot be predicted, the ulti- 
mate solutions to the vast array of steric problems 
encountered during each game can be appreciated by 
examining the final native structure. Both the steric 
effects operating along the polypeptide backbone and 
those engendered between the side chains as the ele- 
ments of secondary structure attempt to fit together to 
produce the final native structure of the protein are 
represented by the crystallographic molecular model. 
These steric effects and the noncovalent forces are the 
players in each of the games. 


Secondary Structure of the Polypeptide 
Backbone 


Because of its molecular orbital system (Figure 2-3), 
the amide of the peptide bond is planar. The dihedral 
angle assigned to this amide is defined by looking 
down the carbon-nitrogen bond from carbon to 
nitrogen: 


[Structures 6-1 and 6-2: the trans peptide bond and the dihedral angle ω about its carbon-nitrogen bond] 


The sign of the angle is determined by the right-hand 
rule. In trans peptide bonds, the angle ω is 180°, and the 
two strands of polypeptide leaving the amide depart in 
opposite directions from the carbon-nitrogen bond. In 
cis peptide bonds (6-3), the angle ω is 0°, and the two 
α carbons of the two departing strands are eclipsed. In a 
protein, at any particular location in the amino acid 
sequence, the peptide bond is either trans in every mol- 
ecule of that particular protein or cis in every molecule of 
that particular protein. The reason for this is that the 
stereochemical difference between these two conform- 
ers is substantial, and only one will be compatible with 
the structure at that location in the folded polypeptide. 
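
For readers who wish to compute the angle ω from atomic coordinates, the following is a minimal sketch of a standard dihedral-angle calculation. The coordinates in the example are hypothetical, idealized planar positions chosen only so that the trans and cis cases give 180° and 0°; they are not taken from any molecular model. The sign convention assumed is the usual one in which the angle is positive when, looking from the second atom to the third, the far bond is rotated clockwise from the near bond.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral_degrees(p0, p1, p2, p3):
    """Dihedral angle about the bond p1-p2, in degrees (-180 to +180).

    For omega: p0 = alpha carbon, p1 = acyl carbon,
               p2 = amido nitrogen, p3 = next alpha carbon.
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / length for x in b1)
    # Project the two outer bonds onto the plane perpendicular to b1.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))

# Hypothetical planar coordinates (arbitrary units), for illustration only.
c_alpha_1 = (-0.5, 1.0, 0.0)
acyl_c = (0.0, 0.0, 0.0)
amido_n = (1.33, 0.0, 0.0)
trans_c_alpha_2 = (1.83, -1.0, 0.0)
cis_c_alpha_2 = (1.83, 1.0, 0.0)

print(abs(dihedral_degrees(c_alpha_1, acyl_c, amido_n, trans_c_alpha_2)))
print(abs(dihedral_degrees(c_alpha_1, acyl_c, amido_n, cis_c_alpha_2)))
```

The same function applies to any four consecutive atoms along the backbone, so it can also be used for the other dihedral angles of the polypeptide.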

In the available crystallographic molecular models, 
only 0.3% of the peptide bonds' are cis peptide bonds 
(Figure 6-1),” and most of the locations where a cis pep- 
tide bond is found (87%) have proline as the amino acid 
on the carboxy-terminal side! 


Figure 6-1: cis Peptide bonds in the crystallographic molecular 
model (Bragg spacing = 0.20 nm) of lectin IV of Griffonia 
simplicifolia. By chance, this protein happens to have three cis 
peptide bonds near each other. Only one of these contains a 
proline. In the other two, the position normally occupied by the 
proline in the cis bond is occupied by other amino acids. In this 
figure and those that follow, the amino acids are numbered 
according to their positions in the amino acid sequences in the 
data base published by the Swiss Institute of Bioinformatics 
(us.expasy.org). These numbers often differ from those of the 
preliminary amino acid sequences used by the crystallographers 
to construct the crystallographic molecular models. 


(6-1) 


In such peptide bonds, the proline provides the amido 
nitrogen; consequently, the amide formed is tertiary. 
In this tertiary amide, there is little preference for the 
trans stereochemistry because both positions on the 
amido nitrogen are sterically similar, and the equilibrium 
constants between the cis and trans conformations for 
peptide bonds involving proline are between 0.1 and 1.° 
In proteins, about 6% of the peptide bonds in which pro- 
line provides the amido nitrogen are cis peptide bonds.’ 
Unlike proline, the other amino acids form secondary 
amides. Because the hydrogen of a secondary amide is 
much smaller than the alkyl substituent, the equilibrium 
constant heavily favors the trans form. cis Peptide bonds 
in which an amino acid other than proline provides the 
amido nitrogen are rare (0.04% of all peptide bonds). 
These are presumably locations where a cis peptide bond 
is unavoidable, but evolution by natural selection has not 
yet replaced the carboxy-terminal amino acid of that 
peptide bond with a proline. Other than proline at its car- 
boxy-terminal side, there seems to be little preference for 
the other amino acids at either location in a cis peptide 
bond." In the rare instances in which there is a cystine 
between two adjacent cysteines, the peptide bond 
between them is necessarily cis. 
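
The relation between an equilibrium constant for cis-trans isomerization and the fraction of peptide bonds that are cis can be sketched as follows. The sketch assumes that the equilibrium constant is defined as [cis]/[trans], which the text does not state explicitly, and the function names are hypothetical.

```python
def cis_fraction(k_cis_over_trans):
    """Fraction of bonds that are cis, assuming K = [cis]/[trans]."""
    return k_cis_over_trans / (1.0 + k_cis_over_trans)

def equilibrium_constant(fraction_cis):
    """Inverse relation: K = [cis]/[trans] from the cis fraction."""
    return fraction_cis / (1.0 - fraction_cis)

for k in (0.1, 1.0):
    print(f"K = {k}: {100 * cis_fraction(k):.0f}% cis")

# Apparent constant corresponding to the 6% observed in proteins.
print(f"6% cis corresponds to K = {equilibrium_constant(0.06):.3f}")
```

Under this assumption the range of equilibrium constants quoted for proline corresponds to roughly 9% to 50% cis in small peptides, so the 6% found in folded proteins lies slightly below that range.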

Aside from the occasional situations in which cis 
peptide bonds occur, the peptide bonds in crystallo- 
graphic molecular models are trans. In a set of eight crys- 
tallographic molecular models,‘ all built from data sets to 
Bragg spacings of less than 0.12 nm, the values of the 
angle ω equal 179° ± 6°. Observations from a much 
larger set of crystallographic molecular models give the 
same result.’ Deviations from 180° as large as 15° have 
been observed.”® Were the amide, however, to be dis- 
torted too far from planarity by the structure of the pro- 
tein, it would become susceptible to rapid hydrolysis.’ 

It was noted by Ramachandran, Ramakrishnan, 
and Sasisekharan,'° before crystallographic molecular 
models of atomic resolution became available, that there 
are severe steric effects hindering rotation about the 
single bonds of a polypeptide. The dihedral angle of the 
bond between an α carbon and the adjacent amido 
nitrogen is designated as φ and that of the bond between 
an α carbon and the adjacent acyl carbon, as ψ (Figures 
6-2 and 6-3).!° Each amino acid in the protein is assigned 
a dihedral angle φ and a dihedral angle ψ associated with 
its α carbon. Although it has been noted that it would be 
more reasonable to assign values of the dihedral angles φ 
and ψ to each peptide bond rather than to each amino 
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Figure 6-2: Designation of the two dihedral angles a and y to the 
amino-terminal and carboxy-terminal sides, respectively, of the 
a carbon (Ca) of an amino acid in a polypeptide.” The first carbon 
of a side chain (Cf) corresponds to that of amino acids in the L-con- 
figuration. Adapted with permission from ref 10. Copyright 1963 
Academic Press. 
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acid,” this suggestion has yet to be adopted. The signs of 
the dihedral angles ¢ and y are determined by the right- 
hand rule (Figure 6-3). To follow what is about to be 
described, you should build a model of the structure 
shown in Figure 6-2. 
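
For readers who would rather check a dihedral angle numerically than on a physical model, the following is a minimal sketch in Python. It assumes that the right-hand rule of Figure 6-3 corresponds to the usual convention in which the angle is positive when, looking down the central bond, the near bond must be turned clockwise to eclipse the far bond. The function names and the tuple representation of coordinates are illustrative choices, not anything prescribed by the text.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the bond p1-p2.

    Assumed convention: positive when, looking from p1 toward p2,
    the bond p1-p0 must be turned clockwise to eclipse the bond p2-p3.
    """
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals to the two planes
    y = _dot(b2, _cross(n1, n2))
    x = math.sqrt(_dot(b2, b2)) * _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def phi(c1, n1, ca, c2):
    """Angle between bond N1-C1 and bond Ca-C2, viewed along N1-Ca."""
    return dihedral(c1, n1, ca, c2)

def psi(n1, ca, c2, n2):
    """Angle between bond Ca-N1 and bond C2-N2, viewed along Ca-C2."""
    return dihedral(n1, ca, c2, n2)

# Toy coordinates (not a real peptide): the central bond lies along z.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

The result lies between -180° and +180°, so an angle quoted in the text as +260° is returned as -100°.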

When the view from an α carbon down the bond to
an amido nitrogen is observed, it can be seen that in every
trans peptide bond, the acyl oxygen O1 of the previous
amino acid in the polypeptide leans forward (Figure
6-3B,C). When the dihedral angle φ is greater than +310°
(-50°), the amido N2-C2-O2 collides with acyl oxygen O1,
and when the dihedral angle φ is less than +180°, the
H-Cβ-Cγ of the side chain collides with acyl oxygen O1.
Therefore, values of dihedral angle φ greater than +310°
(-50°) and less than +180°, with the exception of a small
gap at angle φ of +60° (Figure 6-3C), are forbidden.
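
The text quotes each angle both on a scale of 0° to 360° and, in parentheses, on a scale of -180° to +180°. The short sketch below converts between the two and applies the hard-sphere limits on the first dihedral angle stated in the paragraph above. The function names are illustrative, and the small permitted gap near +60° is deliberately left out because its width is not stated.

```python
def to_signed(angle_deg):
    """Map an angle in degrees onto the interval (-180, +180]."""
    a = angle_deg % 360.0
    return a - 360.0 if a > 180.0 else a

def to_unsigned(angle_deg):
    """Map an angle in degrees onto the interval [0, 360)."""
    return angle_deg % 360.0

def phi_in_main_allowed_range(phi_deg):
    """True if phi lies between +180 and +310 degrees (-180 to -50).

    This covers only the main range for an amino acid with a beta
    carbon; the gap near +60 degrees is omitted (width not given).
    """
    return 180.0 <= to_unsigned(phi_deg) <= 310.0

print(to_signed(310.0))                    # the text's +310 degrees
print(phi_in_main_allowed_range(-100.0))   # conformation of Figure 6-3B
print(phi_in_main_allowed_range(90.0))     # conformation of Figure 6-3C
```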

When the view from an α carbon down the bond to
an acyl carbon is observed, it is clear that in all trans
peptide bonds the hydrogen H2 on the amido nitrogen of the
next amino acid leans forward (Figure 6-3D,E). When the
dihedral angle ψ is greater than +200° (-160°), the
H-Cβ-Cγ of the side chain collides with this hydrogen, and
when dihedral angle ψ is between +300° (-60°) and +30°,
the amido N1-C1-O1 collides with this hydrogen (Figure
6-3D,E). The latter collision is not a serious one so long as
the value of dihedral angle φ remains around 270° (-90°)
or +90° so the amido N1-C1-O1 can squeeze past the
hydrogen H2 sideways (Figure 6-3C,F), but values for
dihedral angle φ of +90° are only permitted for glycine, which
lacks a β carbon. When dihedral angle ψ is between +290°
(-70°) and +320° (-40°), hydrogen H2 on the amido
nitrogen N2 can fit between amido nitrogen N1 and the side
chain with little difficulty (Figure 6-3E).

Figure 6-3: Definitions of the dihedral angles φ and ψ and the steric
effects of rotation. (A) Pattern in which the bonds with the dihedral
angles φ and ψ are distributed along a polypeptide. (B) View down the
bond between Cα and the amido nitrogen N1 that precedes it along
the polypeptide. The dihedral angle φ is defined as that between the
bond connecting N1 and C1 and the bond connecting Cα and C2
(Figure 6-2). Its sign is determined by the right-hand rule. Note that
the direction of the arrow is irrelevant to the assignment of the sign of
the angle. In the configuration shown, angle φ is +260° (-100°). This
dihedral angle is in the most sterically free range for angle φ (-45° to
-180°) because in this range the hydrogen on Cα can slip under the
acyl oxygen O1. (C) View down the same bond as in panel B but with
angle φ at +90°, produced from the conformation in panel B by
rotation about only the bond on the axis of the view. When angle φ is +60°,
the acyl oxygen, O1, sits between the carbon of the next peptide bond
at C2 and the first carbon of the side chain, Cβ. This would be the
value of angle φ in a left-handed α helix. (D) The same conformation
presented in panel B is viewed along the bond between Cα and the
acyl carbon C2. The eyes indicate the views interconverting panels B
and D. The dihedral angle ψ is defined as that between the bond
connecting Cα and N1 and the bond connecting C2 and N2 (Figure 6-2).
Its sign is determined by the right-hand rule. The configuration
shown (ψ = +105°) is in the most sterically free range for angle ψ (+15°
to +190°) because the hydrogen on Cα can slip below H2. (E) View
down the same bond as in panel D but with angle ψ at +285° (-75°).
When angle ψ is +300° (-60°), H2 lies between the first carbon of the
side chain, Cβ, and the nitrogen of the amino-terminal peptide bond,
N1. This is near the value of angle ψ (-39°) in a right-handed α helix.
(F) Steric effect between H2 and either N1 or H1 that occurs when
angle ψ is near 0°.

All of these steric effects can be summarized in a
Ramachandran plot (Figure 6-4A). Using your model,
you should verify the noted boundaries on the plot.
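
Because both axes of a Ramachandran plot are periodic, any comparison of a pair of dihedral angles with a reference conformation has to allow for wraparound at ±180°. The sketch below is one hedged way of doing this. The reference pairs are the values quoted elsewhere in this chapter for the right-handed α helix, the left-handed α helix, and the polyproline helix; values for β structure are not quoted in this section and are therefore omitted, so a point in the large unhindered region is assigned here to the polyproline helix, which shares that region. The nearest-reference rule is an illustrative assumption, not the hard-sphere boundaries of Figure 6-4A.

```python
import math

# (phi, psi) in degrees, as quoted in the text of this chapter.
REFERENCE = {
    "right-handed alpha helix": (-66.0, -39.0),
    "left-handed alpha helix": (65.0, 40.0),
    "polyproline helix": (-75.0, 145.0),
}

def angular_difference(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def nearest_conformation(phi, psi):
    """Name of the reference pair closest to (phi, psi) on the periodic plot."""
    def distance(name):
        ref_phi, ref_psi = REFERENCE[name]
        return math.hypot(angular_difference(phi, ref_phi),
                          angular_difference(psi, ref_psi))
    return min(REFERENCE, key=distance)

print(angular_difference(170.0, -170.0))
print(nearest_conformation(-60.0, -45.0))
```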

Refinements of crystallographic molecular models 
by use of Equation 4-15 for calculation of the function 6 
usually do not constrain the values for dihedral angles φ
and ψ. Even though they are not constrained, however,
their values converge upon the allowed regions in a
Ramachandran plot during the refinement. For example,
although many of the values for dihedral angles φ and ψ
for the various amino acids along the polypeptide in the
initial molecular model of deoxyribonuclease I were
scattered beyond the allowed regions in a
Ramachandran plot before refinement was performed
(Figure 6-5A), they clustered within the enclosures after
the refinement had been completed (Figure 6-5B) (ref 8).
Because this convergence was not enforced by the choice 
of (dag = deq) in Equation 4-15, its occurrence can be 
used as evidence that the refined structure is closer to 
reality than the unrefined. 




When the dihedral angles for the amino acids in
eight crystallographic molecular models, all built from
data sets to Bragg spacings of less than 0.12 nm, are
plotted on a Ramachandran plot (Figure 6-4B), the points
themselves define what should be the actual regions of
lowest energy. It might have been the case that the three
clusters of open squares in Figure 6-4B are more the
result of preferences enforced by secondary structures
than the steric effects first pointed out by Ramachandran
(Figure 6-4A). When, however, dihedral angles are
plotted for amino acids not involved in secondary structures,
from a much larger collection of crystallographic
molecular models (402) but from data sets gathered to
minimum Bragg spacings of only 0.2 nm or less, the
distribution still shows the same three clusters with the
same shapes and extents. Consequently, the extent
and magnitude of the actual steric effects in the backbone
of a polypeptide are delineated in the distribution
of the points in a plot such as that in Figure 6-4B.

With the exception of the region where dihedral
angle φ is between -70° and -180° and dihedral angle ψ
is between 30° and 110°, which should be sterically
unhindered anyway, almost all of the amino acids found
outside of the three clusters of open squares in Figure
6-4B are glycines. In addition to being able to reside in
cramped locations, glycine lacks a β carbon, and all of the
steric collisions involving the β carbon (Figures 6-3 and
6-4A) are irrelevant. Therefore, the dihedral angles
around a glycine have a larger compass. In particular, the
regions of the Ramachandran plot in which dihedral
angle φ lies between +70° and +180° or dihedral angle ψ
lies between -70° and -180° represent areas where
either O1 or H2, respectively, clash with the side chain of
any amino acid other than glycine. In Figure 6-4B, the
points for glycine (triangles) and the points for all other
amino acids (squares) define distinct regions on the plot.
In fact, most of the glycines in crystallographic molecular
models have dihedral angles φ and ψ outside of the
traditional enclosures on a Ramachandran plot defined by the
dihedral angles φ and ψ of the other amino acids. This fact
suggests that glycine is selected for situations in which such
otherwise unpermitted dihedral angles are unavoidable.

Figure 6-4: Ramachandran plot. (A) Diagram illustrating the steric effects producing the Ramachandran plot (ref 12). The two dimensions of the
plot are the dihedral angles ψ and φ (Figure 6-3). Boundaries are drawn between allowed and forbidden regions obtained from a molecular
model in which each atom is a hard sphere of the appropriate van der Waals radius. The clashing atoms are identified on the forbidden side
of the boundary with the same numbering system as in Figures 6-2 and 6-3. There are only four allowed regions: the large region including
the values for parallel and antiparallel β sheet, the region including the values for right-handed α helix, the region including the
values for left-handed α helix, and a small triangle at φ = +60° and ψ = +180°. The clashes can be understood by referring to Figure 6-3.
For example, if φ = -100° and ψ = +105° (Figure 6-3B,D) and angle φ is increased to -45°, O1 clashes with C2; if angle ψ is decreased to +20°,
N1 clashes with H2. If φ = -60° and ψ = -60° and angle φ is decreased to -185°, O1 runs into Cβ; if angle ψ is decreased to -70°, H2 runs into
Cβ. Adapted with permission from ref 12. Copyright 1977 Journal of Biological Chemistry. (B) Dihedral angles ψ and φ from eight
crystallographic molecular models of high accuracy. The crystallographic molecular models and the minimum Bragg spacings of their data sets were
cytochrome c6 (0.12 nm), cutinase (0.10 nm), lysozyme (0.0925 nm), a fragment of protein G (0.11 nm), ribonuclease Sa (0.12 nm), repressor
of primer protein (0.11 nm), rubredoxin from Desulfovibrio vulgaris (0.092 nm), and rubredoxin from Clostridium pasteurianum (0.11 nm).
The numbers in the points indicate the respective models. Triangles are glycines, and squares are amino acids other than glycine.

Figure 6-5: Effect of refinement on the values of φ and ψ for the amino acids in bovine deoxyribonuclease I. Each x in one of the diagrams
represents the values for the dihedral angles φ and ψ of one of the amino acids in a crystallographic molecular model of the protein. The
boundaries in the Ramachandran plot are defined by the steric effects represented diagrammatically in Figure 6-4A. Unbroken lines
surround regions of no hindrance; broken lines, regions of little hindrance. (A) Unrefined, initial molecular model. (B) Refined, final molecular
model. Glycines are denoted by open circles, cystines by filled squares, and all other amino acids by x. Reprinted with permission from ref 8.

It has been noted that when an amino acid other
than glycine has angles φ and ψ outside of the enclosures,
that amino acid is usually involved intimately in the
function of the protein. For example, Serine 120 is the
nucleophile in the active site of cutinase, and Alanine 30
is in the center of the crucial tight turn between the two
α helices in the repressor of the primer protein.

The region of the Ramachandran plot bounded by
values of dihedral angle φ between -140° and -60° and
values of dihedral angle ψ between -20° and +20° was
originally predicted to be disallowed because when these
dihedral angles are within these boundaries, either atom
N1 or atom H1 should be overlapping atom H2 (Figure
6-4A). Nevertheless, the dihedral angles of many amino
acids fall within this supposedly disallowed region
(Figure 6-4B). One way the overlap between either atom
N1 or atom H1 and atom H2 can be prevented is to
increase the bond angle N1-Cα-C2 beyond the usual
109.5° of a carbon hybridized sp³. In crystallographic
molecular models this angle is observed to be wider
than expected, with a mean of 112° and deviations up to
120°. In addition, this widening is dependent on the
values for the dihedral angles φ and ψ. The angle
N1-Cα-C2 is equal to 109° for β structure, the dihedral
angles ψ and φ of which fall in the largest unhindered
area of the plot, but is wider by 3° in α helices, the
dihedral angles ψ and φ of which fall within the lower left
cluster in Figure 6-4B immediately adjacent to the
supposedly disallowed region. All of these results suggest
that the existence of so many amino acids the dihedral
angles φ and ψ of which fall within a region of the
Ramachandran plot originally predicted to be disallowed
results from a widening of this bond angle to accommodate
the steric effect. The same argument would apply to
the glycines the dihedral angles φ and ψ of which fall
within the boundaries +60° to +110° and -20° to +20°,
respectively, also previously thought to be disallowed.

In a set of eight crystallographic molecular models,
all built from data sets to Bragg spacings of less than
0.12 nm, the amino acids within right-handed α helices
(Figure 6-6) have dihedral angles of φ = -66° ± 13° and
ψ = -39° ± 10° (Figure 6-3E), and these values fall within
one of the enclosures in a Ramachandran plot (Figure
6-4A). This location suggests that there are no severe
steric problems along the polypeptide in a right-handed
α helix. A left-handed α helix of L-amino acids would
have dihedral angles of φ = +65° and ψ = +40°, also within
one of the enclosures (Figure 6-4A). This latter enclosure,
however, is a small one arising from the conformation in
which acyl oxygen O1 fits between the side chain and C2
and O2 of the next acyl group (Figure 6-3C). In
crystallographic molecular models, about 2% of the amino acids
other than glycine usually have dihedral angles φ and ψ
clustered around this enclosure (Figure 6-4B). This fact
demonstrates that these are accessible conformations,
yet left-handed α helices are almost never seen. It may
be that an extended sequence of such conformers, which
would be required to form a left-handed α helix, in
contrast to the few isolated examples that are observed,
would be sterically unstable.

Figure 6-6: α Helix from the crystallographic molecular model
(Bragg spacing = 0.15 nm) of cytochrome c from Thunnus
alalunga. Only the peptide backbone and the β carbons are
included in the figure. The dihedral angles φ = -66° (Figure 6-3B
turned 34° further) and ψ = -39° (Figure 6-3E turned 36° further) of
an α helix are most easily observed at amino acids 99 and 96,
respectively. The hydrogen bonds are between the acyl oxygen of amino
acid i and the amido nitrogen of amino acid i + 4. Compare this
actual α helix with the one drawn in Figure 4-16A by pushing over
the pages in between. This drawing was produced with MolScript.

The original right-handed α helix (Figure 4-16A),
built before crystallographic structures were available,
was constructed with 3.69 amino acids for every turn,
which would have produced a rotational angle for each
amino acid of 98° and a rise of 0.147 nm for each amino
acid. In crystallographic molecular models of proteins,
in which these dimensions were not constrained,
right-handed α helices display rotational angles
for each amino acid of 99° ± 7° and a rise for each amino
acid of 0.15 ± 0.02 nm.
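
The arithmetic connecting these dimensions is simple enough to check directly. The sketch below computes the rotation for each amino acid and the pitch of the helix from the number of amino acids in each turn and the rise for each amino acid. The function names are illustrative; the input values are those quoted in the paragraph above.

```python
def rotation_per_residue(residues_per_turn):
    """Rotation about the helical axis for each amino acid, in degrees."""
    return 360.0 / residues_per_turn

def pitch_nm(residues_per_turn, rise_per_residue_nm):
    """Advance along the helical axis for one full turn, in nm."""
    return residues_per_turn * rise_per_residue_nm

def residues_per_turn_from_rotation(rotation_deg):
    """Inverse relation: amino acids for every turn from the rotation."""
    return 360.0 / rotation_deg

# Original model of the alpha helix: 3.69 amino acids per turn, 0.147 nm rise.
print(round(rotation_per_residue(3.69), 1))           # about 98 degrees
print(round(pitch_nm(3.69, 0.147), 3))                # pitch in nm
# Observed mean rotation of 99 degrees in crystallographic models.
print(round(residues_per_turn_from_rotation(99.0), 2))
```

The observed mean rotation of 99° corresponds to slightly fewer amino acids in each turn than in the original model, well within the quoted spread of ±7°.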

The paradigm of an α helix is a linear rod such as
the one from cytochrome c in Figure 6-6, but only about
15% of those found in crystallographic molecular models
of proteins are straight enough to fit the paradigm. A
significant proportion (60%) of α helices are smoothly
curved. The original molecular model of the α helix was
built so that each acyl oxygen along the α helix
participates as an acceptor to only one hydrogen bond, namely,
the one in which the nitrogen-hydrogen bond from the
appropriate amide was the donor (Figure 4-16A). The
tacit assumption was that the hydrogen bond would be
one in which the nitrogen-hydrogen bond would pivot
on the lone pairs of the acyl oxygen (Figure 5-10D). The
geometric constraints of the α helix itself are such that
the carbon-oxygen-nitrogen bond angle is about 155°
± 10° (Figure 6-6) rather than 180°. This causes the
carbon-oxygen double bond to tilt outward from the
axis of an α helix (Figure 6-6). This tilt permits the oxygen
to form a second hydrogen bond with a donor on a side
chain of the protein (Figure 6-7A) or a molecule of
water (Figure 6-7B). This second hydrogen bond can be
detected crystallographically (Figure 6-7B) or by a
decrease in the frequency of the infrared absorption of
the peptide bond.

The formation of this second hydrogen bond
changes the carbon-oxygen-nitrogen bond angle by less
than 10°, and this fact suggests that each acyl oxygen
in an α helix presents a second acceptor whether or not
it is occupied. The occupation of these second acceptors,
however, by donors located on only one side of the
α helix, for example, the side facing the solvent (the right
side of the α helix in Figure 6-7B), is thought to be a
sufficient perturbation to cause the α helix to curve. One
type of donor that can occupy these second acceptors on
the acyl oxygens in an α helix is the hydroxyl group on
a serine or a threonine.

It has been observed that when an α helix contains
a serine or threonine, the hydroxyl on the side chain has
a tendency to be located in a position similar to that
occupied by one of the waters in Figure 6-7B with respect
to the acyl oxygens on the amino acid three or four
positions ahead of that serine in the α helix (Figure 6-7A) (ref 26).
The proton on the hydroxyl group of the serine acts as a
donor in a second hydrogen bond to one of these acyl
oxygens just as a molecule of water does in the other
situation. Threonine 35 in the α helix in Figure 6-7B
participates in such a hydrogen bond with the acyl oxygen of
Phenylalanine 31. This hydrogen bond occupies the
donor on Threonine 35, which would otherwise be
difficult to bury, and consequently turns its side chain from
an apathetic one (Table 5-9) into a hydrophobic one,
suitable to its surroundings. If Alanine 29 in Figure 6-7B
were a serine, its hydroxyl would take the place of one of
the waters in the respective hydrogen bonds to acyl
oxygens on Proline 25 or Alanine 26. Such intramolecular
hydrogen bonds are quite frequently encountered in
α helices. In the molecular model of myoglobin, a
protein with a large amount of α helix, 6 out of the 11 serines
and threonines in the protein form hydrogen bonds with
the acyl oxygens on amino acids three or four positions
ahead of them in the sequence of the protein.
Asparagine can also participate in such an intrahelical
hydrogen bond (Asparagine 34 in Figure 6-7B).

Figure 6-7: Occupation of the second hydrogen-bond acceptor
on the acyl oxygen of a peptide bond in an α helix by the
hydroxyl of a serine (A) or water (B). (A) Drawing of a
molecular model of an α helix in which the hydroxyl of the side chain
of a serine is shown acting as a donor of a hydrogen bond to the
acyl oxygen of the amino acid either three or four positions
ahead of it in the amino acid sequence (ref 26). The serine hydroxyl
group can swing to complete a hydrogen bond to either acyl
oxygen. Reprinted with permission from ref 26. Copyright 1984
[...] circles are positions of water molecules in the vicinity of the
α helix. Four of the waters occupy locations consistent with the
formation of hydrogen bonds to the adjacent acyl oxygens, those
on Proline 25, Alanine 26, Alanine 29, and Arginine 33, in
addition to the hydrogen bonds those oxygens accept from the
appropriate amido nitrogens. The donor in the side chain of
Threonine 35 forms a hydrogen bond with the acyl oxygen of
Phenylalanine 31, and one of the donors in the side chain of
Asparagine 34 forms a hydrogen bond with the acyl oxygen of
[...] outer surface of the protein with its left side toward the interior
and its right side toward the water. This drawing was produced
[...]

About 20% of the α helices in crystallographic
molecular models are kinked. The most common cause
of an abrupt kink in an α helix is a proline. For example,
Proline 183 in the middle of an α helix 30 amino acids
long in citrate (Si)-synthase causes the α helix to bend
abruptly by 40°. The mean value for the angles of the
abrupt kinks produced in α helices by prolines is 26°.
In proteins where such a kink is found naturally, the
presumption is that it serves the purpose of fitting the
α helix properly into the overall structure. When a
proline is inserted into an otherwise straight α helix by
site-directed mutation, the α helix, if it tolerates the
substitution, displays a kink with a much smaller angle,
and the protein becomes significantly less stable.

There are also examples of local distortions in
α helices that seem to be caused by the incompatibility of
an undistorted α helix with the surrounding structure of
the protein. If the α helix is too short, a gap develops in
which molecules of water or donors and acceptors from
side chains occupy the acceptors and donors broken by
opening the gap; if the α helix is too long, one or more
of its amino acids is pushed out of the structure as an
aneurysm or loop.

At the amino-terminal end of an α helix there are
unoccupied nitrogen-hydrogen donors, and at the
carboxy-terminal end, unoccupied acyl oxygen acceptors
(Figure 6-6). Because each peptide bond has two
acceptors for hydrogen bonding but only one donor and
because the side chains of the amino acids also have an
excess of acceptors over donors, a solution of protein
contains more acceptors than donors of hydrogen
bonds. Consequently, when a donor remains unoccupied
in the native structure, one hydrogen bond has been
lost from the solution upon folding.
Therefore, it comes as no surprise that the donors at the
amino-terminal end of an α helix are occupied, or
capped, but the acyl oxygens at the carboxy-terminal
end are often capped as well.

About half of the time, the side chain of an amino
acid such as asparagine, serine, threonine, or aspartate in
the position immediately before the beginning of the
α helix, the N-cap position, provides one or more of
the necessary acceptors to occupy the open donors at the
amino-terminal end (Figure 6-58). When such an amino
acid is replaced by site-directed mutation with one that
is of the same size or smaller but that cannot provide an
acceptor, the resulting protein is less stable because a
hydrogen bond is lost to the solution upon its folding that
is not lost when the wild-type protein folds. Proline
often (10%) occurs at the first position in an α helix
because it does not have a nitrogen-hydrogen donor that
requires capping. About a third of the time, the amino
acid immediately after the end of an α helix is a glycine,
which can readily (Figure 6-4B) adopt the necessary
dihedral angles φ and ψ (+70° and +20°, respectively)
that permit the amido nitrogen-hydrogen of the next
amino acid to occupy the open acceptor of the first
unoccupied acyl oxygen (that on the amino acid 98 in Figure
6-6) at the carboxy-terminal end of the helix.

The dipole moment of an isolated peptide bond is
estimated to be 3.5 D, and the peptide bonds in an
α helix are held with their dipoles almost parallel to the
axis (Figure 6-6) so that the positive poles point to the
amino-terminal end of the α helix and the negative poles
to the carboxy-terminal end. Such an arrangement of
dipoles creates an electrostatic field of the respective
polarity, the magnitude of which is 1 V at 0.3 nm and
0.5 V at 0.5 nm from each end of the α helix, if the α helix
is greater than 10 amino acids long and located in a
medium of relative permittivity equal to 2. These
voltages would produce electrostatic potentials equal to
about 100 and 50 kJ mol⁻¹, respectively, for a univalent
ion. Although these electrostatic potentials are less than
twice those that would be felt at the same distances from
two adjacent, isolated peptide bonds, it is thought that
the amplification produced by aligning the peptide
bonds in an α helix is significant.
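
The conversion from a voltage to a molar energy for an ion is the product of the charge number, the Faraday constant, and the potential. The sketch below reproduces the two rounded figures quoted above; the function name is illustrative, and the value of the Faraday constant is the standard one rather than anything taken from the text.

```python
FARADAY = 96485.33212  # Faraday constant, C mol^-1

def molar_energy_kj(potential_volts, charge_number=1):
    """Electrostatic energy in kJ mol^-1 of an ion of the given charge
    number placed at the given potential."""
    return charge_number * FARADAY * potential_volts / 1000.0

print(round(molar_energy_kj(1.0), 1))   # 1 V at 0.3 nm from the end
print(round(molar_energy_kj(0.5), 1))   # 0.5 V at 0.5 nm from the end
```

The results, a little under 100 and a little under 50 kJ mol⁻¹, agree with the rounded values in the text.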

Experimental observations equivocally consistent
with this idea have been presented. For example, the
upfield shift (+0.4 ppm) in the absorption in a nuclear
magnetic resonance spectrum for the proton in a
hydrogen bond between an amide and an acyl oxygen in an
α helix relative to one in random meander has been
attributed to the α-helical dipole, but an unexplained
downfield shift of the same magnitude is found in a
β sheet. The location of a sulfate ion at the intersection of
the amino-terminal ends of three α helices in sulfate
binding protein suggests that the positive ends of the
dipoles of these α helices stabilize the anion, but the
amino-terminal ends of these helices could simply be
providing the properly oriented amido donors that
occupy several of the many σ lone pairs of electrons on
the sulfate (2-28). The location of Glutamate 35 of
lysozyme adjacent to the amino-terminal end of an
α helix suggests that this arrangement would stabilize its
anionic conjugate base, but it is usually argued that the
acidity of Glutamate 35 must be weakened rather than
strengthened so that it will be protonated when substrate
binds to the enzyme.

It is unfortunate that the original calculations of the
magnitude of the electric field generated by an α helix
assumed that it existed in a uniform dielectric with a
relative permittivity of 2 and that no account of the relative
permittivity of the medium surrounding it was taken. For
example, electrostatic potentials of only 2.5 kJ mol⁻¹ have
been observed for univalent elementary charges
positioned at the amino-terminal ends of α helices in water
(ε = 78 at 25 °C), but even these small potentials
disappear when the ionic strength of the solution is
increased. Later calculations have incorporated the
contribution of the dielectric surrounding the α helix, and it
was found that if the protein was approximated by a solid
sphere of relative permittivity 3.5 in a solvent of relative
permittivity 80 (water), even when the α helix was
completely within the sphere of low relative permittivity, the
electric field around the α helix was dramatically less
than the electric field in a uniform dielectric with a
relative permittivity of 3.5. Furthermore, if the ends of the
α helix were at the surface of the sphere, in contact with
the solvent, the electric field decreased even further to
negligible levels. This effect of the dielectric may explain
why the apparent electrostatic potentials exerted on
aspartates positioned by site-directed mutation at the
amino-terminal ends of the two α helices on the surface
of T4 lysozyme were only about -2 kJ mol⁻¹. If the
relative permittivity of the interior of a protein is greater than
3.5, the magnitude of the electric field would decrease
accordingly in inverse proportion. Finally, the solution
around a molecule of protein always contains electrolytes
that would further diminish the electric field. For all of
these reasons, electrostatic free energies of significant
magnitude are probably not exerted by an α helix within
a protein, although the possibility is often discussed.
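
The inverse proportionality between the electrostatic potential and the relative permittivity can be illustrated with a one-line rescaling. The sketch below assumes a uniform dielectric throughout, which is exactly the simplification the text criticizes, so it is a rough consistency check and not a substitute for the calculations on a sphere of low permittivity in water. The function name is illustrative.

```python
def rescale_energy(energy_kj, eps_reference, eps_medium):
    """Rescale an electrostatic energy computed for a uniform dielectric
    of relative permittivity eps_reference to one of eps_medium,
    assuming the energy varies as 1/permittivity."""
    return energy_kj * eps_reference / eps_medium

# About 100 kJ mol^-1 was calculated for a relative permittivity of 2.
print(round(rescale_energy(100.0, 2.0, 78.0), 2))  # uniform water at 25 °C
print(round(rescale_energy(100.0, 2.0, 3.5), 1))   # uniform protein interior
```

Under this crude assumption, the value for water comes out close to the 2.5 kJ mol⁻¹ observed for charges at the ends of α helices in water.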

When an α helix traverses the surface of a protein as
a continuous rod, its face directed toward the protein is
hydrophobic and its face directed toward the solution is
hydrophilic. The α helix in Figure 6-7B has such an
orientation with the surface formed by Leucine 28,
Tryptophan 30, Phenylalanine 31, and Threonine 35
facing the protein and the opposite surface facing the
solvent, as indicated by the locations for molecules of
water. This asymmetry of hydropathy is sometimes
reflected in the amino acid sequence of the protein and
can be identified by constructing a helical wheel.
Around a circle, successive amino acids in the sequence
are placed at 100° intervals (Figure 6-8). This represents
the view down an α helix (Figure 6-46), much as a
Newman projection represents the view down a
carbon-carbon bond. Any asymmetry in the distribution
of hydropathy is easily observed. If a segment of amino
acid sequence in a polypeptide, when placed upon a
helical wheel, reveals such an asymmetric pattern of
hydropathy, as do those in Figure 6-8, this pattern is
evidence that that segment is an α helix in the folded
polypeptide. Such α helices are referred to as
amphipathic α helices. An amphipathic α helix is an α helix
that is enriched in hydrophobic side chains on one of its
sides and enriched in hydrophilic side chains on the
other.

Figure 6-8: Two segments of amino acid sequence displayed on helical wheels. (A) Sequence from Lysine 60 to Proline 77
(KKVADALTNAVAHVDDMP) in the α polypeptide of human hemoglobin. In the crystallographic molecular model of hemoglobin, this sequence is an
α helix running across the surface of the protein. In the diagram, the amino terminus is the lysine at the 10:30 position and the sequence is
read at 100° intervals. (B) Amphipathic helical sequence from Proline 455 to Lysine 472 (PDVKSAIEGVKYIAEHMK) from the α polypeptide
of acetylcholine receptor. The lines in both panels divide the hydrophilic and hydrophobic surfaces of these two amphipathic α helices.
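
A helical wheel is easy to construct numerically. The sketch below places successive amino acids at 100° intervals and measures how strongly the hydrophobic ones cluster on one side of the wheel by averaging their unit vectors. The choice of which amino acids count as hydrophobic (A, V, L, I, M, F, W) is an assumption made only for this illustration, as is the starting angle of 0° for the first amino acid; the sequence is the one from human hemoglobin displayed in Figure 6-8A.

```python
import math

HYDROPHOBIC = set("AVLIMFW")  # assumed classification, for illustration only

def wheel_angles(sequence, step=100.0):
    """Angular position in degrees of each amino acid on the wheel."""
    return [(i * step) % 360.0 for i in range(len(sequence))]

def hydrophobic_resultant(sequence, step=100.0):
    """Length (0 to 1) and direction (degrees) of the mean unit vector of
    the hydrophobic positions; a length near 0 means an even spread."""
    angles = [a for a, aa in zip(wheel_angles(sequence, step), sequence)
              if aa in HYDROPHOBIC]
    if not angles:
        return 0.0, None
    x = sum(math.cos(math.radians(a)) for a in angles) / len(angles)
    y = sum(math.sin(math.radians(a)) for a in angles) / len(angles)
    return math.hypot(x, y), math.degrees(math.atan2(y, x)) % 360.0

segment = "KKVADALTNAVAHVDDMP"  # Lysine 60 to Proline 77, Figure 6-8A
for residue, angle in zip(segment, wheel_angles(segment)):
    print(residue, angle)
print(hydrophobic_resultant(segment))
```

A resultant length well above zero indicates that the hydrophobic side chains are gathered on one face, which is the asymmetry that the lines in Figure 6-8 are drawn to display.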

A single-stranded amphipathic α-helical peptide in
a mixed solvent of trifluoroethanol and water, unattached
to a protein, displays an intrinsic curvature with the
hydrophobic amino acids on the concave face and the
hydrophilic on the convex. In an α helix running across
the surface of a protein, the same orientation of
curvature is often observed. Whether this curvature is due to
the fact that such an α helix is amphipathic, or to the
fact that the acyl oxygens of its peptide bonds on the face
exposed to the solvent form hydrogen bonds with
water, or to the fact that such curvature simply allows it
to adhere more closely to the underlying structure, or to
more than one of these reasons is unclear.

It has been proposed that certain amino acids or
short sequences of amino acids may impose upon a
folding polypeptide biases toward the formation of
particular secondary structures at locations where they reside in
the native structure of a protein. One hears terms such as
“helix-forming” or “helix-breaking” amino acids.
Originally, these distinctions were based on the observed
preferences of homopolymers of the various amino acids
to assume α helices or sheets of β structure or to remain
structureless at various temperatures, ionic strengths,
concentrations of cosolvents, and values of pH. The
propensities of the various amino acids to favor an
α-helical conformation have also been examined, either by
placing each of them in turn in the center of an α helix in
a native protein by site-directed mutation or by
incorporating them in turn into a position in the center
of a peptide that assumes an α-helical conformation in
water and then measuring the changes in stability to
that protein or to that unsupported α helix that result.

Although there are significant differences in the
various scales that result from these measurements, all
agree that, of the 20 amino acids, alanine has the greatest
propensity to stabilize an α helix and glycine the least
and that the differences in free energy of stabilization
between these extremes are about 4 kJ mol⁻¹. These
preferences presumably explain how antifreeze peptide 3
from the winter flounder, in which 23 of the 37 amino
acids are alanines, can naturally assume the
conformation of a long, unbroken, unsupported α helix, but in an
illustration of the unpredictability of the structure of
proteins, all of the alanines in the alanine-rich regions of
spider dragline silk are found in β sheets.

The propensities of the 17 primary amino acids
other than alanine, glycine, and proline to stabilize or
destabilize an α helix are much less obvious. If the values
for their helical propensities in eight different scales are
averaged, the difference between the mean values for
any two of them is rarely as large as the standard deviation
of the value for either. Consequently, with the possible
exception of methionine and leucine (both at -2.8 ±
0.5 kJ mol⁻¹), assigning the rest a value halfway between
that of glycine (arbitrarily set at 0 kJ mol⁻¹) and that of
alanine (-3.6 kJ mol⁻¹) would be as statistically significant
as assigning them each an individual value.
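
The arithmetic behind this statement can be sketched in a few lines. The numbers are those quoted in the paragraph above; the criterion of "more than one standard deviation" is an assumption introduced here only for illustration and is not a test taken from the scales themselves.

```python
# Helix propensities quoted in the text, in kJ/mol (glycine is the arbitrary zero).
GLYCINE = 0.0
ALANINE = -3.6
MET_LEU_MEAN, MET_LEU_SD = -2.8, 0.5


def midpoint(a, b):
    """Value halfway between two propensities."""
    return (a + b) / 2.0


def differs_from(value, sd, reference):
    """Assumed criterion: the value lies more than one SD from the reference."""
    return abs(value - reference) > sd


shared_value = midpoint(GLYCINE, ALANINE)
print(round(shared_value, 2))                               # halfway value
print(differs_from(MET_LEU_MEAN, MET_LEU_SD, shared_value))  # Met and Leu stand apart
```

Under this assumed criterion, methionine and leucine are the only side chains of the 17 whose quoted mean is distinguishable from the shared halfway value.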

There are several other types of helical structures
that occur rarely in crystallographic molecular models of
native proteins. The polyproline helix, of which there is
an example 13 aa long in benzoylformate decarboxylase,
has dihedral angles φ and ψ of -75° and 145°,
respectively, which places it within the largest
allowed region of the Ramachandran plot (Figure 6-4B).
It is a much more extended structure than an α helix,
having a rise of 0.31 nm aa⁻¹ and 3.0 aa turn⁻¹, and it is a
left-handed helix rather than a right-handed one. As their
name suggests, polyproline helices in crystallographic
molecular models of proteins usually contain a high
frequency (25-70%) of proline. Even though they contain
no internal hydrogen bonds and usually occur in situations
where they are supported by surrounding structures,
there are sequences of amino acids in naturally
occurring proteins that form unsupported polyproline
helices. The π helix, of which there is an example 13 aa
long in arachidonate 15-lipoxygenase, is a wider, squatter
version of the α helix in which the hydrogen bonds
are between the acyl oxygen of amino acid i and the
nitrogen-hydrogen bond of amino acid i + 5 rather than
amino acid i + 4.
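
As a rough illustration of how these helices are told apart, the sketch below names a helix from the spacing of its backbone hydrogen bonds and computes the pitch of a helix from its rise and its number of amino acids per turn. The function names are hypothetical, and the i to i + 3 pattern is the one given for the 3₁₀ helix later in this section.

```python
# Spacing between the residue providing the acyl oxygen (i) and the residue
# providing the nitrogen-hydrogen (i + n), as described in the text.
HELIX_BY_OFFSET = {3: "3-10 helix", 4: "alpha helix", 5: "pi helix"}


def helix_type(acyl_oxygen_residue, amide_nh_residue):
    """Name the helix implied by one backbone hydrogen bond (hypothetical helper)."""
    offset = amide_nh_residue - acyl_oxygen_residue
    return HELIX_BY_OFFSET.get(offset, "not one of these helices")


def pitch_nm(rise_nm_per_residue, residues_per_turn):
    """Advance along the helical axis for one full turn."""
    return rise_nm_per_residue * residues_per_turn


print(helix_type(10, 14))
print(helix_type(10, 15))
# Polyproline helix: rise of 0.31 nm per amino acid, 3.0 amino acids per turn.
print(round(pitch_nm(0.31, 3.0), 2))
```

The polyproline helix has no internal hydrogen bonds, so only the pitch calculation applies to it.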

The values for the dihedral angles φ and ψ found in
ideal β structure (φ = -130°, ψ = +120°) lie within one of
the two largest allowed regions of the Ramachandran
plot (Figure 6-4A). These dihedral angles
place the hydrogen on the α carbon under the preceding
acyl oxygen, O1 (Figure 6-3B), and under the next amido
hydrogen, H2 (Figure 6-3D), respectively. This is the least
hindered of all the conformations, and β structure
experiences no serious steric problems around its α carbons.
Because β structure is usually found in the most deeply
buried regions of a protein, its polypeptide backbone
usually displays the least thermal motion even though
its dihedral angles φ and ψ are the least sterically
constrained. Nevertheless, it is obvious from an examination
of the polypeptide backbones of the proteins presented
in Chapter 4 that, because of the size of this region,
β structure is far more pliant and unpredictable than an
α helix, and efforts to define regular patterns have been
less informative than time spent looking at different
crystallographic molecular models. The original β-pleated
sheets (Figure 4-16B,C) have turned out to be highly
idealized. There are, however, several notable structural
features of β structure.

When a number of β strands do form a sheet, the
sheet usually has a negative, left-handed twist to its
surface (Figure 6-9). This is supposed to arise from the fact
that the enclosure on the Ramachandran plot in which
the dihedral angles φ and ψ for parallel and antiparallel
β sheets reside has more open area for smaller values of
dihedral angle φ and larger values of dihedral angle ψ
beyond the values of these two dihedral angles that
would give a flat sheet. Deviations tend to be biased
toward these smaller values of dihedral angle φ and larger
values of dihedral angle ψ, and this bias creates the twist
in the sheet. It may simply be the case, however, that
twisted β sheets have surfaces against which other
segments of secondary structure, such as α helices, can be
more efficiently packed and that packing efficiency
dictates the hand and magnitude of the twist because
β sheets almost as flat and regular as the idealized
version (Figure 4-16B,C) have been observed.

Another feature of β structure is the β bulge. In
this arrangement one of the amino acids is skipped in the
regular pattern of hydrogen bonding between two
antiparallel strands. The hydrogen bond that would have
incorporated the nitrogen-hydrogen bond of the skipped
amide incorporates the nitrogen-hydrogen bond of the
next amide instead. This causes the β structure to bulge
at the location of the skipped amino acid (Figure 6-10),
and the bulge is located where the strands change
direction. This change in direction can take two forms. If the
β structure remains as a sheet in roughly the same plane,
the β bulge puts a bend in the structure. A β bulge,
however, also can occur at a location where a large sheet of
β structure folds over upon itself to form a sandwich of
two opposed β sheets.

As with α helices that contain gaps where a turn is
pulled apart, a β sheet can contain a gap between two
strands. In such a gap, the donors and acceptors that
have been pulled apart from each other are occupied by
acceptors and donors on the side chains of their amino
acids or by ordered molecules of water filling the gap.

Most β structure is buried in the middle of a protein,
but even in a small protein such as fatty-acid-binding
protein from Escherichia coli, which is only a
sandwich of two β sheets, there is only a very weak
amphipathic pattern of alternating hydrophobic and
hydrophilic amino acids along the β strands.
Consequently, β structure cannot be identified in an
amino acid sequence.

There are three cylindrical arrays formed from
β structure: a β barrel (Figure 6-11) of 4-12 strands, a
β helix (Figure 6-12), and a β propeller (Figure
6-13) with 6-8 blades. In a β barrel, the hydrogen
bonds between the strands are perpendicular to the
axis of the cylinder and perpendicular to its radius; in a
β helix, they are parallel to the axis of the cylinder and
perpendicular to its radius; and in a β propeller, they are
parallel to the radius of the cylinder and perpendicular to
its axis. Therefore, each of the three orthogonal axes of a
cylinder is represented.

The most common type of β barrel has eight β strands
(Figure 6-11). Usually β barrels are of eight strands or fewer
so that the core can be tightly packed with side chains, but
there is a β barrel of 11 strands through the core of which
runs an α helix. The β strands in a β barrel reside in a
surface that can be approximated quite closely by a twisted
hyperboloid. A hyperboloid is an ellipsoidal cylinder that
is narrowest at its center and gradually and continuously
widens away from its center in both directions (notice the
flare to the hyperboloid in Figure 6-11). In a β barrel of
eight strands, the strands are tilted with respect to the
axis of the hyperboloid by a mean angle of -34° to -47°
(the mean angle of tilt in Figure 6-11 is -34°), but in β barrels
of fewer than eight strands, the angle of tilt gradually
increases to -43° to -59° when there are only five. As in
a normal β sheet (Figure 6-9), the sheet that forms the
hyperboloid has a negative twist and the mean angles of
twist between adjacent strands are between -21° and -30°


Figure 6-9: Parallel β-pleated sheet within the crystallographic
molecular model (PDB filename 2OHX; Bragg spacing > 0.18 nm) of
alcohol dehydrogenase. The 12-stranded β sheet is composed of
six parallel strands from each of the subunits of the dimer joined in
an antiparallel orientation. The two identical series of numbers are
those for the respective amino acid sequences of the two identical
polypeptides comprising the protein. This drawing was produced
with MolScript.


in barrels of eight strands (the mean angle of twist in Figure
6-11 is -25°), but this angle increases to between -28° and
-44° in β barrels of five strands. β Barrels can be
constructed from parallel β strands of polypeptide of
identical sequence, from an antiparallel β sheet wrapped into
a cylinder, or from two identical sheets of parallel
β strands arranged antiparallel to each other, but the
most common arrangement is parallel β strands of
nonidentical sequence. In such parallel β barrels the strands
are often distributed around the barrel in the order in
which they occur in the sequence of the polypeptide, and
the carboxy-terminal end of one strand is connected to the
amino-terminal end of the next by an α helix. Such β barrels
are designated (αβ)ₙ, where n is the number of
β strands.

The β helix displayed in Figure 6-12 is one in which
there are three β sheets running up the tube at roughly
60° angles to each other. This configuration seems to be
the most common type, but there are β helices in which
only two β sheets run up the tube on opposite sides and
the two sheets are flattened against each other.
Extrusions of random meander (amino acids 167-175 in
Figure 6-12) are common features of β helices. There is
also an example of a hybrid structure in which each of the
β strands in one of the three β sheets in a β helix is
replaced by an α helix.

A third regular structure, in addition to α helices
and β structure, universally encountered in the
crystallographic molecular models of proteins is the β turn. A
β turn is any structure that has a hydrogen bond between
the acyl oxygen of the first amino acid in the turn and the
amido nitrogen-hydrogen of the fourth amino acid in the
turn (Figure 6-14). Usually such a hydrogen bond


Figure 6-10: Five examples of β bulges from the crystallographic
molecular models of various proteins, superposed upon one
another to indicate their uniformity. The skipped amido nitrogen-
hydrogen in each case is in the center left of the structure.
Hydrogen bonds are indicated by dotted lines. The five β bulges are
formed by Phenylalanine 41, Cysteine 42, and Leucine 33 of bovine
chymotrypsin; Alanine 86, Lysine 87, and Lysine 107 of bovine
chymotrypsin; Leucine 107, Serine 108, and Alanine 196 of
concanavalin A from Canavalia ensiformis; Isoleucine 90, Glutamine
91, and Valine 120 of human carbonate dehydratase II; and
Isoleucine 15, Lysine 16, and Lysine 24 of micrococcal nuclease
from Staphylococcus aureus, where the first two amino acids listed
flank the vacant amido nitrogen-hydrogen and the third provides
the amido nitrogen-hydrogen and acyl oxygen from the other
strand. Only the side chains of the three central amino acids are
included in the figure. Reprinted with permission from ref 73.
Copyright 1978 National Academy of Sciences.

models in such a way that each of their four dihedral
angles, φ₂, ψ₂, φ₃, and ψ₃, had the opposite sign,
respectively (Table 6-1). These alternative conformations with
opposite signs to their dihedral angles are called type IA
and type IIA, respectively. In each of these two latter
types, the polypeptide backbone is the mirror image of
the polypeptide backbone in the corresponding β turns
of type I and type II in Figure 4-15D, but the amino acids
must remain L-amino acids.

Two other types of β turns, of historical significance,
were originally defined by Venkatachalam as
type III and type IIIA. The prototype for β turns of
type III, however, is the 3₁₀ helix originally proposed
and included by Bragg, Kendrew, and
Perutz in their catalogue of all helices that would have
rotational angles for each amino acid that were integral
quotients of 360°. This is a helix that has hydrogen bonds
between the acyl oxygen of amino acid i and the
nitrogen-hydrogen bond of amino acid (i + 3), the pattern that
is the primary definition of the β turn. The smaller repeat
produced by this shorter connection makes a 3₁₀ helix
narrower than an α helix. Short segments of 3₁₀ helix are
occasionally seen in crystallographic molecular
models, but they are never more than five or six amino
acids in length. Short segments of 3₁₀ helix five or six
amino acids in length have distorted bond angles along
the polypeptide, and this observation suggests that the
strain in such a tight helix is considerable. This steric
effect would explain their rarity. Segments of synthetic
poly(2-amino-2-methylpropionic acid), however,
crystallize as 3₁₀ helices. The most frequent location for a
short segment of 3₁₀ helix is at the end of an α helix. A
turn of 3₁₀ helix in the middle of an α helix can put an
elbow into it. For example, a turn of 3₁₀ helix at Serine 143
and Leucine 144 in the center of an α helix in
deoxyribonuclease I causes an abrupt bend of 22°.


Table 6-1: Frequency and Dihedral Angles of the Most Common Types of β Turns

                               dihedral angles(b) (deg)
type   frequency(a) (%)    φ₂          ψ₂           φ₃          ψ₃

I      41(c)               -64 ± 8     -19 ± 8      -90 ± 9     -2 ± 11
IA      6                   52 ± 3      41 ± 4       87 ± 6    -11 ± 14
II     26                  -61 ± 6     132 ± 1       82 ± 12     3 ± 15
IIA     6                   63 ± 5    -126 ± 5      -80 ± 9    -11 ± 10

(a) These are the frequencies in which these types of β turn occur in 59
crystallographic molecular models built from data sets gathered to Bragg
spacing of ≤0.2 nm.
(b) Mean and standard deviations of the dihedral angles for amino acids
i + 1 and i + 2 in the β turns from the crystallographic molecular models
of lysozyme, α-lytic protease, deoxyribonuclease I, and penicillopepsin.
Values from crystallographic molecular models built from data sets
gathered to even narrower Bragg spacing fall within these ranges.
(c) Does not include segments judged to be 3₁₀ helix. If these had been
included, the frequency of β turns of type I would rise to 50%.

A 3₁₀ helix of four amino acids with one hydrogen
bond is similar to a β turn of type I because it has mean
values for its dihedral angles φ and ψ of -71° and -18°,
respectively, but with wide ranges that include those for
β turns of type I, and it performs the same role as a β turn
of type I. Almost all (96%) of the stretches of amino acids
in crystallographic molecular models of proteins
assigned as 3₁₀ helix are four or fewer amino acids in
length, so most instances of 3₁₀ helix could as easily be
assigned as β turns of type I. Usually, however, they are
not classified as such, and if not they are assigned as
either 3₁₀ helix or β turns of type III, depending on the
preferences of the crystallographer. For every segment of
amino acids assigned as 3₁₀ helix or β turn of type III
instead of β turn of type I, there are about 4.5 β turns of
type I, so the confusion is not a major one.

In crystallographic molecular models, β turns are
designated both by the existence of a hydrogen bond
between the acyl oxygen on the first amino acid and the
amido nitrogen-hydrogen on the fourth amino acid and
by the proximity of the α carbons of the first and fourth
amino acids. In general, these two α carbons are
0.5-0.6 nm apart. Those configurations designated by
these rules as β turns can be grouped into the categories
proposed by Venkatachalam (Table 6-1) as well as
several other minor categories. It was only after refined
crystallographic molecular models became available
that the clear tendency of these structures to fall into
specific categories became apparent, because in
unrefined structures the orientation of the polypeptide
backbone could not be defined with sufficient accuracy.
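
The grouping of observed turns into these categories can be sketched as a nearest-mean assignment. The sketch below is only an illustration, not the procedure used in the cited surveys: the mean angles are those read from Table 6-1 (the extracted table is partly garbled, so the value of 3° for the last angle of type II is an assumption), and the function names and the distance measure are hypothetical.

```python
import math

# Mean dihedral angles (deg) for positions 2 and 3 of the turn, as read from
# Table 6-1: (phi2, psi2, phi3, psi3). The psi3 value for type II is assumed.
TABLE_6_1_MEANS = {
    "I": (-64.0, -19.0, -90.0, -2.0),
    "IA": (52.0, 41.0, 87.0, -11.0),
    "II": (-61.0, 132.0, 82.0, 3.0),
    "IIA": (63.0, -126.0, -80.0, -11.0),
}


def angular_difference(a, b):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def nearest_turn_type(phi2, psi2, phi3, psi3):
    """Return the type whose mean angles lie closest, and that distance in degrees."""
    observed = (phi2, psi2, phi3, psi3)

    def distance(means):
        return math.sqrt(
            sum(angular_difference(o, m) ** 2 for o, m in zip(observed, means))
        )

    best = min(TABLE_6_1_MEANS, key=lambda name: distance(TABLE_6_1_MEANS[name]))
    return best, distance(TABLE_6_1_MEANS[best])


print(nearest_turn_type(-60.0, -30.0, -90.0, 0.0))
print(nearest_turn_type(-60.0, 120.0, 80.0, 0.0))
```

A real assignment would first confirm the defining hydrogen bond and the 0.5-0.6 nm separation of the two α carbons, and would leave unassigned any turn that lies far from every mean.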

β Turns of type I are the most common (Table 6-1).
The dihedral angles at both of the α carbons in β turns of
type I fall in the enclosure on the Ramachandran plot
between dihedral angles of φ = -50° and -130° and ψ =
20° and -30° (Figure 6-4B). This is the region in which
the two successive amides are squeezed against each
other (Figure 6-3F). Presumably the return on the
investment of energy necessary to squeeze them against each
other and widen the tetrahedral bond at the α carbon is
the efficient reversal of the direction of the polypeptide.
It is probably the case that β turns of type I and segments
of 3₁₀ helix account for most of the amino acids that fall
in this well-populated region of the Ramachandran plot.

The values for the dihedral angles φ₂ and ψ₂ for
β turns of type II fall in the largest enclosure on the
Ramachandran plot, but those for the dihedral angles φ₃
and ψ₃ fall in a region that can be occupied only by an
amino acid without a β carbon (Table 6-1, Figure 6-4A),
so only glycine should occupy the third position in a
β turn of type II. Although this is usually the case (74%),
there are exceptions, about half of which are asparagines
such as Asparagine 69 in α-lytic endopeptidase.

The mirror image conformations, in which the
polypeptide backbone mirrors the respective basic
β turn but the amino acids remain, of necessity, L-amino
acids, are rare. The third amino acid in a β turn of type IA
and the second amino acid in a β turn of type IIA should
be a glycine, but again a few exceptions have been
observed, such as Cysteine 170 in deoxyribonuclease I.
It has been noted that when an antiparallel β hairpin
reverses itself in the tightest possible β turn, where
the hydrogen bond of the β turn is also the last hydrogen
bond between the tines of the hairpin, the β turn is
usually type IA or type IIA.

Several minor classes of β turn have been defined.
β Turns of types VIA and VIB with dihedral angles φ and
ψ of (-60° ± 30°, 120° ± 30°, -90° ± 30°, 0° ± 30°) and
(-120° ± 30°, 120° ± 30°, -90° ± 30°, 0° ± 30°),
respectively, together account for less than 3% of all β turns.
β Turns of type VIII, however, with dihedral angles φ and
ψ of (-60° ± 30°, -30° ± 30°, -120° ± 30°, 120° ± 30°)
are somewhat more common (12%), but the dihedral
angles φ and ψ required to encompass this class are
much less tightly clustered than those for types I and II.

Aside from the requirement that glycine occupy
certain positions of a β turn for steric reasons, there are
some clear preferences for other amino acids.
Because β turns are almost always at the surface of a
protein, they contain hydrophilic amino acids more
frequently than hydrophobic amino acids. About 25% of all
β turns have proline at their second position (Figure
6-55). About 30% of all β turns of type I have either
aspartate, asparagine, or cysteine at their third position. Each
of these three amino acids has a hydrogen-bond
acceptor that is properly situated to accept a hydrogen bond
from the amido nitrogen-hydrogen of the amino acid in
the next position just beyond the β turn.

A γ turn is another type of turn that occurs rarely in
crystallographic molecular models of proteins. A
γ turn has a hydrogen bond between the nitrogen-hydrogen
of the amide of the first amino acid in the turn and
the acyl oxygen of the third amino acid in the turn,
causing the dihedral angles φ and ψ of the central amino acid
of the three to be around 80° and -60°, respectively,
which is presumably why such structures are so rare
(Figure 6-4B).

In every protein there are also segments of polypeptide
that do not assume the configuration of an α helix,
β structure, or β turn. These segments of random meander
pass about the protein as would an α helix or a strand
of β structure. They are usually found on the surface of
the molecule, and occasionally one of them will loop out
a significant distance from the core of the structure.
Although there is no regular pattern to this configuration,
each of the amino acids in a segment of random meander
usually assumes a fixed position in the crystallographic
molecular model and has specific values for its
dihedral angles φ and ψ. These values, however, are still
confined to the minima in the Ramachandran plot
because these minima are defined by inescapable local
steric effects. This places the angles for random meander
within the same regions defined by the clusters in Figure
6-4B. The distinction between α helix, β structure, and
random meander cannot be made by comparing single
values of the dihedral angles φ and ψ but only by
identifying repeating patterns that extend over several amino
acids or several strands of polypeptide. In random
meander, no such pattern is evident.

Almost all of the regular structures in which the
polypeptide participates are given their regularity by
hydrogen bonds between the amido nitrogen-hydrogens
and acyl oxygens of the backbone. These hydrogen
bonds can be readily identified in crystallographic
molecular models, and it can be safely assumed that they
exist. The bond length for such unambiguous hydrogen
bonds, expressed as the distance between nitrogen and
oxygen, is 0.29 ± 0.015 nm.

The angular dependence of these hydrogen bonds
can be expressed either in reference to the
nitrogen-hydrogen bond of the one amide (Figure 6-15A,B) or
the carbon-oxygen bond of the other amide (Figure
6-15C-G). As expected, the angles around the amido
nitrogen-hydrogen of the donor are much more
confined than those around the lone pairs on the acyl oxygen
of the acceptor. A deviation from 0° of either of the
angles around the nitrogen-hydrogen bond (Figure
6-15A) places the hydrogen off the line of centers
between the nitrogen and the oxygen and bends the
bond.

The angles of the hydrogen bond relative to the
carbon-oxygen bond of the acceptor (Figure 6-15C-G)
vary over a greater latitude. In keeping with the rigidity of
α helices and flexibility of β structure, the angles around
the acyl oxygens in β structure are much more variable
(Figure 6-15F,G) than those around the acyl oxygens in
α helices (Figure 6-15D). In none of these regular
structures, however, is there any tendency for the values of the
angles around the acyl oxygen to cluster at β = 0° and
γ = ±60°, the positions at which the nitrogen-hydrogen
bond would point directly at one or the other of the lone
pairs on the acyl oxygen (Figure 5-10). It has, however,
already been noted that, even in crystallographic
molecular models of small, unconstrained molecules, the
preference for these angles is not remarkable (Figure 5-11),
and there seems to be little energetic cost in pivoting the
donor over the surface of the acyl oxygen distal to the
acyl carbon (Figure 5-10D). Therefore, in regular
structures such as an α helix or β structure, it is the steric
requirements of these structures themselves that easily
take precedence.

Even in refined molecular models from data sets of 
narrow Bragg spacing, the identification of a hydrogen 
bond is subjective. It is often based on the fact that two 
heteroatoms are simply within a certain distance of each 
other. The dimensions of unquestionable hydrogen 
bonds in regular structures, however, suggest a more 
objective definition of a hydrogen bond. It has been
proposed that a hydrogen bond is an arrangement in 
which the heteroatoms of the donor and the acceptor are 
less than 0.34 nm from each other; the angle A-H-B, 
where A is the donor and B the acceptor, is between 150° 
and 180° (Figure 6-15B); and the distance between the 
theoretical position of the hydrogen and the heteroatom 
of the acceptor is less than 0.24 nm. When these defini- 
tions are applied to the rather featureless distribution of 
distances between nitrogens and oxygens in a crystallo- 
graphic molecular model, that distribution can be 
divided into hydrogen bonds and non-hydrogen bonds 
(Figure 6-16). 
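
The three criteria of this definition translate directly into a short test. The sketch below assumes that coordinates are given in nanometers and that a theoretical position for the hydrogen has already been calculated; the function names are hypothetical.

```python
import math


def angle_deg(a, h, b):
    """Angle A-H-B in degrees, measured at the hydrogen."""
    v1 = [x - y for x, y in zip(a, h)]
    v2 = [x - y for x, y in zip(b, h)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.sqrt(sum(x * x for x in v1)) * math.sqrt(sum(x * x for x in v2))
    cosine = max(-1.0, min(1.0, dot / norm))  # guard against rounding error
    return math.degrees(math.acos(cosine))


def is_hydrogen_bond(donor, hydrogen, acceptor):
    """Apply the three criteria quoted in the text (coordinates in nm)."""
    return (
        math.dist(donor, acceptor) < 0.34          # heteroatoms less than 0.34 nm apart
        and math.dist(hydrogen, acceptor) < 0.24   # hydrogen to acceptor less than 0.24 nm
        and 150.0 <= angle_deg(donor, hydrogen, acceptor) <= 180.0
    )


# A linear N-H...O arrangement with nitrogen and oxygen 0.29 nm apart.
print(is_hydrogen_bond((0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.29, 0.0, 0.0)))
# The same donor with the oxygen close but at a right angle to the N-H bond.
print(is_hydrogen_bond((0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.1, 0.19, 0.0)))
```

The second case shows why distance alone is insufficient: the two heteroatoms are well within 0.34 nm, but the angle at the hydrogen excludes the contact.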

The regular structures assumed by the polypeptide
serve the purpose of maintaining the total concentration
of hydrogen bonds in the solution. It seems highly
unlikely that a protein could fold without withdrawing
considerable numbers of its peptide bonds from contact
with water. Were the donors and acceptors on these
peptide bonds withdrawn from the solvent without
subsequently participating in hydrogen bonds in the
interior of the protein, there would be an unavoidable loss
of standard enthalpy (Figure 5-18). This loss is cancelled
when they find new partners (Table 5-2). α Helices and
β structure are simply efficient mechanisms for
accomplishing this energetic imperative.


Figure 6-14: β Turn of type I from a crystallographic molecular
model. The turn contains a hydrogen bond between the acyl oxygen
of Arginine 17 and the amido nitrogen-hydrogen of Glycine 20 in
the amino acid sequence of the protein. Leucine 18 and Proline
19 occupy the two central positions. The molecular model was
from a data set gathered to such narrow Bragg spacing and at such
a low temperature (130 K) that the electron density for hydrogen
atoms could be readily discerned, and those on the amides are
included in the drawing. The Cγ-endo pucker of Proline 19 was
also clearly defined. This drawing was produced with MolScript.


Figure 6-15: Bond angles for the hydrogen bonds in regular structures
formed by a folded polypeptide. (A) Bond angles at the amido nitrogen-
hydrogen. Two angles are defined, the angle γ′ within the plane of the
amide and the angle β′ out of the plane of the amide. When γ′ is 0°, the
acyl oxygen is in the plane that is normal to the plane of the amide and
that contains the nitrogen-hydrogen bond. When β′ is 0°, the acyl oxygen
is in the plane of the amide. (B) Distribution of these angles. The plot is for
all of these bond angles for the hydrogen bonds between the peptide
bonds in the crystallographic molecular model of α-lytic endopeptidase.
Symbols are (O) β structure, (×) α helix, and (*) random meander.
Reprinted with permission from ref 97. Copyright 1985 Academic Press.
(C) Bond angles at the carbon-oxygen bond of the amide. Two angles are
defined, the angle γ within the plane of the amide and the angle β out of
the plane of the amide. Angles are defined relative to the axis of the
carbon-oxygen bond and the plane of the amide. (D-G) Distribution of
these angles. These angles at each hydrogen bond involving the peptide
bonds in the crystallographic molecular models of 15 proteins are
plotted for hydrogen bonds in α helices (D), β turns (E), parallel
β structure (F), and antiparallel β structure (G). Each mark is for the
angles β and γ of one of the hydrogen bonds included in the set.
Reprinted with permission from ref 113. Copyright 1984 Pergamon Press.

Figure 6-16: Histogram of all of the distances (nanometers)
between the centers of nitrogen atoms and oxygen atoms in direct
contact with each other in the crystallographic molecular model of
human lysozyme. The number of pairs of nitrogens and oxygens
that are a given distance apart in the molecular model is plotted as
a function of those distances. If a hydrogen bond is defined as a
nitrogen and an oxygen less than 0.34 nm apart, hydrogen and
oxygen less than 0.24 nm apart, and the angle N-H-O between
150° and 180°, the histogram can be divided into nitrogen-oxygen
contacts that are hydrogen bonds and contacts that are not
hydrogen bonds. Reprinted with permission from ref 22. Copyright 1981
Academic Press.


Suggested Reading

Problem 6-1: Build a space-filling model of the structure
displayed in Figure 6-2 with a methyl group at Cβ.

(A) Take this molecular model of the polypeptide
backbone, and adjust it so that φ = 60° and ψ =
120°. What atoms are colliding?

(B) Adjust it to φ = 60° and ψ = -60°. What atoms are
colliding?

(C) Adjust it to φ = 180° and ψ = 60°. What atoms are
colliding?

(D) Adjust it to φ = -60° and ψ = 60°. What atoms are
colliding?

Use the numbering system of Figure 6-2 for your
answers.


Stereochemistry of the Side Chains 


It was pointed out in a discussion of the crystallographic
molecular model of chymotrypsin, which was one of the
first refined crystallographic molecular models, that
certain rotational conformations of the side chains of
the amino acids seemed to be preferred. As more highly
refined crystallographic molecular models of proteins
built from data sets of narrower Bragg spacing have
become available, it has become clear that most of the
side chains of the amino acids in the interior of these
models assume only one of the several rotational
conformations available to them, a remarkable fact that
illustrates the confinement exercised by the efficient
packing in the interior of a protein.

In maps of electron density calculated from data
sets to narrow Bragg spacing, side chains displaying
alternative conformations (Figure 4-25) are infrequent
(12 of 478 side chains in human glutathione reductase;
35 of 584 side chains in human hemoglobin) and
most of those are side chains found at or near the surface
of the molecule of protein, such as lysines, serines,
threonines, glutamates, aspartates, and asparagines.
The valine of Figure 4-25 in the interior of
ribonuclease T₁ is an example of an exception to the
preference of buried side chains for only one conformation.
When side chains do assume two alternative
conformations, each of them is usually at one of the
normal minima of rotational energy, as are the unique,
fixed conformations of most of the side chains.

Consequently, most of the observed rotational
conformations are staggered rather than eclipsed, a fact that
is reflected in the strong tendency (Figure 6-17) for
dihedral angles along carbon-carbon bonds between
saturated carbons and other atoms that are hybridized
sp³ to assume values near 60°, 180°, and 300° (-60°). In
extensive tabulations of the dihedral angles for all of
the side chains in sets of refined crystallographic
molecular models from data sets of narrow Bragg spacing, most
of the values for the dihedral angles of carbon-carbon
bonds connecting atoms that are hybridized sp³ are
clustered within 10° of one of these three values.
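
A minimal sketch of how a dihedral angle of a side chain can be compared with the three staggered values follows. The 10° tolerance is the figure quoted above; the function names are hypothetical and the sketch is not the procedure used in the cited tabulations.

```python
# Staggered values of the dihedral angle quoted in the text (300 deg = -60 deg).
STAGGERED = (60.0, 180.0, 300.0)


def nearest_staggered(chi_deg):
    """Return the closest staggered value and the deviation from it, in degrees."""
    chi = chi_deg % 360.0

    def deviation(reference):
        d = abs(chi - reference) % 360.0
        return min(d, 360.0 - d)

    reference = min(STAGGERED, key=deviation)
    return reference, deviation(reference)


def is_clustered(chi_deg, tolerance=10.0):
    """True if the angle lies within the tolerance of a staggered value."""
    return nearest_staggered(chi_deg)[1] <= tolerance


print(nearest_staggered(-65.0))   # close to 300 deg (-60 deg)
print(is_clustered(-65.0))
print(is_clustered(120.0))        # a fully eclipsed angle
```

An angle of 120° lies 60° from the nearest staggered value, which is the fully eclipsed situation.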

It has been pointed out, however, that a significant
fraction of the amino acid side chains have at least one
dihedral angle that falls more than 20° (5-30% depending
on the side chain) or even more than 30° (1-19%
depending on the side chain) away from the mean. If
these are real deviations, conformations of side chains
exist in which substituents are partially or fully
eclipsed, situations in which considerable steric strain
(15-40 kJ mol⁻¹) must be accommodated. Most of these
unexpected and sterically strained dihedral angles in
these crystallographic molecular models, however, are
probably artifactual, arising from unresolved alternative
conformations, the inaccuracy of the crystallographic
molecular model, incorrect insertion into the
map of electron density, and errors accumulated during
refinement. Again it must be remembered that a
crystallographic molecular model is not the actual structure of
the molecule of protein. Nevertheless, some of these
deviations may reflect the actual adjustments of some of
the side chains to the impossibility of packing as
complicated a molecule as a polypeptide into a compact
globular structure.

The dihedral angle along the bond between Cα and
Cβ in an amino acid is designated χ₁. It is the dihedral
angle between the bond to the amido nitrogen, the most
massive of the three atoms around Cα, and the bond to
the atom attached to Cβ that has the highest priority in
the Cahn-Ingold-Prelog system (Figure 6-18). The sign
of dihedral angle χ₁ is determined by the right-hand
rule. The stereochemistry about this bond between Cα
and Cβ is dominated by the polypeptide rather than the
rest of the side chain.
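
A signed dihedral angle of this kind can be computed from the coordinates of
the four atoms that define it. The following is a minimal sketch, assuming
Cartesian coordinates in any consistent unit; the function name `dihedral`
and the test coordinates are illustrative and are not taken from the text.
For χ₁ the four points would be the amido nitrogen, Cα, Cβ, and the
substituent of highest priority on Cβ.

```python
import math

def dihedral(p1, p2, p3, p4):
    """Signed dihedral angle (degrees) about the p2-p3 bond.

    The angle is positive when, looking from p2 toward p3, the bond
    p3-p4 is rotated clockwise from the bond p2-p1 (right-hand rule).
    """
    def sub(a, b):
        return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b1 = sub(p2, p1)
    b2 = sub(p3, p2)
    b3 = sub(p4, p3)
    n1 = cross(b1, b2)          # normal to the plane p1-p2-p3
    n2 = cross(b2, b3)          # normal to the plane p2-p3-p4
    b2_length = math.sqrt(dot(b2, b2))
    y = b2_length * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Hypothetical coordinates: a gauche arrangement about the z axis.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Values are returned in the range from -180° to 180°, so a tabulated angle of
300° appears as -60°.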

Valine is the logical place to begin the discussion of
this stereochemistry because its two methyl groups can
assess the steric bulk of the three substituents on the
α carbon. In each of the three staggered conformations
(two of which are displayed in Figure 6-18), one of these
three substituents must reside between the two methyl
groups, a most hindered location. Because the smallest
functional group should occupy this position most
frequently, the distribution of the dihedral angles χ₁ of
the valines in molecular models (Figure 6-17) states that
the hydrogen on the α carbon is smaller (73% of χ₁ are
within 30° of 175°)* than the nitrogen of the preceding
amide (20% are within 30° of -64°), which is smaller than
the acyl carbon of the following amide (6% are within
30° of 63°). This behavior is completely consistent with
the assessment of steric bulk based on preferences of
various substituents on cyclohexane for equatorial over
axial locations. The increase in free energy for placing an
acetoxy group in an axial location rather than an
equatorial location is 2.9 kJ mol⁻¹, but the increase in free
energy for placing a methoxycarbonyl group in an axial
location rather than an equatorial location is 5.4 kJ mol⁻¹.

* Values of dihedral angles χ₁ designated as within 30° of the angle
at the maximum of the distribution are from the tabulation derived
from an analysis of 240 crystallographic molecular models built
from data sets all to Bragg spacing less than or equal to 0.17 nm.

Figure 6-17: Histograms and scatter plots of the distributions of
the values for the dihedral angles χ₁ and χ₂ for the first two
carbon-carbon bonds of the side chains of amino acids in the
crystallographic molecular models of five proteins: penicillopepsin,
streptogrisin A from Streptomyces griseus, streptogrisin B from
S. griseus, the third domain of the ovomucoid inhibitor, and α-lytic
endopeptidase from Lysobacter enzymogenes. The abbreviation of
each side chain appears in the upper left-hand corner of the panel.
Serine, threonine, and valine had no observable dihedral angles χ₂,
so in these instances frequency is plotted as a function of the only
value of the dihedral angle χ₁. Leucine, isoleucine, asparagine, and
glutamine had observable dihedral angles χ₁ and χ₂, and in these
cases each mark (Y for leucine, + for isoleucine, Y for asparagine,
and + for glutamine) represents the value of these two angles for
one of these side chains in these molecular models. Because of
symmetry, the values for χ₂ for the aromatic amino acids tyrosine,
phenylalanine, and tryptophan (listed together as aromatics) fall
only between 0° and 180°. Reprinted with permission from ref 98.
Copyright 1983 Academic Press.
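
The frequencies quoted for valine can be converted into apparent differences
in free energy for comparison with the values for cyclohexane. The sketch
below assumes, purely for illustration, that the frequencies observed in
crystallographic molecular models follow a Boltzmann distribution at 298 K,
which the text does not claim; the function name is hypothetical.

```python
import math

R = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1

def apparent_free_energy_difference(p_major, p_minor, temperature=298.15):
    """Return RT ln(p_major / p_minor) in kJ mol^-1.

    This treats two observed frequencies as equilibrium populations,
    an assumption made only for this illustration.
    """
    return R * temperature * math.log(p_major / p_minor)

# Frequencies of the three staggered conformations of valine from the text.
hydrogen_between_methyls = 0.73
nitrogen_between_methyls = 0.20
acyl_carbon_between_methyls = 0.06

print(apparent_free_energy_difference(hydrogen_between_methyls,
                                      nitrogen_between_methyls))
print(apparent_free_energy_difference(hydrogen_between_methyls,
                                      acyl_carbon_between_methyls))
```

Under this assumption the two differences come out at roughly 3 and
6 kJ mol⁻¹, the same order of magnitude as the differences of a few
kilojoules per mole quoted for substituents on cyclohexane.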

Isoleucine (Figure 6-18) reinforces these preferences
by showing a similar distribution of analogous
stereochemical conformations (76% within 30° of -64°,
14% within 30° of 61°, and 10% within 30° of -173°,
respectively). It has been suggested that isoleucine (+ in
Figure 6-17) has different preferences from leucine (Y in
Figure 6-17) for the dihedral angles χ₁ and χ₂ because
these two geometric isomers should be able to satisfy in
turn different steric requirements.

Threonine is isosteric with valine, but the designation
of the dihedral angle χ₁ of threonine is 240° out of
phase with that of the dihedral angle χ₁ of valine because
of the precedence of the (S)-oxygen over the (R)-methyl
group (Figure 6-18). The conformation of threonine (χ₁
within 30° of 59°, 49% of all threonines) in which the two
substituents on Cβ surround the nitrogen of the preceding
amide (Figure 6-18) is about 4 times more frequent
than the analogous conformation of valine (χ₁ within 30°
of -64°, 20% of all valines) relative to the respective
conformations (43% and 73%) in which hydrogen is
surrounded. The most likely explanation for this difference
is the fact that a hydroxyl group is significantly smaller
than a methyl group. Another possibility, however, is that
a dihedral angle χ₁ within 30° of 59° places the hydroxyl
of threonine in the proper position to form either a
hydrogen-bonded ring with its own acyl oxygen or a
hydrogen-bonded ring with the acyl oxygen of the amino
acid that precedes it in the sequence. Such
hydrogen-bonded rings would explain why serines show such a
high percentage of dihedral angles χ₁ within 30° of 64°
(48%), in contrast to all other amino acids with only one
substituent on the β carbon, which have a low percentage
of dihedral angles χ₁ near 60° (11% of all side chains
with one and only one substituent on Cβ).

Figure 6-18: Definition of the dihedral angle χ₁ for the
carbon-carbon bond between the α carbon and the β carbon of an
amino acid in a polypeptide. All dihedral angles follow the
right-hand rule. For valine, angle χ₁ is the dihedral angle between the
carbon-nitrogen bond and the bond to the pro-R methyl group. For
isoleucine and threonine, angle χ₁ is the dihedral angle between
the carbon-nitrogen bond and the bond to the substituent of
higher priority, namely, the ethyl group or the hydroxyl group,
respectively. For all other relevant amino acids, the dihedral
angle χ₁ is the angle between the carbon-nitrogen bond and the
bond to the remainder of the side chain.

The amino acids, other than serine, with one and
only one substituent on Cβ have a preference (55%)
for dihedral angles χ₁ of -65° ± 10°. This preference is
understandable because an angle χ₁ of -65° places the
single substituent on the β carbon between the two least
bulky substituents around the α carbon (Figure 6-18).
The other two maxima occur at -177° ± 10° and 66° ± 8°.
The bias in the direction of the α hydrogen is reflected in
the values of -65° and -177°. Of this group of side
chains, those with a methylene at the γ position have
dihedral angles χ₁ within 20° of 66°, which would put
the methylene between the two bulkiest substituents
(Figure 6-18), only 7% of the time, but the side chains
with an aromatic ring at the γ position, which is less bulky
than a methylene, have dihedral angles χ₁ within 20° of
66° 16% of the time.

Because the preference for a certain value for the
dihedral angle χ₁ usually depends on a choice among
three staggered conformers (Figure 6-18), the energies of
which differ by only a few kilojoules mole⁻¹, it comes as
no surprise that this choice depends on the structure of
the immediate surroundings. In particular, the dihedral
angles ψ and φ of the amino acid can affect the choice of
its dihedral angle χ₁. As might be expected, valine displays
the most dramatic effects of the backbone on the dihedral
angle χ₁. For valines within either α helices or
β sheets, values of dihedral angle ψ more positive than the
respective ideal values cause χ₁ to switch from a value of
exclusively 175° ± 8° to a value of exclusively 64° ± 7°.

The dihedral angle χ₂ is assigned to the bond
between Cβ and Cγ in an amino acid. In linear amino
acids such as glutamine, glutamic acid, lysine, arginine,
and methionine, the majority (70%) of the dihedral
angles χ₂ (structure 6-4) are clustered at 176° ± 11°,
which is the angle expected for a trans conformation at
this carbon-carbon bond. The remainder of the dihedral
angles χ₂ for these amino acids are split equally between
the two gauche conformations at dihedral angles χ₂ near
60° and -60°.

For the aromatic amino acids phenylalanine, tyrosine,
histidine, and tryptophan (Figure 6-17), the conformations
with dihedral angles χ₂ within 30° of 90° or
-89°, where the plane of the ring is approximately
perpendicular to the bond between the α carbon and the
β carbon, are preferred (71% of these side chains). This
orientation (Figure 6-21) places the polypeptide most distant
from the two ortho substituents of the rings and also
avoids eclipse. A significant fraction (29%) of these side
chains, however, have values for χ₂ outside of these
ranges. The aromatic rings are large, bulky substituents
and each of these outliers is pushed out of the ideal range
by unavoidable steric clashes with atoms from the backbone
or other side chains. In spite of their bulk, however,
the symmetric rings of tyrosine and phenylalanine
have been observed to flip over slowly and continuously.

It has also been observed in refined maps of neutron
scattering density from crystallography by neutron
diffraction that the hydrogens on all methyl groups are
staggered. Although this seems to be the expected
result because methyl groups in proteins should be free
to rotate and assume freely a staggered conformation,
there are indications that packing in the interior of a
protein is so tight that even methyl groups are confined.

The dihedral angles χ₂ for the hydroxyl groups of
serines and threonines, although they define the position
only of a hydrogen, can also be observed by neutron
diffraction. There is a strong tendency for the hydroxyl
to be staggered (χ₂ near 60°, 180°, and -60°) with the
trans conformation (χ₂ near 180°) slightly preferred over
either gauche conformation. The location of an acceptor
forming a hydrogen bond with the proton often seems to
dictate the dihedral angle assumed by the hydroxyl.
Because of conjugation, the oxygen-hydrogen bond of the
hydroxyl of tyrosine is within the plane of the ring (2-23).

The distribution of the values of the dihedral
angle χ₂ for asparagine has two maxima at -21° and +32°
(82% within 30° of these two maxima) that define two
respective classes, the membership of which is determined
by the value of the dihedral angle χ₁ for that asparagine,
the type of secondary structure to which it belongs, and
the type of hydrogen bond formed between it and the
backbone. Asparagine 34 in Figure 6-7B serves as an example
of one of these choices. Aspartate shows the same
preference for values of the dihedral angle χ₂. The majority
(82%) of the dihedral angles χ₂ for aspartates are within
60° of 0°.

Only glutamine, glutamate, methionine, lysine, and
arginine have carbon-carbon bonds with dihedral
angles χ₃. Both glutamine and glutamate show the same
preferences for dihedral angles χ₃ near 0° that are shown
by asparagine and aspartate for the analogous dihedral
angle χ₂. The trans conformation with dihedral
angle χ₃ within 30° of 180° is the preferred (66%)
conformation for lysine, as expected.

Methionines are usually buried and confined to
one or two overall conformations in crystallographic
molecular models, so the dihedral angles χ₁ (Cα-Cβ), χ₂
(Cβ-Cγ), and χ₃ (Cγ-S) are usually fixed. The normal
preferences for dihedral angles χ₁ (59% within 30° of -67°)
and χ₂ (55% within 30° of 178°) are observed, but the
dihedral angle χ₃ has a significantly higher frequency for
the two gauche conformers (39% within 30° of -72° and 32%
within 30° of 75°). It has been pointed out that because
the two carbon-sulfur bonds (0.18 nm) are longer than two
carbon-carbon bonds (0.15 nm), the steric clashes
within methionine in such gauche conformations should
be less severe and the dihedral angles χ₃ should be less
confined. It seems that this unexpected preference for
the gauche conformations arises from the fact that when
methionine assumes this conformation, it is more compact.

Because the amido nitrogen is planar, it occupies a
position in the puckered cyclopentyl ring of a proline
(Equation 6-1) at which eclipse would occur if it were
occupied by a methylene. As a result, only the Cγ-exo and
Cγ-endo conformations of proline can be significantly
populated, but it is difficult to distinguish
crystallographically between these two conformations
even with a data set gathered to narrow Bragg
spacing (< 0.17 nm). Nevertheless, decisions can often be
made either directly (Figure 6-14) or indirectly as to
the proper conformation, and sometimes a map of electron
density shows that both conformations are present
in equilibrium with each other.

Cystine is an amino acid under peculiar steric
constraints (Figure 6-19). The distribution of the two
dihedral angles χ₁ of cystine shows the same order
and frequencies of preferences (56% within 20° of -65°,
24% within 20° of -175°, and 12% within 20° of 64°)
as those of any other amino acid with only one
uncomplicated substituent on the β carbon. The disulfide itself,
because it is a dithioperoxide, is electronically required
to have a dihedral angle χ₃ along the sulfur-sulfur bond
similar to the dihedral angle of hydrogen peroxide, which
is 94° or -94°. If the dihedral angle χ₃ in a cystine were
exactly 90° or -90°, the four lone pairs, two on each
sulfur, would be as far from being parallel to each other
as is possible, and this orientation would be the most
stable electronically.


The angles observed for the dihedral angles of cystines in
crystallographic molecular models are 97° ± 15° and
-86° ± 11° (indistinguishable from +90° and -90°), with
little preference for the positive over the negative.
There are instances in which these two discrete, alternative
conformations are both populated significantly by
the same cystine (Figure 6-40B). The dihedral
angle χ₃ of 97° or -86° in a cystine is peculiar enough to
attract attention (Figure 6-19).

The bonds between the β carbons and the sulfurs in
a cystine might be expected to have a preference for
dihedral angles χ₂ equal to 180° as with most other amino
acids, but in crystallographic molecular models, values
around broad maxima of 60° and -60° (Figure 6-19) are
heavily preferred. Each of the two bonds between the
β carbons and the sulfurs should be fairly long and not
severely confined by the adjacent sulfur or the lone pairs
on the immediate sulfur itself, and they are probably the
most compliant of the bonds in the cystine. Because a
cystine connects two strands of polypeptide engaged in
many other interactions, it probably sustains considerable
torque. Presumably it is the dihedral angles χ₂ that
must accommodate this torque. Because of the electronic
requirements, the dihedral angle of the disulfide
itself must always be near 97° or -86°, and it is only the
two dihedral angles χ₂ that can adjust to allow the entire
cystine to fit the distance required to be spanned
between the two otherwise fixed α carbons of a cystine
in a protein. For example, in Figure 6-19, if the two
α helices connected by the two cystines had to move farther
apart from each other or closer together, their relative
movement could be accommodated by increasing
or decreasing, respectively, the dihedral angles χ₂ of
Cysteines 217 and 224 or Cysteines 233 and 240 or all of
them together.

Figure 6-19: A pair of cystines,

If both of the dihedral angles χ₂ were 180°, the
angle preferred by other amino acids with a single
substituent on the β carbon, then the two α carbons in a
cystine would be about 0.9 nm apart, a rather distant
connection. Because the distances between the two
α carbons of cystines in native proteins fall between 0.45
and 0.7 nm, the dihedral angles χ₂ assume many other
values, and rarely 180°.

So far, the dihedral angles χ have been considered
independently. Usually, however, the individual
rotamers of a side chain are tabulated. A rotamer is
a rotational conformation of a side chain in which each
carbon-carbon bond assumes a dihedral angle χ within a
particular range. The range is within a certain number of
degrees, for example, within 20° or within 30°, of
one of the maxima for the distribution, which are usually
close to the staggered dihedral angles of 60° (gauche⁺),
180° (trans), or -60° (gauche⁻). For example, there are
nine rotamers of isoleucine, which are, in order of their
frequency, g⁻t, g⁻g⁻, g⁺t, tt, tg⁺, g⁻g⁺, g⁺g⁺, g⁺g⁻, and tg⁻.
Such tabulations of rotamers emphasize the dependence
of one dihedral angle on the adjacent dihedral angles.
There is, however, no agreement as to the statistical
method that should be used to determine rotamers, their
variances, or their distributions.
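
The assignment of a side chain to a rotamer can be sketched as a simple
binning of each dihedral angle χ. The example below is only an illustration:
it uses the ideal staggered angles of 60°, 180°, and -60° as the centers of
the ranges and a window of 30°, whereas the tabulations cited use the
observed maxima of each distribution. The function names and labels are
hypothetical.

```python
# Centers of the three ranges, taken here as the ideal staggered angles
# (an assumption; published tabulations use the observed maxima).
CENTERS = {"g+": 60.0, "t": 180.0, "g-": -60.0}

def wrap(angle):
    """Map an angle in degrees into the range (-180, 180]."""
    a = (angle + 180.0) % 360.0 - 180.0
    return 180.0 if a == -180.0 else a

def classify_angle(chi, tolerance=30.0):
    """Return 'g+', 't', or 'g-' if chi is within tolerance of a center."""
    for label, center in CENTERS.items():
        if abs(wrap(chi - center)) <= tolerance:
            return label
    return None

def classify_rotamer(chis, tolerance=30.0):
    """Concatenate the labels for chi1, chi2, ...; None if any is an outlier."""
    labels = [classify_angle(chi, tolerance) for chi in chis]
    if None in labels:
        return None
    return "".join(labels)

print(classify_rotamer([-64.0, 170.0]))   # chi1 gauche-, chi2 trans
print(classify_rotamer([-173.0, 65.0]))   # chi1 trans, chi2 gauche+
print(classify_rotamer([120.0, 180.0]))   # chi1 eclipsed, no rotamer assigned
```

Side chains for which any angle falls outside every window are returned as
unassigned rather than forced into the nearest rotamer, which keeps the
strained or artifactual conformations discussed earlier out of the counts.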

All of the stereochemical observations discussed so 
far are either consistent with the behavior of small mole- 
cules or otherwise make sense. Some of this agreement is 
probably illusory. During refinement, constraints on 
dihedral angles are imposed either advertently or inad- 
vertently, and the fact that they are near ideal values in 
the final crystallographic molecular model may not 
reflect reality. Careful corrections using properly calcu- 
lated omit maps should eliminate this bias, and crystal- 
lographic molecular models derived from data sets 
gathered to narrow Bragg spacing, for which few con- 
straints need to be imposed during refinement, can avoid 
this problem entirely. 

If there are only a few conformations that are pre- 
ferred for each side chain, then there is far less flexibility 
involved in the folding of a protein than there seems to 
be at first glance. Conformations of the folded protein 
that demand dihedral angles to assume values other than 
those of lowest energy, which are normally the confor- 
mations most heavily populated in the unfolded 
polypeptide, require that extra energy be spent to occupy 
those conformations. It turns out that there is not much 
extra energy to go around.

Problem 6-2: Turn the alanine into a valine in your
space-filling molecular model from Problem 6-1 by
replacing two of the hydrogens on the β carbon with
methyl groups. Rotate around the appropriate bonds
until the dihedral angles φ and ψ have the mean values
for an amino acid in parallel β structure.


(A) What atoms run into the two methyl groups on
the side chain of the valine as rotation occurs
around the bond between the α and β carbons?
Use the numbering system of Figure 6-2.


(B) What is the value for the dihedral angle χ₁ that has
the most sterically favorable disposition of the
side chain in β structure?


Rotate around the appropriate bonds until the dihedral
angles φ and ψ in your model have the mean values for a
right-handed α helix.


(C) What atoms run into the two methyl groups on
the side chain of the valine as you rotate around
the bond between the α and β carbons? Again, use
the numbering system of Figure 6-2.


(D) What is the value for the dihedral angle χ₁ that has
the most sterically favorable disposition of the
side chain in a right-handed α helix?


The theoretical values for the dihedral angles φ and ψ for
a left-handed α helix should be 65° and 40°, respectively.
Rotate around the appropriate bonds until the
dihedral angles φ and ψ in your model have these values.


(E) What atoms run into the two methyl groups on
the side chain of the valine as you rotate around
the bond between the α and β carbons?


(F) What is the value for the dihedral angle χ₁ that has
the most sterically favorable disposition of the
valine side chain in a left-handed α helix?


(G) On the basis of these observations, why is a
left-handed α helix unstable relative to a right-handed
α helix?


Problem 6-3: Draw Newman projections to explain why
the dihedral angles χ₂ for leucine and isoleucine display
such a strong preference for 180° (Figure 6-17).

There is an obvious bias in the distribution of the
constituent amino acids of a molecule of protein between
its surface, which remains in contact with the water, and
its interior, which is more or less withdrawn from the
water. This bias reflects their hydropathy.

The accessible surface area of a molecule of pro- 
tein can be estimated by asking a digital computer to per- 
form a calculation equivalent to rolling a sphere of a 
particular size, the probe, over the surface of a space-filling
crystallographic molecular model (Figure 4-17E) of
that protein. The center of the spherical probe will
trace a surface, and the area of that surface, removed 
from that of the surface of the protein by a distance equal 
to the radius of the sphere, is defined to be the surface 
area of the protein accessible to the probe. Each portion 
of the irregular surface defined by the center of the probe 
can be assigned to a particular atom in the crystallo- 
graphic molecular model by noting with which atom the 
probe was in contact when that portion was being cre- 
ated. 
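
The calculation described above can be approximated numerically. The sketch
below does not reproduce the program of Lee and Richards; it substitutes the
simpler point-sampling scheme usually attributed to Shrake and Rupley, in
which test points are spread over a sphere of radius (atomic radius + probe
radius) around each atom and a point counts as accessible only if it lies
outside the expanded spheres of all other atoms. The function names, the
number of test points, and the atomic radii in the example are assumptions
made for illustration.

```python
import math

def sphere_points(n):
    """Roughly uniform points on a unit sphere (golden-spiral construction)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        phi = golden_angle * i
        points.append((r * math.cos(phi), r * math.sin(phi), z))
    return points

def accessible_surface_areas(atoms, probe=0.15, n_points=400):
    """Per-atom accessible surface area in nm^2.

    atoms: list of (x, y, z, radius) tuples, all in nanometers.
    The surface is the one traced by the center of the probe.
    """
    unit = sphere_points(n_points)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        expanded = ri + probe
        exposed = 0
        for ux, uy, uz in unit:
            px = xi + expanded * ux
            py = yi + expanded * uy
            pz = zi + expanded * uz
            blocked = False
            for j, (xj, yj, zj, rj) in enumerate(atoms):
                if j == i:
                    continue
                reach = rj + probe
                if (px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2 < reach ** 2:
                    blocked = True
                    break
            if not blocked:
                exposed += 1
        areas.append(4.0 * math.pi * expanded ** 2 * exposed / n_points)
    return areas

# Two hypothetical atoms of radius 0.17 nm whose expanded spheres overlap.
print(accessible_surface_areas([(0.0, 0.0, 0.0, 0.17), (0.3, 0.0, 0.0, 0.17)]))
```

Because each test point belongs to exactly one atom, the per-atom areas
correspond to the assignment of each portion of the surface to the atom with
which the probe is in contact.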

The surface of a molecule of protein is not smooth,
but highly irregular, covered with cracks, crevasses,
cavities, and ridges (Figure 4-17E). One way to demonstrate
this fact is to vary the radius of the probe (Figure
6-20). When the probe is large (≥1.5 nm in the example
chosen) the crystallographic molecular model is
indistinguishable from a hard sphere (a sphere of radius 4.55 nm
in the example chosen), but as the radius of the probe is
decreased, much more surface area becomes accessible
(the difference between the points and the curve in
Figure 6-20) as the probe becomes small enough to enter
the irregularities of the surface. The radius usually
chosen for the probe, in an attempt to mimic a molecule
of water, is 0.15 nm (the arrow in Figure 6-20). The
choice is arbitrary, especially because locations to which
only a single molecule of water can gain access experience
diminished effects of the solvation arising from the
bulk properties of water.

Figure 6-20: Irregularity of the surface of a protein. The radius of
the probe used to determine accessible surface area was
assigned a particular length (nanometers) and the accessible
surface area (nanometers²) of the crystallographic molecular model
(Bragg spacing > 0.18 nm) of glyceraldehyde-3-phosphate
dehydrogenase from Bacillus stearothermophilus was computed
(points on graph). The smooth curve on the graph is the accessible
surface area that a sphere of radius 4.55 nm would give as a
function of the radius of the probe rolled over its surface. The arrow is
at a radius for the probe of 0.15 nm. The program used in the
calculations was adapted from that of Lee and Richards by
Dr Ilya Shindyalov of the Protein Data Bank.
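
For a hard sphere the accessible surface area follows directly from the
definition: the center of the probe traces a sphere whose radius is the sum
of the two radii. A minimal sketch of the reference curve for a hard sphere
of radius 4.55 nm is given below; the function name is illustrative.

```python
import math

def sphere_accessible_area(sphere_radius, probe_radius):
    """Accessible surface area (nm^2) of a hard sphere.

    The center of the probe traces a sphere of radius
    sphere_radius + probe_radius, so the area is 4*pi*(R + p)^2.
    """
    return 4.0 * math.pi * (sphere_radius + probe_radius) ** 2

for probe in (0.15, 0.5, 1.5):
    print(probe, round(sphere_accessible_area(4.55, probe), 1))
```

The amount by which the area computed for the crystallographic molecular
model exceeds this curve at a given radius of the probe is a measure of the
irregularity of the surface.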

Usually a particular amino acid in a crystallographic
molecular model is designated as buried if less
than a certain amount of its surface area is accessible. It
has been shown that when the radius of the probe is
set at 0.15 nm

$$ n^{1/3} - n_{\mathrm{B}}^{1/3} = \kappa \qquad (6\text{-}3) $$

where n is the total number of amino acids in a protein,
n_B is the number designated as buried by a particular
rule, for example, every amino acid with an accessible
surface area less than 0.2 nm², and κ is a constant. This
equation states that, in a globular protein, the amino
acids defined by a rule as buried are found within a
roughly spherical solid of radius r_B that is smaller than
the roughly spherical solid of radius r containing all of
the amino acids.

The spherical shell of width d between these two
roughly spherical solids contains all of the amino acids
(n - n_B) that are accessible by the rule that has been
chosen. This shell can be considered to be the depth of
penetration of the probe, which represents water, into
the interior of the protein owing to its irregular surface. If
buried amino acids are defined as only those completely
inaccessible to a probe of radius 0.15 nm, d is fairly large
(1.0 nm) and the buried amino acids are deep in the
interior. This rule would require that in a small protein
(100 aa) almost no amino acid would be completely
inaccessible. If the rule, however, is that any amino acids
having accessible surface area of less than 0.2 nm² are
defined as buried, then d is only 0.5 nm and far more of
them are considered to be buried.
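
The connection between a shell of constant width d and Equation 6-3 can be
illustrated numerically. The sketch below assumes that every amino acid
occupies the same volume in a solid sphere; the value of 0.14 nm³ per amino
acid is an assumed round number, not a value from the text, and the function
names are hypothetical.

```python
import math

def radius_from_count(n, residue_volume=0.14):
    """Radius (nm) of a sphere holding n amino acids of equal volume (nm^3)."""
    return (3.0 * residue_volume * n / (4.0 * math.pi)) ** (1.0 / 3.0)

def buried_count(n, shell_width, residue_volume=0.14):
    """Number of amino acids inside the inner sphere of radius r - d."""
    r = radius_from_count(n, residue_volume)
    r_buried = max(r - shell_width, 0.0)
    return n * (r_buried / r) ** 3

def kappa(n, shell_width, residue_volume=0.14):
    """Left-hand side of Equation 6-3 for a shell of constant width."""
    n_b = buried_count(n, shell_width, residue_volume)
    return n ** (1.0 / 3.0) - n_b ** (1.0 / 3.0)

# A small protein of 100 amino acids under the two rules in the text.
print(buried_count(100, 1.0))   # shell of 1.0 nm: only a few are buried
print(buried_count(100, 0.5))   # shell of 0.5 nm: many more are buried
print(kappa(100, 0.5), kappa(500, 0.5))
```

With these assumptions the difference of the cube roots is the same for
proteins of different sizes as long as the width of the shell is fixed, which
is the geometric content of Equation 6-3.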

Accessible surface areas of the amino acids in
crystallographic molecular models of proteins have been
calculated by use of a radius for the probe of 0.15 nm, and
for each type of amino acid, the fraction scored as
buried by at least three different rules has been separately
tabulated (Table 6-2). When the most stringent
rule is used, namely, that a buried amino acid in the
molecular model of the protein must have no accessible
surface area, the frequencies with which most of the
amino acids are buried are very low and the statistics
become unreliable. When the rule is relaxed, more amino
acids are scored as buried, and discriminations become
more dependable.

Table 6-2: Removal of Amino Acids from Water in Molecular Models of Proteins

                                       fraction buriedᵇ
             accessible      ------------------------------------   mean fraction
             surface areaᵃ   buried    buried    less than 0.2 nm²  of surface
amino acid   (nm²)           100%ᶜ     95%ᶜ      accessibleᶜ        buriedᵈ

hydrophobic
  Ile        1.80            0.18      0.60      0.76               0.90
  Val        1.60            0.18      0.54      0.74               0.88
  Phe        2.20            0.14      0.50      0.69               0.89
  Leu        1.80            0.16      0.45      0.71               0.87
  Met        2.05            0.11      0.40      0.66               0.83
  Ala        1.15            0.20      0.38      0.63               0.78
  Trp        2.60            0.04      0.27      0.62               0.88
apathetic
  Ser        1.20            0.08      0.22      0.46               0.62
  Thr        1.45            0.08      0.23      0.41               0.60
  His        1.95            0.02      0.17      0.44               0.78
  Tyr        2.30            0.03      0.15      0.34               0.74
hydrophilic
  Glu        1.85            0.03      0.18      0.24               0.74
  Asp        1.50            0.04      0.15      0.27               0.67
  Asn        1.60            0.03      0.12      0.30               0.61
  Gln        1.90            0.01      0.07      0.23               0.61
  Lys        2.10            0         0.03      0.05               0.60
  Arg        2.40            0         0.01      0.10               0.51

ᵃ For entire amino acid, both its side chain and its contribution to the backbone, in the tripeptide Gly-X-Gly.
ᵇ Fraction of the total number of that amino acid in a series of crystallographic molecular models that are buried by the noted criterion.
ᶜ Reference 148.
ᵈ Reference 145.

Half of the amino acids, when they are in an 
unfolded polypeptide and freely accessible to a probe of 
0.15 nm, have total accessible surface areas between 1.5
and 2.0 nm² and therefore are of similar size (Figure
4-14). Small amino acids such as alanine are probably 
buried more often simply because they are easier to sur- 
round, and large amino acids such as tryptophan are 
harder to surround completely and bury, especially in 
the smaller proteins. These stereochemical problems 
must contribute to the observed distributions. 

Nevertheless, it has already been noted that the
frequencies with which the various amino acids are buried
are correlated with the free energies of transfer for
their model compounds from water to the gas phase
(Table 5-9) and also with many of the other scales of
hydropathy (Figure 5-24). This correlation is actually
established by the three main groups of side chains
(Table 6-2): those that are hydrophobic, those that are
apathetic, and those that are hydrophilic (note the three
clusters in Figure 5-24). Within each of these groups,
however, there is no significant correlation between
extent of burial and any scale of hydropathy derived
from free energies of transfer. Presumably, the reason for
the lack of correlation within the main groups is that
stereochemically and energetically protein folding is not
a transfer between solvents. With this in mind, it still can
be stated that if an amino acid is hydrophobic, it is more
likely to be buried, and if it is hydrophilic, it is more likely
to remain in contact with the water in the folded
polypeptide.

The results of the hydrophobic effect are most readily
appreciated by examining the internal core of a
crystallographic molecular model (Figure 6-21). This
region is enriched in definitively hydrophobic amino
acids such as leucine, isoleucine, valine, and phenylalanine.
An even more dramatic example of a hydrophobic
core is the center of a β helix, which is completely walled
off from the water by the backbone of the helix and which
is composed exclusively of aliphatic side chains.

Each of the hydrogen-carbon bonds on the side
chains that are removed from water provides favorable
hydrophobic free energy to drive the folding of the
polypeptide. This fact can be verified by performing
site-directed mutation. So that no adverse steric effects are
encountered, either an isoleucine found in the core of a
crystallographic molecular model of the protein is
shortened by converting it to a valine or an alanine, a leucine
or a valine in the core is shortened by converting it to an
alanine, or a position in the core next to a cavity is
chosen and the mutants designed so that they expand
into the cavity. The change in the standard free energy
of folding produced by the various mutations is then
measured. In such studies, it has been found that
whether hydrogen-carbon bonds are added to or
removed from the core relative to the number present in
the core of the native protein, each one contributes
between -1 and -5 kJ (mol of hydrogen-carbon bond)⁻¹
to the standard free energy of folding, with most of the
values clustered between -2.5 and -3.5 kJ (mol
of hydrogen-carbon bond)⁻¹. These values encompass*
and are indistinguishable from the value of -2.8 kJ (mol
of hydrogen-carbon bond)⁻¹ for the transfer of hydrocarbon
from water to liquid hydrocarbon (Figure 5-21).
The interior of a protein is enriched in hydrophobic
amino acids because this is the only way to obtain the
free energy necessary to drive its folding.
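
The arithmetic behind such estimates is a simple count of hydrogen-carbon
bonds. The sketch below is an illustration only: it assumes that every
hydrogen-carbon bond removed by the mutation was fully buried, that no
structural rearrangement occurs, and that each bond is worth the value of
-2.8 kJ (mol of hydrogen-carbon bond)⁻¹ quoted for transfer to liquid
hydrocarbon. The dictionary and function names are hypothetical.

```python
# Number of hydrogen-carbon bonds in each aliphatic side chain.
SIDE_CHAIN_CH_BONDS = {"Ala": 3, "Val": 7, "Leu": 9, "Ile": 9}

# Free energy per buried hydrogen-carbon bond, kJ/mol (value from the text).
FREE_ENERGY_PER_BOND = -2.8

def estimated_destabilization(native, mutant, per_bond=FREE_ENERGY_PER_BOND):
    """Estimated loss of stability (kJ/mol, positive = less stable).

    Counts the hydrogen-carbon bonds removed when the native side chain
    is shortened to the mutant side chain and multiplies by the assumed
    contribution of each bond.
    """
    removed = SIDE_CHAIN_CH_BONDS[native] - SIDE_CHAIN_CH_BONDS[mutant]
    return -per_bond * removed

for native, mutant in (("Ile", "Val"), ("Val", "Ala"), ("Leu", "Ala")):
    print(native, "->", mutant, estimated_destabilization(native, mutant))
```

Measured values scatter around such estimates because the surroundings of
each mutated side chain differ from one position to the next.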

When a hydrophilic amino acid such as lysine or
glutamate is introduced into the interior of a molecule of
protein by site-directed mutation, the protein becomes
significantly less stable. For example, when Methionine
102 and Leucine 133 in lysozyme from bacteriophage T4,
which are both buried in its interior, were replaced in
turn with a lysine and an aspartate, respectively, the
protein became less stable by 29 and 24 kJ mol⁻¹. The region
surrounding the new lysine at position 102 became much
more mobile to permit limited access of the lysine to the
solvent, and its pKₐ was found to be 6.5, a shift indicating
that the neutral conjugate base had become more stable
than it would have been if it were fully exposed to the
water. When Valine 16 of ribonuclease T₁ from
Aspergillus oryzae is replaced with its isostere threonine,
the stability of the protein decreases by 15 kJ mol⁻¹, and
the side chain of the threonine has rotated 120° relative
to that of the valine around the Cα-Cβ bond to direct the
hydroxyl group toward the solvent rather than toward
the interior as the CH₃ group of the valine was
directed. Often, however, when valine is replaced by
threonine or alanine is replaced by serine in the interior
of a protein, the hydroxyl group finds a nearby acceptor
for its hydrogen bond, and the change in stability is small
(<4 kJ mol⁻¹).

When a hydrophobic amino acid ends up fully 
exposed on the surface, this exposure is neither energet- 
ically unfavorable nor favorable because it was fully 
exposed before the polypeptide folded. When a pheny- 
lalanine was placed in turn by site-directed mutation at 
a number of fully accessible sites on the surface of micro- 
coccal nuclease, the stability of the protein decreased on 


* The broad range of these values may be due to the fact that
each of these hydrophobic chains that has been mutated in the
respective experiments is located in its own particular surround-
ings within the native structure, and each of those surroundings
produces its own particular steric effects and standard free energy
of solvation. It has been shown that the magnitude of the change
in standard free energy of folding produced by removing hydro- 
gen-carbon bonds by mutation is directly proportional to the 
number of other hydrogen-carbon bonds surrounding the point of 
the mutation in the crystallographic molecular model of the 
protein. 
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Figure 6-21: Hydrophobic core in 
model (Bragg spacing = 0.17 nm) of a
type I cohesin domain from the
cellulosomal scaffolding protein A of
amino acids that were missing from 
the map of electron density, is pre- 
sented. Only the side chains (thicker 
lines) of the amino acids in the core 
of the molecule are drawn. Note the 
different rotamers of isoleucine and 
the fact that dihedral angles χ2 for the
phenylalanines and tyrosines are
almost all near 90°. The numbers are
those for the amino acid sequence of
the intact cellulosomal scaffolding pro-
tein. This drawing was produced
with MolScript.
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average for each of these single substitutions by only 
2 kJ mol⁻¹, and at least half of that decrease in stability
resulted from steric effects. No consistent changes in
stability were observed when several different hydropho-
bic amino acids were substituted by site-directed muta-
tion for amino acids on the surface of lysozyme from
bacteriophage T4.

Clusters of aromatic amino acids, such as the one
in the type I cohesin domain (Figure 6-21), are often
found in the hydrophobic core of a crystallographic
molecular model. The tendency for aromatic amino
acids, cystines, and arginines to cluster has been
explained in terms of favorable overlaps of their π molecular
orbitals, but just as often aromatic and aliphatic
amino acids intermingle. One interesting aspect of the
cluster of aromatic amino acids in the type I cohesin domain
(Figure 6-21) is that it illustrates the tendency of two
phenyl rings, in isolation from water, to form a complex
in which the planes of the two rings are at around 90° to
each other. This orientation is commonly observed in
the crystallographic molecular models of proteins, and
theoretical calculations for benzene dimers in the gas
phase suggest that it is the energetically favored arrange-
ment.

One of the most notable features of the accessibili- 
ties of the amino acids in the native structure of a protein 
is that, in the folded polypeptide, the accessible surface 
areas of all types are less than they were in the unfolded 
polypeptide (Table 6-2). The accessible surface areas 
tabulated are for the entire amino acid in a polypeptide, 
both side-chain and backbone segments. Usually, the 
backbone is buried before the side chain, so a significant 
portion of the mean fraction of surface buried for each 
type of amino acid is accounted for by this fragment 
common to all of the amino acids. This backbone por-
tion, however, cannot account for more than
0.6-0.7 nm² of buried surface area because the accessi-
ble surface area of glycine in a polypeptide is only
0.75 nm². Therefore even the most accessible side
chains, arginine, lysine, and glutamine (with a mean
buried surface of 1.2 nm²), have more than 0.5, 0.55, and
0.45 nm² of the surface area of their side chains buried,
respectively. The regions of each of these hydrophilic
side chains that are buried are usually their 
hydrogen-carbon bonds. For example, in the crystallo- 
graphic molecular model of the complex between the 
Ha-ras oncogene product p21 and its substrate, the 
amino group of Lysine 117 is engaged in several hydro- 
gen bonds, but its butyl group is fully buried just as 
would be the side chain of a leucine.” 

There should be a normal hydrophobic effect asso- 
ciated with the removal of the butyl group of lysine, the 
propyl group of arginine, and the ethyl groups of gluta- 
mine and glutamate from exposure to water even though 
these side chains in their entirety are among the most 
hydrophilic on all of the scales of hydropathy. To assess 
the hydrophobic effect that was contributed by burying 
these portions of each of these amino acids, as well as all 
of the others, the contribution of each atom in an amino 
acid to its overall hydropathy should be extracted. From 
these atomic parameters, the free energy of transfer for 
only those portions of each amino acid that are actually 
buried could be calculated. It was noted that free ener-
gies of transfer for individual solutes between water and
the gas phase could be dissected into the individual contribu-
tions of each covalent bond that they contained. A simi-
lar dissection has since been performed upon the set of
free energies of transfer for the N-acetyl-α-amides of the
amino acids between water and 1-octanol. In this
latter dissection, a series of parameters based on individual
atoms, rather than covalent bonds, has been presented
that can reproduce the original scale of hydropathy
with acceptable precision. Presumably every scale of
hydropathy presently in use can be so dissected.
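
The bookkeeping behind such a dissection can be sketched in a few lines of Python. This is only an illustration, not the published parameter set: it assumes that every buried hydrogen-carbon bond contributes the same -2.8 kJ mol⁻¹ quoted earlier for the transfer of hydrocarbon from water to liquid hydrocarbon, and it counts two hydrogen-carbon bonds for each methylene in the alkyl portions named above. The function name and the `fraction_buried` argument are hypothetical.

```python
# Rough additive estimate of the hydrophobic contribution from burying only
# the alkyl portion of an otherwise hydrophilic side chain.
# Assumption: each hydrogen-carbon bond contributes the same -2.8 kJ/mol
# quoted in the text for transfer from water to liquid hydrocarbon.
KJ_PER_MOL_PER_CH_BOND = -2.8

# Hydrogen-carbon bonds in the alkyl portions named in the text
# (two per methylene group).
CH_BONDS_IN_ALKYL_PORTION = {
    "lysine (butyl)": 8,
    "arginine (propyl)": 6,
    "glutamine (ethyl)": 4,
    "glutamate (ethyl)": 4,
}


def buried_alkyl_free_energy(n_ch_bonds, fraction_buried=1.0):
    """Return the estimated free energy (kJ/mol) for burying the given
    number of hydrogen-carbon bonds; fraction_buried scales for partial burial."""
    if not 0.0 <= fraction_buried <= 1.0:
        raise ValueError("fraction_buried must lie between 0 and 1")
    return n_ch_bonds * fraction_buried * KJ_PER_MOL_PER_CH_BOND


if __name__ == "__main__":
    for name, n_bonds in CH_BONDS_IN_ALKYL_PORTION.items():
        print(f"{name}: {buried_alkyl_free_energy(n_bonds):.1f} kJ/mol")
```

Under these assumptions a fully buried butyl group of lysine would contribute about -22 kJ mol⁻¹. A real atomic dissection would assign different parameters to different atoms, so the uniform value used here should be read only as a first approximation.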

Free energies of transfer for model compounds of 
tryptophan or tyrosine between water and a solvent 
such as 1-octanol (Figure 5-24) or ethanol, as
opposed to free energies of transfer between water and 
the gas phase (Table 5-9), have always suggested that 
tryptophan and tyrosine should be more hydrophobic 
than they seem to be when they are found in a protein 
(Table 6-2). The explanation for this is probably that sol- 
vents such as ethanol and 1-octanol are able to form 
hydrogen bonds with the one donor on the indole and 
the donor and acceptor on the phenol, making them 
more soluble in these solvents than they would be in a 
hydrocarbon and making them seem more hydrophobic 
than they are. This would be consistent with the obser- 
vation that it is the hydroxyl on tyrosine that usually 
remains in contact with the water in crystallographic 
molecular models. Because of the requirement that the
nitrogen-hydrogen bond in the indole of tryptophan also
be hydrogen-bonded, the portion of the side chain con- 
taining this bond is also usually in contact with the water. 
But indole is large and the rest of it is usually buried. As a 
result, it is only in the last column of Table 6-2 that the 
hydrophobicity of tryptophan is manifest. 

The neutral, but hydrophilic, amino acids gluta- 
mine and asparagine are straightforward examples of 
the effect of the hydrogen bonds formed with water in 
the unfolded polypeptide on the location of that amino 
acid in the folded polypeptide. Complete withdrawal of 
the two hydrogen-bond donors on glutamine or 
asparagine from water during folding would result in a 
net disappearance of two hydrogen bonds from the solu- 
tion. The difficulty of simultaneously regaining both of 
these lost hydrogen bonds in the interior of the protein 
seems to be great enough that the primary amides in the 
side chains of most of the glutamines and asparagines in 
a protein end up in the folded polypeptide fully exposed 
to the aqueous phase.

It might be supposed that buried hydrogen bonds 
between side chains on different segments of secondary 
structure would be important factors because these 
would be capable of organizing significant regions of the 
protein. Of the rarely buried hydrogen bonds between
side chains, however, only about 20% are the type that
connect different segments of secondary structure; the
other 80% connect donors and acceptors within the
same α helix, β sheet, or β turn. The steric requirements
of packing the secondary structures efficiently and
avoiding empty space are far more important than 
hydrogen bonds in positioning the segments of second- 
ary structure and organizing the overall structure of the 
protein, and the few buried hydrogen bonds that do 
occur between segments of secondary structure are 
probably adventitious. It is the interdigitation of the side
chains protruding from β sheets and α helices that orients
these secondary structures.
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Packing of the Side Chains 


Although there often are flexible loops, sometimes quite
long ones, bulging out from a globular protein and
occasionally there is a tunnel passing through its interior
that is required for its function, most of its mass is
formed by compactly layering secondary structures one
against the other. In the crystallographic molecular
model of a protein, the space between these layered
α helices, β sheets, and random meander is filled completely
by the side chains of the amino acids that protrude
from the polypeptide at each α carbon. The
arrangement of these side chains is organized in such a 
way that the amino acids in the interior of the protein are 
packed with admirable efficiency. This tight packing is 
illustrated by the fact that space-filling crystallographic 
molecular models of proteins rarely contain empty 
spaces large enough to accommodate even a molecule of 
water. 

A space-filling crystallographic molecular model 
is constructed by placing spheres of the appropriate van 
der Waals radii (Table 6-3) at the positions of the centers 
of atoms in the model. The van der Waals radius of an
atom is the radius of the sphere whose surface coincides
with the boundary at which that atom resists penetration
by the corresponding boundary of another
atom. These van der Waals radii are significantly
greater than the covalent radii of the various atoms.

It is difficult to obtain reliable values for van der 
Waals radii, so those tabulated are averages of values 
estimated in several different ways. Because the van der 
Waals radii define the boundaries of impenetrability, 
when matter is packed together in a condensed phase, 
the centers of no two atoms can approach closer than the 


Table 6-3: van der Waals Radii

atom             van der Waals radiusᵃ (nm)
H (aliphatic)    0.115
H (aromatic)     0.100
C                0.175
N                0.160
O                0.150
S                0.180
P                0.180

ᵃ Values are averaged from various tabulations (refs 179-183) and expressed
to the nearest 5 pm, which may overstate their accuracy.
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sum of their van der Waals radii, and examining the 
details of the packing of side chains in the interior of a
crystallographic molecular model of a protein is one way
to estimate values for van der Waals radii. The spheres
defined by the van der Waals radii define the van der
Waals surface of a molecule,*
which is distinct from either the molecular surface or the
accessible surface.
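
The rule that two atomic centers cannot approach more closely than the sum of their van der Waals radii can be expressed as a short check. The sketch below uses the radii of Table 6-3; the function names and the optional tolerance are hypothetical, and the check is meant only for atoms in nonbonded contact, not for covalently bonded or hydrogen-bonded pairs.

```python
# van der Waals radii in nanometers, taken from Table 6-3.
VDW_RADII_NM = {
    "H_aliphatic": 0.115,
    "H_aromatic": 0.100,
    "C": 0.175,
    "N": 0.160,
    "O": 0.150,
    "S": 0.180,
    "P": 0.180,
}


def closest_approach_nm(atom_a, atom_b):
    """Smallest allowed distance between the centers of two nonbonded atoms:
    the sum of their van der Waals radii."""
    return VDW_RADII_NM[atom_a] + VDW_RADII_NM[atom_b]


def overlaps(atom_a, atom_b, distance_nm, tolerance_nm=0.0):
    """True if the two centers are closer than the sum of the radii,
    allowing an optional tolerance for the uncertainty in the radii."""
    return distance_nm < closest_approach_nm(atom_a, atom_b) - tolerance_nm


if __name__ == "__main__":
    print(closest_approach_nm("C", "C"))   # two carbons in contact
    print(overlaps("C", "O", 0.30))        # closer than the radii allow
```

Because the tabulated radii are themselves averages, a small tolerance is a reasonable choice when such a check is applied to a refined model.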

When atoms in the interiors of a set of crystallo- 
graphic molecular models, from data sets each of which 
is to Bragg spacings of 0.17 nm or less, are assigned their 
respective van der Waals radii and hydrogens added at 
their van der Waals radii, one discovers these atoms 
almost always approach each other as closely as possible 
and do so in such a way that there is almost no empty 
space among them. The packing is often so tight that
methyl groups are confined in equilaterally triangular 
spaces clamping their hydrogens. Consequently, it is not 
surprising that the alkyl groups in the center of a mole- 
cule of protein are confined to specific conformations 
rather than being free to rotate as they would be in a 
liquid. One result of this tight packing is that at some 
locations there is not enough space for a side chain. Such 
locations are occupied by glycines. Even in these 
instances, the fit is often so tight that it is steric interac- 
tions with the hydrogens of the glycine that position the 
α carbon and determine the values for dihedral angles ψ


*Reprinted with permission from ref 184. Copyright 1996
Academic Press.
and φ. Even though it might seem to be the case from
an examination of Figure 6-4B, not all of the glycines in a
protein, however, are handling unavoidable steric prob- 
lems because many can be replaced with larger amino 
acids by site-directed mutation without affecting the 
function of the protein.

Another consequence of the economy in arranging 
the side chains of the amino acids is that the volume of a 
molecule of protein is quite small relative to its molecu- 
lar mass. The volume occupied by the atoms in a mole- 
cule of protein can be calculated by summing all of its 
individual atomic volumes defined by the van der Waals 
surface, and the actual volume of the molecule can be 
calculated from its partial specific volume and its molec- 
ular weight. From this calculation it is learned that 75% 
of the volume of a molecule of protein is occupied by 
atoms. By comparison, in most organic liquids
only about 45% of the volume is occupied by atoms and
in water only 36%, but in a solid of hexagonally packed
spheres 75% of the volume would be filled by atoms. In
anhydrous crystals of small organic molecules, 70-80%
of the volume is filled by atoms.
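
The calculation described above can be sketched as follows. The molecular volume is obtained from the partial specific volume and the molar mass through the Avogadro constant, and the packing fraction is the ratio of the summed atomic volumes to that molecular volume. The function names and the example protein (molar mass 20 000 g mol⁻¹, partial specific volume 0.73 mL g⁻¹) are hypothetical, chosen only for illustration. The last function gives the geometric fraction for close-packed equal spheres, π/(3√2), which is about 0.74 and close to the 75% quoted above.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole
NM3_PER_CM3 = 1e21        # 1 cm^3 = 1e21 nm^3


def molecular_volume_nm3(partial_specific_volume_ml_per_g, molar_mass_g_per_mol):
    """Volume occupied by one molecule of protein in solution (nm^3)."""
    cm3_per_mol = partial_specific_volume_ml_per_g * molar_mass_g_per_mol
    return cm3_per_mol / AVOGADRO * NM3_PER_CM3


def packing_fraction(summed_atomic_volume_nm3, molecule_volume_nm3):
    """Fraction of the molecular volume that is occupied by atoms."""
    return summed_atomic_volume_nm3 / molecule_volume_nm3


def close_packed_sphere_fraction():
    """Fraction of space filled by close-packed equal spheres."""
    return math.pi / (3.0 * math.sqrt(2.0))


if __name__ == "__main__":
    volume = molecular_volume_nm3(0.73, 20000.0)  # hypothetical protein
    print(f"molecular volume: {volume:.1f} nm^3")
    print(f"close-packed spheres: {close_packed_sphere_fraction():.3f}")
```

For the hypothetical protein the molecular volume is about 24 nm³, so a packing fraction of 75% would correspond to roughly 18 nm³ of summed atomic volume.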

The high density of the packing in the interior of a 
molecule of protein is also reflected in its compressibil- 
ity. The compressibility of the interior of a molecule of 
protein has been estimated from two different interpre-
tations of the available experimental measurements
to be about 20 Gbar⁻¹. This value is intermediate between
those for liquids (CCl4, 100 Gbar⁻¹; C,H», 105 Gbar⁻¹;
H2O, 46 Gbar⁻¹) and solids (ice Ih, 12 Gbar⁻¹; quartz, 2.7
Gbar⁻¹; NaCl, 4 Gbar⁻¹).

At first glance, all of these results seem incompati- 
ble with the observation that the partial specific volume 
of a protein (usually 0.72-0.75 mL g⁻¹) can be calculated*
quite accurately from the sum of the molar volumes of its
constituents because each protein has a unique
structure. The accuracy of this calculation suggests that 
each structure, although it is unique, incorporates the 
requirement that its volume be as small as possible. The 
minimization of molecular volume is an important 
noncovalent force in the folding of a molecule of protein, 
and it dictates many of the features of the structure. This 


* This calculation does not treat the constituents as independent 
solutes in free solution. In fact, if each side chain were an inde- 
pendent solute, each of their partial molar volumes would include 
a covolume, which is a volume that arises simply because a par-
ticular constellation of atoms is an independent molecule dis-
solved in a given solvent. These covolumes are substantial. For
water the covolume of a solute is 14 cm³ mol⁻¹, and for organic
solvents it is 25 cm³ mol⁻¹. Therefore, the sum of the partial
molar volumes of the components of a protein, were they each sep- 
arate molecules in solution, would be significantly greater than its 
actual partial molar volume. To the extent that its covolume arises 
from the fact that a solute is in free solution in a given solvent, the 
fact that the partial molar volume of a protein is the sum of the 
atomic volumes of its substituents with no added covolume states 
that those substituents are not in free solution. This of course is 
true; they are economically packed into a solid. 


noncovalent force minimizing the empty space within a 
molecule of protein can be considered to be a conse- 
quence of the hydrophobic effect, if the hydrophobic 
effect is defined as the tendency of water to minimize the 
volume of the cavity occupied by any solute. 

Although there are a few proteins in which the
folded polypeptide forms a knot, the packing of
every other protein appears to result from the consecu-
tive layering of one element of secondary structure upon
another, much as one would fold a cloth or a hinged rod.
α Helices pack upon α helices, β sheets pack upon
β sheets, and α helices pack upon β sheets. In all of these
situations the secondary structures take up orientations 
with respect to each other that permit the side chains 
that protrude from each of them to interdigitate (Figure 
6-22). This interdigitation is the reason that there is 
very little vacant space in the interior of a molecule of a 
protein. If it can be assumed that the configuration of 
minimum volume is the preferred configuration in the 
condensed phase, then these interdigitations promote 
the achievement of such a minimum volume. In order to 
form as many interdigitations as possible, the individual 
segments of secondary structure are required to assume 
preferred orientations with respect to each other. Viewed 
in this perspective, packing is a structural force just as the 
formation of hydrogen bonds between buried donors 
and acceptors on the side chains of the amino acids 
would be a structural force, but packing is more impor- 
tant. 

The orientation between two α helices, two sheets
of β structure, or an α helix and a sheet of β structure can
be assigned an angle Ω. The angle Ω between two
α helices is the angle between their two axes (Figure
6-23). The sign on Ω is given by the right-hand rule.
Consequently, the angle Ω in Figure 6-23 has a negative
sign. Because the pattern in which the amino acids pro-
trude from an α helix has a 2-fold rotational axis of pseu-
dosymmetry at each position in the α helix (focus on
position i at the right of Figure 6-23), the axis of the
α helix has no direction associated with it and a value of
Ω = -50° is equivalent to a value of Ω = +130°. The
angle Ω between two sheets of β structure is the angle
between the direction of the parallel or antiparallel
strands in one sheet and the direction of the strands in
the other sheet (Figure 6-24). The right-hand rule
determines the sign of angle Ω, and the angle Ω in Figure
6-24B is, therefore, negative. No distinction is made
between parallel and antiparallel relationships of the
strands or the amino- and carboxy-terminal ends of a
given strand of β structure because all combinations of
these distinct stereochemistries produce almost the
same pattern in which the side chains are distributed
across the face of the sheet (Figure 4-16B,C). The angle Ω
between an α helix and a sheet of β structure is the angle
between the axis of the α helix and the direction of the
β strands (Figure 6-25D).
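
Because the axis of an α helix has no direction for this purpose, packing angles that differ by 180° describe the same arrangement. A minimal sketch of that equivalence, assuming the angle is supplied in degrees and that the range from -90° to +90° is the convenient one to report, is given below; the function name is hypothetical.

```python
def equivalent_omega(omega_deg):
    """Fold a packing angle into the range (-90, 90] degrees.

    Angles that differ by 180 degrees are treated as equivalent because
    the helical axis is taken to have no direction."""
    folded = omega_deg % 180.0      # now in [0, 180)
    if folded > 90.0:
        folded -= 180.0             # now in (-90, 90]
    return folded


if __name__ == "__main__":
    for angle in (-50.0, 130.0, 20.0, -160.0):
        print(angle, "->", equivalent_omega(angle))
```

With this folding, the values of -50° and +130° mentioned above map to the same angle.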

The most frequently observed angle Ω between two
α helices is around -50°, the angle used to construct
Figure 6-23. One-third of all adjacent α helices in molec-
ular models of globular proteins are inclined with respect
to each other by an angle Ω between -60° and -40°.
There is a steric explanation for this preference. When an
α helix is observed from one side (Figure 6-26), it can
be seen that the side chains are arranged in sets of paral-
lel ridges and grooves. For example, side chains 8 and 4;
19, 15, and 11; and 22 and 18 in Figure 6-26 form a set of
ridges and grooves, but so do side chains 4 and 1, 11 and
8, 18 and 15, and 22 and 19. If one α helix is opposed to
another, the first set of these ridges will fit into the
second set of these grooves, and conversely, when the
angle between the two helices is -50° (Figure 6-23). An
example of the interdigitation that occurs in such situa-
tions is found between two adjacent α helices in the
molecular model of bovine carboxypeptidase A (Figure
6-27).

There are a number of other values for the angle Ω
between two α helices that promote less favorable inter-
digitations of the side chains, and examples of all of them
have been observed. Because several possibilities exist
and because α helices can tighten or loosen to accom-
modate different angles close to the ideal values, the dis-
tribution of angle Ω between -90° and +90° is fairly
uniform with the exception of the striking and sharp
peak at -50°. For example, in the crystallographic molec-
ular model of cytidine deaminase, two α helices in the
interior cross at an angle Ω of 90°, but the side chains pro-
truding from them at the interface do not pack together
well. In the distribution of angles Ω, however, there is
another preferred angle represented by a broad maxi-
mum in the distribution at +20°. This angle Ω defines
the orientation of two α helices in a coiled coil.

Suppose that the amino acids emerged from a
right-handed α helix at successive angles of precisely
102 6/7° instead of about 99°. Every seven amino acids
(7 × 102 6/7° = 720°) in such an α helix, the angular dispositions
of the side chains would repeat precisely. Two such tight-
ened α helices could be placed next to each other, with
their axes parallel, in such a way that their side chains
would interdigitate regularly along the interface (Figure
6-28). Every seventh side chain in one helix would sit
to one side of every seventh side chain of the other, and
every side chain four amino acids to the carboxy-termi-
nal side of every seventh side chain in one polypeptide
would sit to the other side of every side chain four posi-
tions to the carboxy-terminal side of every seventh side
chain in the other. As a result of these interdigitations,
the two α helices could comfortably fit together side by
side for an indefinite length because the topography of
the interface would repeat precisely every seven amino
acids.
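
The arithmetic of this idealized α helix can be checked exactly. The sketch below uses Python's `fractions` module to show that seven amino acids spanning exactly two turns implies 720°/7 for each amino acid and 3.5 amino acids for each turn, and it computes how far an actual angle falls short of that ideal. The value of 99° is the approximate figure quoted above; the function names are hypothetical.

```python
from fractions import Fraction

# Seven side chains spanning exactly two turns (720 degrees).
IDEAL_ANGLE_DEG = Fraction(720, 7)   # 102 6/7 degrees


def residues_per_turn(angle_per_residue_deg):
    """Number of amino acids in one full turn of 360 degrees."""
    return Fraction(360) / Fraction(angle_per_residue_deg)


def angular_shortfall_deg(actual_angle_deg):
    """Difference between the actual angle for each amino acid and the
    ideal angle; negative when the actual helix is less tightly wound."""
    return Fraction(actual_angle_deg) - IDEAL_ANGLE_DEG


if __name__ == "__main__":
    print(float(IDEAL_ANGLE_DEG))             # ideal angle for each amino acid
    print(residues_per_turn(IDEAL_ANGLE_DEG)) # amino acids for each turn
    print(float(angular_shortfall_deg(99)))   # shortfall for about 99 degrees
```

For an actual angle of about 99°, the shortfall is -27/7°, or about -3.9° for each amino acid.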

α Helices, however, do not have angles between
successive side chains of 102 6/7° but less than that.
Crick pointed out that such an interface, permitting
the advantageous interdigitation and repeat every seven
side chains, nevertheless could be retained if the two
α helices, instead of being parallel to each other, twisted
around each other in a left-handed coiled coil such that
the twist of the coiled coil exactly compensated for the
difference between the actual angle between successive
amino acids in the α helix and 102 6/7°. If the actual angle
between successive amino acids in a right-handed
α helix is 99°, the two α helices would have to twist
around each other in a left-handed sense at -3 6/7° for

Figure 6-23: Use of superimposed helical nets to describe the contacts at an interface between two α helices. (Top) The angle between
two adjacent α helices, i and j, is defined as the angle Ω between the two axes; its sign is determined by the right-hand rule. (Right) α Helix i
in a vertical orientation is numbered out from its center. The central amino acid is given the designation i; those below are designated by
negative integers, and those above, by positive integers. The relative orientations in which amino acids i - 7, i - 4, i - 3, i, i + 3, i + 4, and i + 7
are distributed can be projected onto a plane tangential to the position of amino acid i. These seven projected points define a unique lattice,
or helical net. For α helix i the lattice is face-up, in the orientation of the original α helix. (Left) α Helix j also defines the same lattice as that
defined by α helix i, but, because α helix j is to be opposed to α helix i, face to face, the helical net for α helix j is flipped over, face-down.
(Bottom) The two helical nets, the one for α helix i face-up and the one for α helix j face-down, are then opposed and rotated with respect to
each other until maximum interdigitation of the lattice points is achieved. The angles at which maximum interdigitation occurs in the heli-
cal nets will be the angles at which maximum interdigitation occurs between the amino acids of the two α helices. Adapted with permission
from ref 197. Copyright 1981 Academic Press.


every position in one of the sequences. Although the
actual twist of a coiled coil should reflect a tradeoff
between energy required to tighten the α helix and
energy required to bend the α helix into the supercoil, in
the coiled coil of tropomyosin the twist is -3.4° to -3.9°
for each position in the sequence, and in the one
from general control protein GCN4 (Figure 6-29) it
is -3.6° to -3.9°, values that seem almost too close to the
expected one.

The original coiled coil of α helices predicted from
these geometric arguments contained two parallel
α helices. The coiled coils formed by tropomyosin and gen-
eral control protein GCN4 are coiled coils of two parallel
α helices of identical sequence. There are, however, exam-
ples of coiled coils of two parallel α helices of nonidentical
sequence, two antiparallel α helices of nonidentical
sequence, three parallel α helices of identical
sequence, three parallel α helices of nonidentical
sequence, three antiparallel α helices of identical
sequence, three antiparallel α helices of nonidentical
sequence, four parallel α helices of identical
sequence, four antiparallel α helices of nonidentical
sequence, five parallel α helices of identical
sequence, five antiparallel α helices of nonidentical
sequence, and 12 antiparallel α helices of nonidentical
sequence producing a cylinder with a hollow center.
There is even an example of a coiled coil of four antiparal-
lel α helices that coils around another copy of itself to form
a coiled coil of coiled coils (Figure 6-30).

That both parallel and antiparallel arrangements
are observed must follow from the two facts that an
α helix has a pseudo-2-fold axis of symmetry with

Figure 6-26: Orientation of the side chains in an α helix. (A) The
amino acids in an α helix are numbered consecutively from top to
bottom with the front face accentuated. (B) A topographic map of
the front face of an α helix of polyalanine. The contours are at inter-
vals of 0.05 nm and the topmost contour, that surrounding the
numbers 4, 15, and 22, is at 0.475 nm from the axis of the α helix.
The numbers are on the methyl carbons of the appropriate alanine.
Reprinted with permission from ref 197. Copyright 1981 Academic
Press.


respect to the emergence of its side chains and that the
packing of these side chains governs the existence of a
coiled coil. The twist in a coiled coil of three α helices is
-3.0° to -4.0° for each position in the sequence of one of
its α helices; that in one of four α helices, -1.9° to -3.0°;
and that in one of five α helices, -2.6°, but
the situation seems to be much less constrained for those
with four and five α helices, for which there are examples
of coiled coils with right-handed supercoiling.

Crick calculated the diffraction pattern of X-radia-
tion expected from a macroscopic fiber constructed of
aligned coiled coils of α helices and was able to explain
why the meridional reflection in the pattern that would
normally arise from the pitch of 0.54 nm for an untwisted
α helix should shorten to a pitch of 0.51 nm when the
α helix becomes twisted into a coiled coil. A prominent


Figure 6-27: Packing of two opposed α helices in the crystallo-
graphic molecular model of bovine carboxypeptidase A. An
α helix in the molecular model comprising amino acids 72-90 is
tightly opposed to an α helix comprising amino acids 285-307. The
molecular model was converted into a space-filling representation
and only the amino acid side chains along these two α helices were
displayed. A set of three parallel planes was cut through the inter-
face between the two helices and superposed to create a topo-
graphic map through the interface. The planes were approximately
parallel to the two helical axes and at intervals of 0.1 nm. The con-
tours of the side chains in the lower α helix are solid; those in the
upper are broken. Each amino acid in the map is designated by its
number in the amino acid sequence, and lines designating the axes
of the two α helices are included. The angle Ω between these two
helices is -56°. Reprinted with permission from ref 197.
Copyright 1981 Academic Press.


meridional reflection, representing a repeat of 0.51 nm,
had been observed previously in the diffraction pat-
terns of fibers of keratin, myosin, and fibrinogen, and it
is now known that such a reflection is indicative of the
coiled coils of α helices in these proteins. The infrared
spectra of coiled coils of α helices are also characteris-
tic.

The sequence of the polypeptides in any coiled coil
of α helices can be divided into successive units, or hep-
tads, each seven amino acids in length. The first and
fourth amino acid in each heptad (positions a and d in
Figure 6-28) are the most deeply buried amino acids in
the interface between the two or more α helices in the
coiled coil (Figure 6-29). These most deeply buried loca-
tions are isolated from the water surrounding the coiled
coil, and the side chains sequestered there are usually


Figure 6-28: Interaction between α helices in a coiled coil.
(Top) Alignment of two tightened α helices with 3.5 amino acids for
each turn rather than 3.6. Amino acids from amino-terminal to car-
boxy-terminal are designated as a, b, c, d, e, f, and g; the view is end-
on looking from amino-terminal to carboxy-terminal amino acid.
Every seven amino acids the orientations would repeat, and this
would place the amino acid after amino acid g precisely below
amino acid a and so forth. (Middle) The two α helices in the top
panel are cut along two respective lines normal to the plane of the
page and passing through amino acids f and f' and then flattened,
one against the other. The two resulting planes are then turned
together -90° about a vertical axis so that the gray positions end up
above the white. This view illustrates the interdigitations of amino
acids a and d. (Bottom) Three tightened α helices running parallel
to each other. In this arrangement also, amino acids a and d can
interdigitate.

Figure 6-29: Drawing of the crystallographic
molecular model (Bragg spacing > 0.18 nm) of a
coiled coil of α helices. A peptide 31 amino acids
long with the amino acid sequence from Arginine
249 to Glycine 279 of general control protein GCN4
from S. cerevisiae was synthesized. In solution two
copies of this peptide spontaneously form a coiled
coil of α helices, which is the native structure of this
portion of the full-length protein. The coiled coil
was crystallized, and a crystallographic molecular
model was built from a map of electron density cal-
culated from a data set gathered from these crystals.
The numbering in the figure is that of the amino
acid sequence of the full-length transcription factor,
only a segment of which is represented by the syn-
thetic peptides. Only the side chains of the amino
acids in the a, d, e, g, a', d', e', and g' positions of the
coiled coil (Figure 6-28) are included in the figure to
emphasize the interface between the two α helices.
This drawing was produced with MolScript.


hydrophobic amino acids such as leucine, valine, 
isoleucine, alanine, phenylalanine, tyrosine, and methio- 
nine.” The hydrophobic amino acid can also be a cystine 
as in the antiparallel coiled coil of α helices in 
carboxypeptidase C from Saccharomyces cerevisiae 
(Figure 6-19).! An α helix that has a heptad repeat of 
hydrophobic amino acids is an amphipathic α helix 
(Figure 6-8). There are a few interesting exceptions to the 
rule that coiled coils are formed from amphipathic α helices, 
such as the chloride ion chelated by five symmetrically 
arrayed glutamines in the center of the coiled coil of 
five parallel α helices in extracellular matrix protein 
COMP.*"’ 
Each of the amino acids in the core of the coiled 
coil (amino acids a and d in Figure 6-28) on one α helix 
is sandwiched between its partner on the opposite α helix 
(a’ and d’) and the amino acid to the amino-terminal 
(g’) or carboxy-terminal (e’) side, respectively, of its part- 
ner (Figure 6-28, middle panel). For example, Leucine 
253 in one of the α helices in the coiled coil of general 
control protein GCN4 (Figure 6-29)” is sandwiched 
between Leucine 253 and Glutamate 254 on the other; 
Valine 257, between Lysine 256 and Valine 257 on the 
other; Leucine 260, between Leucine 260 and Leucine 
261 on the other; Leucine 267, between Leucine 267 and 
Glutamate 268 on the other; and so forth. The side chains 
of the amino acids on the flanking positions (g and e, 
respectively) are often ones that are hydrophobic proximally 
and hydrophilic distally, such as glutamate, lysine, 
arginine, and glutamine,” that can provide 
hydrocarbon to cover the hydrophobic side 
chains in the core before they enter the water fully. 
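
The bookkeeping in the preceding examples can be sketched in a few lines of Python. This is only an illustration, not a procedure from the text: the function names are hypothetical, the heptad is anchored by assuming that Leucine 253 occupies position d (as the examples above imply), and the conversion of peptide numbering assumes that Arginine 249 is position 1 of the synthetic peptide.

```python
HEPTAD = "abcdefg"

def heptad_position(residue, anchor=253, anchor_position="d"):
    """Heptad letter of a residue, assuming an unbroken heptad repeat
    in which residue `anchor` sits at `anchor_position` (assumption:
    Leucine 253 of GCN4 is in position d)."""
    offset = (HEPTAD.index(anchor_position) + residue - anchor) % 7
    return HEPTAD[offset]

def sandwiching_pair(residue):
    """Residue numbers on the partner helix that sandwich a core residue.
    Position a sits between g' (residue - 1) and a' (same number);
    position d sits between d' (same number) and e' (residue + 1)."""
    position = heptad_position(residue)
    if position == "a":
        return (residue - 1, residue)
    if position == "d":
        return (residue, residue + 1)
    raise ValueError(f"residue {residue} is in position {position}, not a or d")

def full_length_number(peptide_position, first_residue=249):
    """Convert numbering within the synthetic peptide to numbering in the
    full-length protein (assumption: Arginine 249 is peptide position 1)."""
    return first_residue + peptide_position - 1

for core_residue in (253, 257, 260, 267):
    print(core_residue, heptad_position(core_residue), sandwiching_pair(core_residue))
```

Under these assumptions the sketch returns the same pairs as the examples in the text, for instance residues 256 and 257 for Valine 257, and position 16 of the peptide corresponds to residue 264 of the full-length protein, which falls on position a.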

In the crystallographic molecular model of the 
coiled coil from transcription factor GCN4, the 
hydrophobic amino acids of the heptad repeats interdig- 
itate along the interface between the two α helices to 
form the hydrophobic core of the structure and to pro- 
duce the supercoil. They and those that flank them pack 
so closely together and so efficiently that there is almost 
no vacant space in the core of the structure. The two 
identical α helices are parallel to each other and packed 
in precise register so that each central hydrophobic side 
chain packs against its twin from the other α helix. The 
asparagines at position 16 pack side by side in the core of 
the coiled coil, and because the hydrogen bond between 
them lies upon the axis of symmetry of the coiled coil, the 
two possible hydrogen bonds between the respective 
amido protons and acyl oxygens are present in the map 
of electron density as alternative conformations. 

Because a coiled coil has such a regular structure, it 
is possible to design synthetic peptides that assume a 
coiled coil of α helices spontaneously!” by incorpo- 
rating heptad repeats into their sequences. Almost any 
amphipathic α helix has the potential to form a coiled 
coil. It is not possible, however, to predict what type of 
coiled coil they will form?" because this decision seems 
to be made on the basis of the details of the packing in 
the hydrophobic core between the two α helices. For 
example, the packing in the interior of a coiled coil of 
three α helices is so tight that the inability of the three 
equivalent leucines to adopt the proper angles χ1 in a 
parallel arrangement causes the three copies of one of 
these synthetic peptides to form a coiled coil of antipar- 
allel α helices instead.”"' Likewise, when mutations were 
made to hydrophobic amino acids in the heptad repeat 
of the portion of general control protein GCN4 that nor- 
mally forms a symmetric coiled coil of two parallel α helices 
(Figure 6-29), the mutant peptides unexpectedly 
formed coiled coils of three and four parallel α helices.” 
Interactions among the amino acids facing the water also 
influence the type of coiled coil formed.”” All of these 
results suggest that the different types of coiled coils have 
similar structural requirements. 

Although there are exceptions in which a coiled coil 
has a right-handed twist,”'*° the left-handed twist of 
most coiled coils causes the angle Ω between any two of 
the α helices to be +18° to +24° (Figures 6-29 and 
6-30).°%1917 Such structures contribute to the broad 
maximum at 20° in the distribution of angle Ω between 
two α helices. As the number of α helices bundled 
together becomes larger and as the constraints of the 
overall tertiary structure of the protein are exerted, the 
regular packing of the coiled coil breaks down. 
Nevertheless, in a bundle of α helices stacked next to 
each other at angles Ω around +20°, such as the one in 
protein R2 from ribonucleoside-diphosphate reductase 
of E. coli” and the one in δ-endotoxin CryIIIA(a) from 
Bacillus thuringiensis subspecies tenebrionis,” there are 
hints of coiled coils. 

Although there are examples of β sheets fully 
exposed on both faces to solvent,” *” almost all sheets 
are found packed against other β sheets or sandwiched 
between layers of α helices, usually in the most buried 
regions of a molecule of protein. When β sheets pack 
against each other, there are only two orientations 
observed with significant frequency. In a sheet of β struc- 
ture, the side chains of the amino acids along a given 
strand alternate in protruding above the sheet and below 
the sheet (Figure 4-16B,C). On a given side of the sheet, 
the protrusions form an approximately square array 
aligned with the β strands (Figure 6-31).”” There are two 
ways for two square arrays to interdigitate if they are flat, 
one with angle Ω = 0° and the other with angle Ω = 90°. 

There are examples of two β sheets packing against 
each other with an angle Ω of 0°, each strand of the one 
sheet running almost exactly between two of the strands 
of the other,” as the closed fingers of one hand fit into 
the grooves of the closed fingers of another, but because 
β sheets twist (Figure 6-9) and because the array is not 
exactly square (Figure 6-31), sheets of β structure usually 
pack at angles Ω of -30° ± 15° (Figure 6-24).'%8% 
Were two twisted β sheets to be packed in parallel to 
each other with angle Ω equal to 0°, their two twists (indicated 
by U and D in Figure 6-24A,B) would match. The 
twist, however, causes the side chains protruding from 
the bottom of the top sheet to lean one way and those 
protruding from the top of the bottom sheet to lean the 
other,” and when the side chains interdigitate the 
sheets cannot be parallel to each other but are forced to 
assume an angle Ω around -30°. Even at an angle Ω of 
-30°, however, the twists of the two β sheets fit together 
effectively.*' The associations between the two parallel 
β sheets in prealbumin illustrate the stereochemistry at 
such an interface between two twisted sheets (Figures 
6-24E and 6-32). The preference for an angle Ω of 
-30°, however, is not a strong one because local disrup- 
tions in the β strands can cause the angle to shift to pos- 
itive values.” Even β sheets as narrow as two β strands in 
an antiparallel β hairpin will stack against each other 
with the strands parallel "7" 

An angle Ω equal to 90°, the other value predicted 
for the stacking of two square arrays, is also a preferred 
orientation observed for two stacked β sheets.” When 
two sheets of β structure are aligned at 90° to each other, 
the twists instead of fitting together oppose one another, 
and one pair of diagonal corners is closer together than 
the other pair of diagonal corners (Figure 6-33).”** It is 
usually at the close corners that the polypeptide con- 
nects one sheet to the other.”® The interdigitations of 
the amino acids in the three-layered orthogonal β sheets 
of penicillopepsin illustrate the packing observed 
between orthogonal β sheets (Figure 6-34). 

Figure 6-31: Distribution of side chains on one side of a β sheet.” 
The representation is that of a four-stranded antiparallel β sheet in 
the crystallographic molecular model of the light constant domain 
from γ-globulin (λ) New. The four strands of polypeptide of the 
β sheet in the crystallographic molecular model were placed in the 
plane of the page, and the α carbons were connected with line seg- 
ments. The α carbon of each amino acid the side chain of which 
protruded below the plane of the page was marked with a dark dot. 
The side chains protruding below the page from these positions 
were displayed in space-filling format, and their projections on the 
plane of the page were traced. Each projection is labeled with the 
name of the particular amino acid. The amino acids in one column, 
valine, isoleucine, alanine, and proline and threonine, were 
assigned letters determined by the order in which their strands 
occur in the amino acid sequence. Proline d and Threonine d, 
which are adjacent in the amino acid sequence, were treated as the 
same amino acid to retain the pattern. The other amino acids were 
numbered in the direction in which the particular strand ran across 
the page. The view is from above and of the amino acids hanging 
down from the sheet. Lines indicate a lattice on which the 
α carbons of these amino acids lie. Reprinted with permission from 
ref 233. Copyright 1981 Academic Press. 


Regardless of whether the angle Ω between β sheets 
is -30° or 90°, the side chains between the two sheets are 
well buried, are usually hydrophobic (Figures 6-32 and 
6-34), and fit together tightly with a minimum of empty 
space. An interesting exception to these rules is found in 
a family of small proteins that use the interior of an 
orthogonal sandwich of two β sheets to bind a fatty acid. 
In such proteins, space is made within the interface into 
which fits the hydrophobically compatible fatty acid.” 

The packing in the center of a β barrel (Figure 6-11) 
can be considered to be a special case of the packing of 
β sheets. The alternate amino acids along each β strand 
that enter the core of the barrel occur in layers roughly 
perpendicular to the axis of the hyperboloid.” In a 
β barrel of eight strands, the four members of each of 
these layers come from alternate strands (Isoleucine 51, 
Leucine 112, Leucine 161, and Isoleucine 212; Serine 85, 
Leucine 135, Glycine 182, and Leucine 234; and 
Glutamate 53, Lysine 114, Glutamate 163, and Glutamate 
214 in Figure 6-11). Within each layer of this cylindrical 
cake, the four side chains pack against each other and 
each successive layer packs against the layer below it (the 

Figure 6-32: Packing of the amino acids at the interface between two opposed β sheets in the crystallographic molecular model of prealbumin.235 
A tracing of the α carbons of the strands in this structure is presented in Figure 6-24E. This is represented diagrammatically in the 
upper inset, where the locations of the horizontal sections through the structure are designated by their positions in nanometers relative to 
the central section. The sections are normal to the two β sheets, and the strands run approximately normal to the planes of the sections. The 
amino acids in the upper sheet (in which the four strands run parallel, antiparallel, parallel, antiparallel) are enclosed in solid lines; those in 
the lower sheet (in which the four strands run parallel, antiparallel, antiparallel, parallel) are enclosed in broken lines. The straight lines indi- 
cate the orientation of the interface, which twists in a right-handed sense as the sections proceed through the structure. Reprinted with 
permission from ref 235. Copyright 1981 National Academy of Sciences. 


layer of 85, 135, 182, and 234 is sandwiched between that 
of 51, 112, 161, and 212 and that of 53, 114, 163, and 214 
in Figure 6-11). In the top layer of the β barrel in Figure 
6-11, the hydrophobic portions of the side chains pack 
against the layer below, and the four hydrophilic atoms, 
the nitrogen and the three oxygens, are pointed upward 
out of the end of the barrel. 

In β barrels of five or six strands, the radii of the 
hyperboloids are small enough that the cylinder can 
remain circular while the side chains tightly fill the central 
cavity, but in β barrels of eight strands, the hyperboloid is 
usually flattened into an ellipse to pack the side chains in 
each layer as tightly together as possible.” In ribonucleo- 
side-diphosphate reductase from E. coli, a β sheet of five 
parallel strands antiparallel to a second β sheet of five par- 
allel strands together form a β barrel of ten strands, but it 
is flattened so that the two respective sheets are opposed 
to each other across the minor axis of the ellipse.” This 
structure is drifting in the direction of a sandwich of two 
β sheets at an angle Ω equal to 90°. Again, an interesting 
set of exceptions is that of β barrels in which a cavity is 


Figure 6-33: Abstract representation of the orientation of two orthog- 
onal sheets packed against each other that incorporates the left- 
handed twist usually associated with such structures.” The front view 
illustrates the orthogonal disposition of the strands of the two respec- 
tive sheets. The top view illustrates the twist, the fact that the strands 
can join at the two corners (A and B) that are brought together by the 
separate twists, and the fact that two of the corners are splayed by the 
twist. The view of the opened structure demonstrates how the stack 
can be produced by folding over two coplanar parallel sheets of 
β structure to produce an arrangement joined at the two corners by 
two continuous strands. A typical example of such a close corner 
occurs in murine adipocyte lipid-binding protein.” The bottom two 
sheets of β structure in the three-layered sandwich of penicillopepsin 
(Figure 6-34) also have this arrangement. They are connected at two 
diagonal corners and splayed at the other two. Reprinted with per- 
mission from ref 238. Copyright 1982 American Chemical Society. 

Figure 6-34: Packing of amino acids at the interface between three alternately orthogonal sheets of β structure in the crystallographic molec- 
ular model of penicillopepsin.” In the center of the figure, the α carbon atoms of the molecular model between amino acids 16 and 123 are 
connected by line segments to provide a tracing of the polypeptide. The three-layered sandwich is viewed from above and consists of a three- 
stranded sheet of β structure (parallel, antiparallel, parallel) on top of a four-stranded sheet of β structure (parallel, parallel, antiparallel, 
antiparallel) on top of a second four-stranded sheet of β structure (parallel, antiparallel, parallel, antiparallel). The indicated sections 0.4 nm 
apart were cut through the three-layered sandwich in a space-filling representation of the molecular model. The planes of the sections were 
horizontal and normal to the page as indicated. The packing between the sheets can be viewed in respective cross sections arrayed in coun- 
terclockwise order from top to bottom. Amino acids are numbered, and all amino acids in a given pleated sheet are in either solid outline or 
broken outline. Amino acids in hatched outline are not in the sheets of β structure. In the cross sections, the strands of the sheets at the top 
and the bottom run parallel to the page while the sheet in the middle runs perpendicular to the page. Reprinted with permission from ref 
238. Copyright 1982 American Chemical Society. 


required for the function of the protein, such as the cavity 
in the middle of the β barrel of retinol-binding protein in 
which the retinol is bound.””' In this β barrel, the ligand 
provides enough extra hydrophobic mass that the barrel 
can remain circular even though it has eight strands. The 
β barrel of red fluorescent protein from Discosoma is cir- 
cular even though it is composed of 11 strands because 
there is an α helix running through its center.” 

An α helix lying upon a sheet of β structure usually 
has its axis almost parallel to the strands of the sheet 
because the α helix is straight, the sheet is twisted, and a 
straight rod can contact a twisted surface only when it is 
either parallel or perpendicular to the axis of the twist 
(Figure 6-25).” The angle Ω observed”” between 
α helices and adjacent sheets of β structure is usually 
around 0°, and almost all values fall between -20° and 
+10°. The exceptions are usually instances in which the 
angle Ω is close to 90°.“**° In one of these instances, a 
long β sheet of four strands wraps around an α helix” as 
one’s four fingers would wrap around a cylindrical rod 
3-4 cm in diameter. This grip is yet another example of 
the elasticity of β structure. 


The interface (Figure 6-35)°’ between three of the 
α helices and one of the β sheets in lactate dehydroge- 
nase illustrates the fit between an α helix and a twisted 
β sheet in a parallel orientation. Note that the side 
chains from the α helices lie upon the gaps between the 
side chains in the sheet of β structure. Because a sheet 
of β structure twists appropriately, the α helices lying 
across its surface parallel to its strands are aligned next 
to each other with angles Ω of about -40° between 
adjacent pairs even when they cleave tightly to the sur- 
face of the sheet.” This value for angle Ω is sufficiently 
close to the -50° that produces the most frequently 
observed type of interdigitation between two α helices. 
Therefore, both the interfaces between the α helices 
and the sheet of β structure and the interfaces among 
the α helices themselves can exist simultaneously in 
almost optimum orientations. It is also possible, how- 
ever, that the twist of such a sheet of β structure arises 
from the requirement that the α helices upon it be posi- 
tioned at the proper angles to maximize the interdigita- 
tion of their amino acids. For all of these reasons, a 
twisted β sheet sandwiched between two layers of 


Figure 6-35: Schematic formation of the interface between three α helices and a sheet of β structure found in the crystallographic molecu- 
lar model of L-lactate dehydrogenase.” The sheet of β structure is presented in the bottom left of the figure with its strands running verti- 
cally in the y direction and the axis of the right-handed twist parallel to the x-axis. Side chains of the amino acids on the upper face of the 
sheet are identified and numbered by the amino acid sequence of the protein. The three α helices that will form the interface with the sheet 
of β structure are presented in the upper left of the figure with the face that will participate in the final interface directed upward. The axes 
of the three helices (αG, αB, and αC) are almost parallel to the vertical y-axis. Side chains that will participate in the interface are identified 
and numbered. When the three α helices are rotated 180° around the x-axis and placed upon the sheet of β structure as they are in the molec- 
ular model, the interface is produced by the interdigitation of the highlighted amino acids from the sheet and the highlighted amino acids 
from three α helices, respectively. It is these interdigitations that position the three α helices upon the sheet of β structure. Adapted with 
permission from ref 243. Copyright 1980 Academic Press. 


parallel α helices is one of the most common tertiary 
structures. 

This arrangement is also the one assumed by the 
coating on a β barrel. The most common type of β barrel 
is one in which all of the strands run consecutively and in 
parallel (Figure 6-11), and usually within the polypeptide 
connecting the carboxy-terminal end of one strand to the 
amino-terminal end of the next (for example, the con- 
nection between Cysteine 54 and Alanine 83 in Figure 
6-11) there will be an α helix running along the outer sur- 
face of the β barrel parallel to the β strands of the β bar- 
rel. Consequently, the strands occur consecutively 
around the barrel, and the underlying repeating pattern 
is β strand, α helix, β strand, α helix and so forth with 
extraneous segments of other secondary structure 
thrown in at random. Even in the hybrid β barrel in 
ribonucleoside-diphosphate reductase, the five parallel 
β strands in each of the two antiparallel β sheets forming 
the barrel occur consecutively and are each connected to 
the next by a segment of polypeptide containing an 
α helix running across the outside of the barrel.” There 
is a peculiar variant of this α-helically wrapped β barrel 
in which each of the strands of β structure in the central 
barrel is replaced by an α helix to form an α-helically 
wrapped α-helical barrel.” 

It is useful to imagine the interior of a molecule of 
protein created by all of these arrangements as a three- 
dimensional jigsaw puzzle because this is an image that 
emphasizes the interdigitations among the side chains of 
the amino acids driving the various orientations of the 
secondary structures. The pieces of this puzzle, however, 
are neither inelastic nor invariant,” and there can be 
flaws in its mosaic. 

The elasticity of the packing of the amino acids in 
the interior of a protein is most readily demonstrated by 
performing site-directed mutation. When Alanine 129 in 
lysozyme from bacteriophage T4 is replaced with 
leucine, the stability of the protein decreases” by 6 kJ 
mol⁻¹, but its structure is affected only in the vicinity of 
the mutation. There it expands, most notably at Leucine 
121, in response to the increase in the size of the side 
chain at position 129 (Figure 6-36). When Valine 30, 
located between two β sheets in the core of human 
transthyretin, is replaced with methionine, the β sheets 
move apart by 0.1 nm to accommodate the consequent 
steric effect.”' The usual response to mutations such as 
these that increase the volume of matter in the 
hydrophobic core of a protein is that the structure 
expands in response to the local increases in volume and 
the stability of the protein decreases,”” occasionally cat- 
astrophically.”® The strain of the increase in size can also 
be accommodated by a conformational change of an 
adjacent side chain to a significantly different rotamer.*” 
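
To give a sense of scale for a decrease in stability such as the 6 kJ mol⁻¹ quoted above, the change in standard free energy can be converted into the factor by which the equilibrium constant for folding decreases, K(wild type)/K(mutant) = exp(ΔΔG/RT). The following Python sketch is only illustrative: the function name is hypothetical, a simple two-state equilibrium between folded and unfolded protein is assumed, and the temperature of 298.15 K is an assumption because the text does not state the temperature of the measurements.

```python
import math

GAS_CONSTANT = 8.314462618  # J mol^-1 K^-1

def fold_decrease_in_equilibrium_constant(ddg_kj_per_mol, temperature_k=298.15):
    """Factor by which the equilibrium constant for folding decreases
    when the stability decreases by ddg_kj_per_mol (two-state assumption;
    the default temperature is assumed, not taken from the text)."""
    return math.exp(ddg_kj_per_mol * 1000.0 / (GAS_CONSTANT * temperature_k))

# Decrease in stability reported for the replacement of Alanine 129
# with leucine in lysozyme from bacteriophage T4.
print(round(fold_decrease_in_equilibrium_constant(6.0), 1))
```

Under these assumptions a decrease of 6 kJ mol⁻¹ corresponds to a decrease of roughly an order of magnitude in the equilibrium constant for folding, which is consistent with the observation that the mutant still folds into nearly the same structure.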

There is never only one invariant arrangement of 
side chains that can solve the problem of filling the space 
between segments of secondary structure in the 
hydrophobic core of a protein. For example, in the plas- 
tocyanin from Enteromorpha prolifera the surface of one 
β sheet uses Isoleucine 19, Isoleucine 96, and Valine 82 to 
conform to the surface of a neighboring β sheet that is 
formed from Valine 3, Phenylalanine 29, and Isoleucine 
39, while the plastocyanin from Populus nigra uses 
Phenylalanine 19, Valine 96, and Phenylalanine 82 to 
conform to the same surface composed of Valine 3, 
Phenylalanine 29, and Isoleucine 39." Leucines 84, 91, 
99, 118, 121, and 133 and Phenylalanine 153 of lysozyme 
from bacteriophage T4 can all be replaced simultane- 
ously with methionines, and although the mutant is 21 kJ 
mol⁻¹ less stable than the wild type, it has the same over- 
all structure with the exception that the hydrophobic 
cluster formed by these seven side chains of almost equal 
volume to the cluster that it replaced is now a different 
puzzle, just as compact as the first.”“° 

Many proteins in their native states have cavities of 
various sizes* in their hydrophobic cores that are flaws in 


* The calculation of the volume and extent of cavities inside crys- 
tallographic molecular models is not straightforward.” The 
volume accessible to a spherical probe is usually calculated, and 
the radius chosen for the probe has a dramatic effect on what is 
regarded as a cavity, what is regarded as its volume, and what is 
regarded as its dimensions because the smaller the probe, the more 
easily it slips between the atoms. 
the mosaic of the puzzle,’ some large enough to bind 
random hydrophobic ligands.” When larger amino 
acids are replaced by smaller ones through a site- 
directed mutation, an unnatural cavity is formed, and the 
contraction of the structure surrounding the artificially 
created cavity 6 again illustrates the elasticity of 
the puzzle. Although this contraction is usually incom- 
plete, leaving a definite cavity where one was not present 


Figure 6-36: Elastic expansion within the hydrophobic 
core of the crystallographic molecular models (Bragg spac- 
ing > 0.19 nm) of lysozyme from bacteriophage T4.7°°” 
Alanine 129 in the wild-type protein was replaced with a 
leucine by site-directed mutation. When the crystallo- 
graphic molecular model of the portion of the mutant pro- 
tein surrounding the mutation (black bonds) is superposed 
on the crystallographic molecular model of the same por- 
tion of the wild-type protein (white bonds), the details of 
the expansion of the molecule in the vicinity of the muta- 
tion are readily observed. The greatest displacement is that 
of Leucine 121, but Leucine 133 also moves outward as well 
as the entire α helix containing Leucine 129 and Leucine 
133. Phenylalanine 114 is forced to assume a less advanta- 
geous rotamer (see 6-5 and Figure 6-21). This drawing was 
produced with MolScript.°” 


before, when Isoleucine 29 in lysozyme from bacteriophage 
T4 is replaced with alanine, the structure surrounding 
the site of the mutation collapses to such a 
degree that no discernible cavity remains.” When an 
unnatural cavity is formed by site-directed mutation, the 
stability of the protein usually decreases.”°- It has been 
proposed that this instability produced upon the muta- 
tion of a larger amino acid to a smaller suggests that nat- 
urally occurring cavities in proteins destabilize their 
structure, !?” but it is not possible to extrapolate from the 
effects resulting from artificial changes performed by 
site-directed mutation to the effects of changes pro- 
duced by natural selection. 

It has been argued that because the ability of a par- 
ticular polypeptide to form a particular tertiary structure 
is not drastically affected by extensive replacement of 
amino acids in its core by site-directed mutation*”®” and 
because the volume in the interior of a molecule of pro- 
tein can be filled with a number of different arrangements 
of the normally available hydrophobic side chains,”® the 
packing of the amino acids cannot dictate the tertiary 
structure that results when the polypeptide folds. Such 
arguments, however, ignore the fact that it is the overall 
pattern in which the side chains emerge from the sec- 
ondary structures, not the identity of those side chains, 
that dictates the values of the angles Ω and hence the ter- 
tiary structure. The fact that the details of the packing 
beyond these dictations display such tolerance is not 
remarkable because it has long been known that evolu- 
tion by natural selection frequently performs similar 
replacements. From a consideration of the logic of the 
interdigitations that are observed in naturally occurring 
proteins, it can be concluded that it is such interactions 
among the side chains that produce the relative orienta- 
tions assumed by the secondary structures and that these 
orientations are crucial to creating the tertiary structure 
of a protein. 
Problem 6-4: The following is a segment of amino acid 
sequence from the coiled coil region of human epider- 
mal keratin:?** 


TAAENEFVTLKKDVDAAYMNK 
VELQAKADTLTDEINFLRALY 
DAELSQMQT 


(A) Write out this sequence in the same format? as 
the following diagram of a portion of the 
amino acid sequence from the coiled coil of 
α-tropomyosin: 


Asp Lys Asp Glu Glu Lys Ala 
Gln Ser Gly Lys Asp Leu Asp 
Glu Lys Glu Ala Lys Lys Asp 
Leu Leu Thr Tyr Ala Ala Ala 
Leu _ Leu Leu Leu Leu _ Ala Val 
Glu Gln Glu Ser Gln Glu Glu 
Val Lys Asp Lys Glu Thr Ala 
Asp Lys Asp Glu Glu Lys Ala 


In your diagram place the appropriate amino acids from 
the sequence of human epidermal keratin along the 
center line as was done in the diagram of the sequence of 
α-tropomyosin. 


(B) What is the role of the amino acids placed along 
the center line? 


(C) Circle the two amino acids in your diagram that 
do not seem to fit this role. 


(D) How may they be excused? 


Problem 6-5: In the coiled coil of α helices shown in 
Figure 6-29, why do Lysine 263 and Glutamate 268 and 
Lysine 275 and Glutamate 270 form hydrogen bonds? 


Problem 6-6: The drawings on the next page of three 
crystallographic molecular models’ *” illustrate 
aspects of the packing between segments of secondary 
structure. These drawings were produced with 
MolScript.” Discuss each molecular model separately 
and describe the points illustrated by each in turn. 


Water 


About 40-70% of the volume of a crystal of protein is occu- 
pied by water.’ It fills the large vacant spaces among the 
folded polypeptides. The majority of the molecules of 
water in a crystal of protein are liquid and disordered 
over the time required to collect a data set. Regardless of 
the degree of refinement or the minimum Bragg spacing, 
the regions containing this disordered water remain fea- 
tureless and have a mean electron density similar to that 
of liquid water.” These regions of the unit cell are treated 
as solids of uniform electron density that have the shape 
of the disordered regions peculiar to the particular crys- 
tal. These irregular solids can be used for refining phases 
by solvent flattening and are always incorporated as such 
into the molecular model of the unit cell from the first 
cycle of the refinement because when these water-filled 
lacunas are added explicitly to the molecular model, the 
R-factor decreases significantly.” 

During a refinement, maps of difference electron 
density are frequently calculated. In addition to the large 
spaces filled with disordered water, small discrete peaks 
of positive electron density become regularly recurring 
features of these maps. Because no reasonable 
rearrangement of the atoms of the molecular model of 
the protein is able to erase these features and because 
they are unaccompanied by adjacent peaks of negative 
electron density indicative of a misalignment of the 
molecular model, these peaks are assumed to represent 
either individual molecules of water or individual mole- 
cules of solutes from the solution in which the crystals 
were formed, which is usually a concentrated solution of 
ammonium sulfate, poly(ethylene glycol), or a smaller 
glycol. Sulfate is an ion with a large number of electrons, 
and any peaks of electron density representing sulfate 
can usually be recognized with little difficulty.°° 
Molecules of glycols or other polyols are also easily rec- 
ognized. The ammonium cation, however, is indistin- 
guishable in its electron density from a molecule of 
water, but proteins at neutral pH rarely bind many 
cations so it is usually assumed that the smaller isolated 
peaks of electron density are locations at which mole- 
cules of water are situated. 

A location for a molecule of water in the crystallographic
molecular model of a protein is a positive peak of
electron density (Figure 6-37) that persists in its location
in maps of difference electron density as the refinement
progresses and that has a magnitude 0.2-1.0 times
the magnitude expected for a stationary molecule of
water. The reason for the generous range is that the
magnitude of the peak of electron density arising from a more
or less fixed molecule of water decays to less than 20% that
of a stationary molecule of water when its vibrational
amplitude is only 0.06 nm. For example, the magnitudes
of the peaks of electron density in Figure 6-37B vary
significantly even though these locations are heavily
chelated. The sometimes vague peaks of electron density
present in a difference map at the moment when the decision
is made by the crystallographer that they represent
molecules of water should be distinguished from the more
solid peaks of electron density that appear at the same
positions in the map of electron density calculated with
$F_{\mathrm{o}}$ and $\alpha_{\mathrm{c}}$ after the contributions of
these molecules of water have been included in
$\alpha_{\mathrm{c}}$. These more solid peaks are only
there because waters have been added to the model, not
because the peaks have become better defined. There
is an additional drawback of this strategy for identifying 
fixed locations occupied by molecules of water. Such a 
peak of positive electron density sometimes is the result 
of the alternative conformation of a side chain, and what 
was assumed to be water at low resolution turns out to be 
protein atoms from minor conformations at higher reso- 
lution. 

A portion of the peaks of electron density assigned
as representing molecules of water in a map are observed
in the same locations in maps of electron density for the
same protein in different crystals or for the same protein
from a different species, and such locations are considered
to be conserved. It is assumed that a conserved
location is occupied consistently, in particular, when the 
protein is in solution rather than in the crystal. Molecules 
of water in the map that are not conserved are assumed 
to be peculiar to that protein in that crystal, and such 
locations may or may not be occupied to a significant 
extent when the protein is free in solution. 

There are different degrees of conservation. Peaks
representing molecules of water may be found at the
same locations in two different molecules of the protein
in the same unit cell. For example, 25 positions in
thioredoxin from E. coli, 46 positions in cytochrome b562
from E. coli, and 26 positions in β-lactamase from
E. coli are occupied by molecules of water in both of
the crystallographic molecular models of the respective
protein in the same unit cell. Peaks representing
molecules of water may be found at the same locations in
the same molecule of protein in different crystals. For
example, 30 positions for molecules of water in
ribonuclease T1 are conserved in four different crystals of the
protein. They may be found at the same locations in
crystallographic molecular models of the same protein
from different species. For example, a string of five
molecules of water is found at the same locations in the
interiors of crystallographic molecular models of both
cytochrome f from Phormidium laminosum and
cytochrome f from Brassica rapa. They may even be
found at the same locations in different but related
proteins. For example, two positions for molecules of water
are found at the same locations in crystallographic
molecular models of ferredoxin-NADP⁺ reductase,
phthalate-dioxygenase reductase, and a fragment of
nitrate reductase (NADH).

The molecules of ordered water included in the 
final refined molecular model surround the molecule of 
protein, fill deep clefts in its surface, and are found in its 
interior (Figure 6-38). They represent locations in the
actual molecule of protein in the crystal that are consis- 
tently occupied by a molecule of water. Each location is 
fixed and static because it is observed in the map of elec- 
tron density, but the actual molecules of water at those 
locations on the surface of the molecule of a protein 
change sites and exchange with molecules of water in the 
disordered regions of the unit cell as rapidly as mole- 
cules of water change positions in the hydrogen-bonded 
lattice of liquid water itself. A lower limit of 1 ns⁻¹ for the
rate constant for the exchange of molecules of water at
such locations on the surface has been established
experimentally, and the rate constant for their rotation
has been observed to be about 50 ns⁻¹. The latter rate
constant is indistinguishable from that for a molecule of
water in the bulk phase. It is the locations for the
molecules of water that remain fixed relative to the molecule
of protein, not the molecules of water themselves. There
are also locations for molecules of water on flexible por- 
tions of the protein that change their conformations so 
widely that the locations for those molecules of water 
cannot appear in the map of electron density. 

Figure 6-38: Locations for molecules of water in
the crystallographic molecular model (Bragg
spacing > 0.18 nm) of penicillopepsin. At
various cycles of the refinement of this crystallographic
molecular model, it was decided that
certain members of the array of as yet unassigned
peaks of positive density in the map of
difference electron density were locations for
molecules of water, and a molecule of water was
placed at each of these positions in the model.
The 319 unique locations for molecules of water
so positioned are designated in the figure with
open circles (oxygen atoms) relative to the
polypeptide backbone without the side chains.
The drawing of the crystallographic molecular
model is presented in the same orientation as
that in Figure 4-17. This drawing was produced
with MolScript.

One of the more unexpected observations has been
the discovery of molecules of water buried in the
interior of proteins with no direct contact with the solvent.
These occur as single forlorn molecules (Figures 6-37A
and 6-39) or as small clusters of two or more
molecules (Figure 6-37B) surrounded by donors and
acceptors of hydrogen bonds from the protein itself. For
example, in the crystallographic molecular model of equine
hepatic alcohol dehydrogenase ($n_{\mathrm{aa}}$ = 374), 12 molecules
of water making no contact with the solvent have been
located in the interior of the protein, three as a triplet,
two as a doublet, and seven as singlets; in the model of
α-lytic endopeptidase ($n_{\mathrm{aa}}$ = 198), nine molecules of
water making no contact with the solvent have been
located, three as a triplet, four as two doublets, and two
as singlets; in the model of fatty-acid-binding protein
from Manduca sexta ($n_{\mathrm{aa}}$ = 131), nine molecules of water
making no contact with solvent have been located, four
as a quartet, two as a doublet, and three as singlets;
and a string of five molecules of water making no contact
with solvent has been located in the model of
cytochrome f from Phormidium laminosum. A pair of
water molecules is located in the center of the β barrel in
the middle of the crystallographic molecular model of
chitinase B from Serratia marcescens, deeply buried in
the core of the protein.

Even such locations occupied by molecules of 
water buried within the structure of the protein exchange 
rapidly with water in the bulk phase, presumably as the 
result of breathing movements in the folded polypeptide. 
For example, a molecule of water at one of the buried
locations in bovine pancreatic trypsin inhibitor
exchanges with water in the bulk phase with a rate
constant of 6 ms⁻¹.

Molecules of water are also buried in cracks and
fissures connected to the solvent. For example, in
penicillopepsin a finger of five water molecules extends from the
surface into the interior (Figure 6-40A). In the
crystallographic molecular model of unoccupied
chloramphenicol O-acetyltransferase, there is a channel 2.5 nm in
length filled with a continuous chain of water molecules
extending from one side of the molecule of protein to the
other. A more common situation, however, is for a fairly
broad fissure to be filled with an extended cluster of water
(Figure 6-40B).

Some proteins have sizeable cavities in their
interiors. Although there is electron density in the cavity in the
crystallographic molecular model of human interleukin
1β, it is featureless. Nuclear Overhauser effects in
nuclear magnetic resonance spectra on the absorptions of
protons in the side chains lining this cavity demonstrate
that it contains molecules of water. Because the
electron density is featureless, the two or three molecules of
water in the cavity must be mobile, which would make
sense because it is lined with the side chains of valines,
leucines, and phenylalanines that provide no donors or
acceptors to pin the molecules of water. The 22 molecules
of water in the large internal cavity of unoccupied rat
fatty-acid-binding protein, however, are all represented by
peaks of electron density in the map and together form a
large hydrogen-bonded cluster pinned by the donors and
acceptors of tyrosines and glutamates lining the cavity.

Often, a particular role can be assigned to a mole- 
cule of water observed in a map of electron density. For 
example, in the crystallographic molecular model of the 
triacylglycerol lipase from Geotrichum candidum, 17 of 
the molecules of water at fixed positions donate
hydrogen bonds to the vacant acyl oxygens at the
carboxy-terminal ends of α helices (Figure 6-6), and 14 of the
molecules of water at fixed positions accept hydrogen
bonds from the vacant nitrogen-hydrogens at the
amino-terminal ends of α helices. In the crystallographic
molecular model of cholesterol oxidase from
Brevibacterium sterolicum, a number of water molecules
are involved in linking one segment of β structure to
another. Usually, however, the molecules of water at
fixed positions are chelated at random by the donors and
acceptors of the protein.


The ordered water found covering the open surface
of a crystallographic molecular model (Figure 6-38) is
held in its discrete locations by hydrogen bonds to
donors and acceptors on the surface. This pinning is
accomplished by particular asparagines, glutamines,
lysines, aspartates, arginines, and glutamates distributed
over the surface of the individual molecules of protein.
These donors and acceptors hold the molecules of water
in extended, fixed hydrogen-bonded networks. It is the
donors and acceptors on the protein that anchor these
networks in the unit cell. At their peripheries the
networks contain molecules of water attached only to other
molecules of water, and at the far edges the networks
fade into the disorder of the bulk solvent in the crystal.
The number of water molecules included as individuals
in the refined molecular model is probably a function
more of the minimum Bragg spacing of the data set, the
peculiarities of the refinement, and the subjective
decisions of the crystallographer than of any discrete
distinction between ordered and disordered water in the actual
crystal, if any such distinction can ever be made.


Networks resembling clathrates are rarely
found around hydrophobic functional groups on the
surface of crystallographic molecular models, and those
networks that are found usually do not seem to be the
result of the fact that they surround a hydrophobic side
chain. If such networks around hydrophobic amino
acids are present in the crystal but are not pinned to the 
same locations in each unit cell or if they are rearranging 
continuously while the data set is being gathered, they 
would not be seen in the crystallographic molecular 
model. If the charged functional groups on the surface of 
a protein are surrounded by spheres or semispheres of 
hydration, as the paradigm associated with the hydration 
of spherical ions suggests (Figure 5-9), the molecules of 
water in these shells of hydration are not pinned, 
because no indication of their presence is seen in the 
maps of difference electron density. Charged functional 
groups on the side chains of the amino acids, however, 
often have one or two molecules of water forming hydro- 
gen bonds to their donors and acceptors. All of these 
points reemphasize the fact that only specific locations 
occupied by molecules of water for long periods of time 
appear as distinct features in maps of electron density. 

Of the water molecules that are attached directly to
the molecule of protein in a refined crystallographic
molecular model, about two-thirds make only one
hydrogen bond to the protein and one-third make
two or more hydrogen bonds. The mean number of
hydrogen bonds between molecules of water in this first
layer of hydration and the protein is 1.7. The average
distance between one of these waters and a donor
nitrogen is 0.29 ± 0.02 nm and between one of these waters
and an acceptor oxygen is 0.29 ± 0.02 nm. Of these
water molecules directly bound to protein, 42% act as
donors to acyl oxygens of the polypeptide backbone, 16%
act as acceptors from nitrogen-hydrogen bonds of the
polypeptide backbone, and 42% are in hydrogen bonds
to donors and acceptors on the side chains. In the
crystallographic molecular model of lysozyme, of the
waters bound to functional groups of side chains on the
surface of the protein, 24% were donors to carboxylates,
13% were donors to primary amides, 13% were acceptors
for primary amides, 14% were acceptors for primary alkyl
ammoniums, 14% were acceptors for guanidiniums, 14%
were hydrogen-bonded to alkyl hydroxyls, and 6% were
hydrogen-bonded to phenolic hydroxyls.

The disordered side chains in a crystallographic 
molecular model are usually at its surface in the most 
accessible locations. Because they are at the surface, they 
are usually hydrophilic and are probably even more 
hydrated than the ordered side chains. Any molecules of 
water bound to these disordered side chains are never 
seen in the crystallographic molecular model. Therefore, 
the mean number of waters bound to each side chain is 
probably an underestimate of the actual values. 
Nevertheless, of the side chains to which bound water
can be assigned in lysozyme, aspartic acids have a
mean of 2.0 waters; lysines, a mean of 1.8 waters;
asparagines, a mean of 1.6 waters; glutamic acids, a
mean of 1.5 waters; threonines, a mean of 1.2 waters;
tyrosines, a mean of 1.0 water; serines, a mean of 0.7
water; and glutamines, a mean of 0.7 water bound to
their side chains. More than 50% of the arginines are
disordered, but those that can be observed have 1.5 waters
bound.

When a set of 16 crystallographic molecular models
from data sets gathered to Bragg spacings of 0.17 nm or less
was examined, side chains that had two or three
heteroatoms that can participate in hydrogen bonds
(aspartate, arginine, glutamate, histidine, and glutamine)
frequently (>70%) had one or more waters bound to
them, while those with only one (tyrosine, tryptophan,
threonine, and lysine) less frequently had water bound to
them (60-70%). Asparagine (61%) and serine (51%) fall out
of these ranges, probably because they are often
hydrogen-bonded to the backbone.

It is the molecules of water hydrogen-bonded to the
donors and acceptors of these side chains that produce
the sharp maximum at around 0.29 nm in the solvent
distribution function around a molecule of protein.
The hydrophobic side chains (methionine, alanine,
phenylalanine, isoleucine, leucine, and valine) much less
frequently (10-30%) have fixed locations for molecules of
water adjacent to them, but these side chains are usually
buried in the interior of the protein. Those hydrophobic
groups that do have fixed locations for molecules of
water adjacent to them are surrounded by a layer of
water in which the centers of the oxygen atoms are about
0.4 nm from the centers of the carbon atoms of the side
chains, as expected from their van der Waals radii
(Table 6-3).

There are a number of physical measurements
that register the fact that each molecule of protein in
solution has water bound to it in an irregular network
creating a shell of hydration. The molecules of water in this
shell of hydration differ from the molecules of water in
the bulk of the solution away from the molecule of
protein in several of their physical properties. For example,
neutron scattering has revealed that the layer of
hydration immediately adjacent to the surface of a molecule of
protein has a density about 10% greater than that of the
bulk water. From a dissection of the compressibility of
this layer of hydration, it has been concluded that it
contains extensive hydrogen-bonded networks similar
to those observed in crystallographic molecular
models. While it is unable to distinguish the water in
this layer from water in bulk solution because its rate of
relaxation is too fast, nuclear magnetic resonance is able
to detect the buried waters in a molecule of protein
because their rates of relaxation are so much slower.

Each of the molecules of water surrounding a mol- 
ecule of protein at a given instant is in a different situa- 
tion (Figure 6-38), and the relationship between each 
one of these molecules of water and the molecule of pro- 
tein depends upon the respective situation. It is the con- 
tribution of each one of these molecules of water to the 
statistical behavior that produces the value of the
physical property measuring the hydration of the protein,
$\delta_{\mathrm{H_2O}}$ [grams of H₂O (gram of protein)⁻¹]. Each contribution
will be a unique function of the situation of the
respective molecule of water, and the physical measurement
will be only an average over all of these contributions.
Mathematically, this heterogeneity in the situations of
the waters participating in the shell of hydration can be
expressed as a weighted mean:

$$\delta_{\mathrm{H_2O}} = \frac{M_{\mathrm{H_2O}}}{M_{\mathrm{p}}} \sum_{i=1}^{n} w_i \qquad (6\text{-}4)$$

where $M_{\mathrm{H_2O}}$ is the molar mass of water (18.0 g mol⁻¹), $M_{\mathrm{p}}$
is the molar mass of the protein, and the sum is over a set
of statistical weights $w_i$.

There are two ways to think of the meaning of the
statistical weights $w_i$. It can be assumed that there are n
sites for the binding of water molecules around the
protein, the positions of which move through the solution in
lock step with the protein. The statistical weight $w_i$ for a
given site is then the occupancy of that site, which is the
fraction of the time that the site is occupied by a
molecule of water. It is also possible to consider all n
molecules of water in the vicinity of the protein at a given
instant. The statistical weight $w_i$ in this case expresses the
degree of influence the molecule of protein has over the
behavior of water molecule i. When $w_i$ = 1, the location
occupied by water molecule i is fixed, as if covalently, to
the molecule of protein. When $w_i$ = 0, the water
molecule i is uninfluenced in its behavior by the presence of
the molecule of protein.
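
The weighted mean of Equation 6-4 can be illustrated with a short numerical sketch. The protein below is hypothetical, as are its molar mass and its set of statistical weights; the function name is introduced only for this illustration. The sketch shows how a population of loosely influenced waters contributes less to the measured hydration than a count of molecules alone would suggest.

```python
M_WATER = 18.0  # g mol^-1, molar mass of water as given in the text


def hydration_from_weights(weights, molar_mass_protein):
    """Equation 6-4: (M_H2O / M_p) * sum(w_i), in g water (g protein)^-1."""
    for w in weights:
        if not 0.0 <= w <= 1.0:
            raise ValueError("each statistical weight must lie between 0 and 1")
    return M_WATER / molar_mass_protein * sum(weights)


# Hypothetical protein of 14,000 g mol^-1 (assumed value) surrounded by
# 100 tightly held waters (w = 1.0) and 300 loosely influenced waters (w = 0.3).
weights = [1.0] * 100 + [0.3] * 300
delta = hydration_from_weights(weights, 14_000.0)

# The same 400 waters counted as if every one were fixed (all w = 1.0).
delta_all_fixed = hydration_from_weights([1.0] * 400, 14_000.0)

print(f"weighted hydration:  {delta:.3f} g g^-1")
print(f"all waters fixed:    {delta_all_fixed:.3f} g g^-1")
```

Under either interpretation of the weights, as occupancies of sites or as degrees of influence, the arithmetic is the same; only the meaning assigned to each $w_i$ differs.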

Under no circumstances should the layer of hydra- 
tion surrounding a molecule of protein be pictured as a 
uniform layer clearly distinguished from the water in the 
bulk of the solution by some discontinuous boundary. 
Rather, the layer of hydration gradually fades from fixed 
defined locations for molecules of water adjacent to the 
surface of the molecule of protein to molecules of water 
distant from the surface that are only marginally affected 
by its presence. 

It is almost always the case that physical
measurements of hydration yield a simple number,
$\delta_{\mathrm{H_2O}}$, the
grams of water bound for every gram of protein (Table
6-4). It is not surprising that this number varies with the 
method used to obtain it, as more or less of the molecules 
of water surrounding the protein differ more or less from 
the water in the bulk solvent in the particular behavior 
measured by the particular procedure. 

The self-diffusion of water decreases when protein
is added to the solution, and this decrease can be
explained if it is assumed that the water of hydration, 
being less mobile than the water in the bulk phase, does 
not participate significantly in self-diffusion. With this 
assumption, the amount of water bound to the protein 
can be calculated. 

The relative permittivity of a solution of protein
decreases discontinuously when the frequency of the
alternating electric field used to measure that relative
permittivity becomes greater than the ability of the
molecules of protein to reorient in response to its
alterations, and another discontinuous decrease is
observed when the frequency becomes greater than the
ability of the water in the bulk solvent to reorient.
Between these two extremes, there is a third dielectric
relaxation that is assigned to the waters of hydration
bound to the protein. These waters have dielectric relax- 
ations 10-100-fold slower than the waters in the bulk 
solution. From the spectrum of these dielectric relax- 
ations, the concentration of these relatively immobilized 
molecules of water and hence the amount of water 
bound to the protein can be calculated. These molecules 
of water, however, are not fixed to the protein, or they 
would be required to rotate with it, and their dielectric 
relaxation would be indistinguishable from that of the 
protein itself. 

Solid powders of dry protein always have water 
incorporated in them and the amount of this water of 
hydration can be chemically determined. A more sys- 
tematic approach is to equilibrate the dry powder, either 
as a precipitate, as a microcrystalline solid, or as visible 
crystals, with air of a certain relative humidity. It has
been proposed that air at 90% relative humidity is the
appropriate choice. Below this value the powders tend
to become glasses, and above this value they become
hygroscopic. The amount of water bound by a solid
powder of a given protein at 90% relative humidity can be 
taken as its hydration. 

If it is assumed that the water hydrating a protein is 
entirely unable to dissolve salting-out solutes that are 
otherwise freely soluble in water, the negative of the 
preferential solvation of a particular protein in a partic- 
ular solution (Equation 1-57) can be multiplied by the 
molarity of the water in that solution and the molar mass
of water to obtain a value for the grams of H₂O (gram of
protein)⁻¹ in the layer of hydration. To perform
measurements of preferential solvation, a solution of the protein
is usually brought into equilibrium with a solution
containing only water and the salting-out solute, for
example, glucose, lactose, or sucrose (Table 6-4). It is
also possible to equilibrate crystals of a protein with
solutions of salting-out solutes and from the dependence of
the density of the crystal on the density of the solution to
determine the amount of water in the crystal that
excludes the solute.

When a solution of protein is frozen, the water of
hydration freezes below the freezing point of the water in
the bulk solution. Not until the temperature is lowered to
below 180 K does it all become frozen. For example, at
-3 °C, 0.51 g of water (g of protein)⁻¹; at -5 °C, 0.46 g of
water (g of protein)⁻¹; and at -7 °C, 0.41 g of water (g of
protein)⁻¹ remained unfrozen in a solution of
ovalbumin. Unfrozen water is more mobile than frozen
water, and the two can be distinguished by nuclear
magnetic resonance. The amount of unfrozen water in a
frozen solution of protein at 238 K (-35 °C) has been
designated as water of hydration.

Table 6-4: Hydration of Proteins (a)

protein | self-diffusion of H₂O | dielectric relaxation | solid at RH = 90% | excluded volume, sugar | excluded volume, (NH₄)₂SO₄ | NMR, frozen solution | frictional coefficient, diffusion | frictional coefficient, viscosity | scattering of X-radiation at small angles

ribonuclease 0.35 0.18 0.27
0.46
lysozyme 0.25 0.45 <0.9 0.36
myoglobin 0.25 0.42 0.46 <0.4 <0.6
chymotrypsinogen 0.29 0.31 0.26 0.37 <0.5 0.18
0.50
γ-chymotrypsin 0.18
α-lactalbumin 0.36
β-lactoglobulin 0.32 0.56 0.29 <0.7 <0.6 0.24
0.34 0.23
ovalbumin 0.18 0.29 0.31 <0.5
hemoglobin 0.37 0.31 0.45 <0.4 <0.7
0.30 0.27
0.13
pepsin 0.24
serum albumin 0.32 0.31 0.50 0.43 <1.1 <0.8 0.15
0.43
0.27
0.08

(a) All units are grams of water (gram of protein)⁻¹.

Upper limits on the amount of water that migrates 
with a molecule of protein through the solution can be 
calculated from the frictional coefficient. The radius
of a hard sphere the same volume as a molecule of pro- 
tein can be calculated from its molar mass and partial 
specific volume, and the radius of the sphere that would 
have the same frictional coefficient as the molecule of 
protein can be calculated with Equation 1-66. The latter 
sphere is always larger than the former. If it is assumed 
that the entire difference in volume is water forced to 
move with the molecule of protein, an upper limit to the 
amount of bound water can be calculated (Table 6-4). It 
is an upper limit because molecules of protein are not 
spheres and a particle with the same volume as a given 
sphere but a different shape will always have a larger fric- 
tional coefficient than that sphere. How much of the dif- 
ference between the two radii is due to hydration and 
how much to differences in shape has never been ascer- 
tained unambiguously for any protein. The numbers tab- 
ulated are not intended to be estimates of hydration, but 
upper limits of the hydration. 
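
The calculation of such an upper limit can be sketched as follows. The sketch assumes that the radius obtained from the frictional coefficient is the Stokes radius (the relation f = 6πηr is taken here to be the content of Equation 1-66, which is an assumption of this example), that the water carried along has the density of bulk water, and that the protein has the hypothetical molar mass, partial specific volume, and Stokes radius given in the code. The function names are introduced only for this illustration.

```python
import math

N_A = 6.022e23  # mol^-1, Avogadro constant


def anhydrous_radius_cm(molar_mass, partial_specific_volume):
    """Radius (cm) of a hard sphere with the volume of one molecule of protein.

    molar_mass in g mol^-1, partial_specific_volume in cm^3 g^-1.
    """
    volume = molar_mass * partial_specific_volume / N_A  # cm^3 per molecule
    return (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)


def hydration_upper_limit(molar_mass, partial_specific_volume,
                          stokes_radius_cm, water_density=1.0):
    """Upper limit on hydration, g water (g protein)^-1.

    The entire difference in volume between the sphere with the observed
    frictional coefficient and the anhydrous sphere is assigned to water.
    """
    r0 = anhydrous_radius_cm(molar_mass, partial_specific_volume)
    extra_volume = 4.0 / 3.0 * math.pi * (stokes_radius_cm ** 3 - r0 ** 3)
    return extra_volume * water_density * N_A / molar_mass


# Hypothetical protein: 14,300 g mol^-1, 0.72 cm^3 g^-1, Stokes radius 1.9 nm.
M, V_BAR, R_STOKES = 14_300.0, 0.72, 1.9e-7  # R_STOKES in cm
limit = hydration_upper_limit(M, V_BAR, R_STOKES)
print(f"anhydrous radius: {anhydrous_radius_cm(M, V_BAR) * 1e7:.2f} nm")
print(f"upper limit:      {limit:.2f} g water (g protein)^-1")
```

Because any departure of the molecule from a spherical shape also raises the frictional coefficient, the value returned is a ceiling on the hydration and not an estimate of it.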

It is also possible to estimate the hydration of a pro- 
tein from the scattering at small angles of X-radiation 
from a solution of that protein as a function of the angle 
of that scattered radiation.

There are several remarkable features of this
tabulation (Table 6-4). The values for bound water are all
similar, and each technique produces values that, although
they do not agree, are in the same range (0.2-0.4 g g⁻¹),
which is about 2 mol of water (mol of amino acid)⁻¹. There
seems to be no significant difference in the amount of
bound water for every gram of protein over a 5-fold range
in size of the proteins, between ribonuclease ($n_{\mathrm{aa}}$ = 124)
and serum albumin ($n_{\mathrm{aa}}$ = 581). For a small protein such
as lysozyme ($n_{\mathrm{aa}}$ = 129) or dihydrofolate reductase
($n_{\mathrm{aa}}$ = 162), these results indicate that there should be
200-300 molecules of bound water for every molecule of
protein. In a crystal of lysozyme, 140 molecules of water
had locations that were sufficiently distinct to be
incorporated into the refined molecular model. In a crystal
of dihydrofolate reductase, 264 molecules of ordered
water had sufficiently distinct locations to be
incorporated in the refined molecular model. Whether these
ordered molecules of water bear any relation to the bound
water detected by the physical measurements is
uncertain.
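
The conversion between grams of water for every gram of protein and molecules of water for every amino acid can be checked with a few lines of arithmetic. The sketch below assumes a mean molar mass of 110 g mol⁻¹ for an amino acid in a polypeptide, a commonly used round figure that is not taken from the text; the function names are introduced only for this illustration.

```python
M_WATER = 18.0     # g mol^-1
M_RESIDUE = 110.0  # g mol^-1, ASSUMED mean molar mass of an amino acid residue


def waters_per_residue(hydration):
    """Convert g water (g protein)^-1 to mol water (mol amino acid)^-1."""
    return hydration * M_RESIDUE / M_WATER


def waters_per_protein(hydration, n_aa):
    """Molecules of bound water for a protein of n_aa amino acids."""
    return waters_per_residue(hydration) * n_aa


for hydration in (0.2, 0.3, 0.4):
    print(f"{hydration:.1f} g g^-1 -> "
          f"{waters_per_residue(hydration):.1f} waters per amino acid, "
          f"{waters_per_protein(hydration, 129):.0f} waters for 129 amino acids")
```

With this assumed residue mass, the middle of the tabulated range corresponds to about 2 waters for every amino acid and to a few hundred waters for a protein the size of lysozyme.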

There is a highly significant correlation between the
accessible surface area of a crystallographic molecular
model of a protein and the total number of amino acids
it contains, regardless of whether it is a monomer or an
oligomer. As a result of this correlation, the mean
accessible surface area for each amino acid falls
gradually and monotonically from 0.53 nm² (amino acid)⁻¹
when the protein contains 100 amino acids to 0.30 nm²
(amino acid)⁻¹ when the protein contains 2000 amino
acids. From the definition of accessible surface area
(6-12) it follows that a molecule of water, held by
hydrogen bonds at 0.28 nm from its nearest neighbors, can
cover about 0.07 nm² of accessible surface area if waters
are assumed to pack in a hexagonal array or 0.09 nm² if
they are in a tetrahedral lattice (Figure 5-2). This means
that there are about 7 waters (amino acid)⁻¹ immediately
adjacent to the surface of a protein containing 100 amino
acids and 4 waters (amino acid)⁻¹ immediately adjacent
to the surface of a protein containing 2000 amino acids.
These limits would be equivalent to 1.2 and 0.7 g of water
(g of protein)⁻¹, respectively. For the proteins gathered in
Table 6-4, which all contain less than 600 amino acids,
the span would be 1.2-0.9 g of water (g of protein)⁻¹.
Therefore, the water of hydration determined by physical
measurements is considerably less than the amount of
water required to cover the surface of a molecule of
protein with a continuous rigidly fixed layer.
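
The arithmetic behind these limits can be reproduced in a short sketch. It assumes a planar hexagonal array, in which each molecule occupies an area of (√3/2)d² for a nearest-neighbor spacing d, and a mean molar mass of 110 g mol⁻¹ for an amino acid in a polypeptide; the latter value and the function names are assumptions of this example and are not taken from the text.

```python
import math

M_WATER = 18.0     # g mol^-1
M_RESIDUE = 110.0  # g mol^-1, ASSUMED mean molar mass of an amino acid residue


def area_per_water_hexagonal(spacing_nm=0.28):
    """Area (nm^2) for each molecule in a planar hexagonal array."""
    return math.sqrt(3.0) / 2.0 * spacing_nm ** 2


def monolayer_hydration(area_per_residue_nm2, area_per_water_nm2=0.07):
    """Waters per amino acid and g water (g protein)^-1 for a full monolayer."""
    waters = area_per_residue_nm2 / area_per_water_nm2
    grams_per_gram = waters * M_WATER / M_RESIDUE
    return waters, grams_per_gram


print(f"area per water, hexagonal: {area_per_water_hexagonal():.3f} nm^2")
for n_aa, area in ((100, 0.53), (2000, 0.30)):
    waters, grams = monolayer_hydration(area)
    print(f"{n_aa} amino acids: {waters:.1f} waters per amino acid, "
          f"{grams:.2f} g g^-1")
```

The values obtained in this way are close to the rounded figures quoted above; the small differences arise from rounding and from the assumed residue mass.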

Part of the reason for this discrepancy may be that,
as with the networks of water covering the surface of a
molecule of protein in a crystallographic molecular
model, the layer of hydration is patchy and
discontinuous, but the heterogeneity that must exist among
the waters of hydration is probably most of the reason.
Some waters at the surface are held tightly ($w_i$ = 1.0), but
most are only loosely influenced by the protein ($w_i$ < 1)
and contribute only partially to the weighted mean
(Equation 6-4). Therefore it is not surprising that the
weighted mean is less than the limit calculated by simply
counting every immediately adjacent molecule of water
and presuming it to be fixed to the protein. The range
over which the amount of immediately adjacent
molecules of water (0.9-1.2 g g⁻¹) varies among the proteins of
the size of those contributing to Table 6-4 is narrow, and
this fact explains why all of the proteins seem to have
about the same degree of hydration, within the variation
of the measurements.

The molar concentration, and hence the thermody- 
namic activity, of the water in the bulk phase of a solu- 
tion of protein can be changed without changing its 
concentration in the layer of hydration by adding a salt- 
ing-out solute such as sucrose, triethylene glycol, diox- 
ane, stachyose, or poly(ethylene glycol) that is excluded 
from the layer of hydration. Because the water in
the layer of hydration is in rapid equilibrium with the
water in the bulk phase, changing the activity of the
water in the bulk phase changes its activity in the layer of
hydration, and this change affects any chemical reaction
in which the amount of hydration of the protein changes.
As one might expect, the binding of substrates to an
enzyme, the binding of a protein to DNA, a large
conformational change of a protein, or the binding
of one protein to another causes significant changes in
hydration. From the magnitude of the effect of changing
the concentration of water on the dissociation constant
for these reactions, the number of molecules of water
removed from or added to the layer of hydration during
the reaction can be estimated. These range from 9
molecules of water for binding of a substrate to a hydrated
active site to 60 molecules of water for a significant
conformational change. In the latter transformation,
a portion of the water detected as leaving the shell
of hydration is thought to be molecules beyond the first
layer.
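
One way in which such an estimate might be made is sketched below. It assumes that the dissociation is written as complex + Δn H₂O ⇌ free components, so that Δn molecules of water enter the layer of hydration upon dissociation, and that the added solute acts only by lowering the activity of water. Under these two assumptions the observed dissociation constant varies as the activity of water raised to the power Δn, and Δn is the slope of ln K against ln a(H₂O). The data in the example are synthetic, and the function name is introduced only for this illustration.

```python
import math


def waters_from_slope(ln_a_w, ln_K):
    """Least-squares slope of ln K_d against ln a_w (dimensionless)."""
    n = len(ln_a_w)
    if n < 2 or n != len(ln_K):
        raise ValueError("need two or more paired observations")
    mean_x = sum(ln_a_w) / n
    mean_y = sum(ln_K) / n
    sxx = sum((x - mean_x) ** 2 for x in ln_a_w)
    if sxx == 0.0:
        raise ValueError("activities of water must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ln_a_w, ln_K))
    return sxy / sxx


# Synthetic data: K_d = 1e-6 M in pure water, 9 waters taken up on dissociation.
activities = [1.00, 0.99, 0.98, 0.97]
ln_a = [math.log(a) for a in activities]
ln_K = [math.log(1e-6) + 9.0 * x for x in ln_a]

delta_n = waters_from_slope(ln_a, ln_K)
print(f"estimated change in hydration: {delta_n:.1f} molecules of water")
```

The sign of the slope depends on the direction in which the reaction is written, so the convention chosen for the dissociation should be stated with any value reported.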


Suggested Reading 


Blake, C.C.F., Pulford, W.C.A., & Artymiuk, P.J. (1983) X-ray studies 
of water in crystals of lysozyme, J. Mol. Biol. 167, 693-723. 


Problem 6-7: Assign the hydrogens to donors and accep- 
tors in Figures 6-37B and 6-40A,B. 


Ionic Interactions 


Almost all of the charged amino acids—glutamate, 
aspartate, histidinium, lysinium, and arginine—are 
found on the surface of the crystallographic molecular 
model of a protein, so that they retain their hydration. 
Aside from the few that have roles as acids and bases in 
the function of the protein, the reason that these charged 
amino acids are present on the surface of a protein is to 
permit it to dissolve in water at high concentrations. For
example, the concentration of hemoglobin in an
erythrocyte is 0.3 g mL⁻¹.

The distribution of these elementary charges on 
the surface of a molecule of protein seems to be random 
with little regard for the signs of the elementary charges 
and no attempt to compensate the charges. Patches of 
positive charge and patches of negative charge are as 
common as regions where the charges are evenly 
divided. Changing these distributions seems to have little 
effect on the stability of the protein. The lysozyme 
from bacteriophage T4 has an excess of nine elementary 
positive charges over elementary negative charges at 
neutral pH. When lysines on its surface were changed to 
glutamates by site-directed mutation to produce a 
number of single, double, triple, and quadruple mutants 
in which the net charge number decreased from +9 to +7, 
+5, +3, and +1, respectively, the mean change in the free 
energy of folding of the protein was +3.3 ± 2.9 kJ mol⁻¹. 
Consequently, the protein decreased slightly in stability 
rather than increasing in stability as its excess charge was 
neutralized, and the magnitudes of the individual 
decreases in stability showed no correlation with the 
magnitude of the decrease in charge. Increases in stability 
of the same magnitude (-4 to -8 kJ mol⁻¹), however, 
have been observed upon neutralizing imbalances of 
charge on the surfaces of ubiquitin and the subunit-binding 
domain of dihydrolipoyllysine-residue acetyltransferase 
from Bacillus stearothermophilus. All of 
these experiments were performed at an ionic strength of 
0.05 M, so the differences in stability observed would 
probably have been even smaller at the physiological 
ionic strength of 0.15 M. 

There are experimental results suggesting that the 
charge of the amino acids on the surface of a protein may 
electrostatically increase the rate of association or 
increase the equilibrium constant for association of 
ligands that bear an opposite charge. These effects, however, 
are rarely more than a factor of 2 (-1.7 kJ mol⁻¹) at 
physiological ionic strengths, and often the ionic 
strength of the solution must be lowered to observe them 
at all, so they can be of little consequence biologically. 
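
The equivalence between a factor of 2 and -1.7 kJ mol⁻¹ follows 
from the relation between a ratio of equilibrium constants and a 
difference in standard free energy. The sketch below assumes a 
temperature of 25 °C, which the passage does not state, and the 
function name is illustrative. 

```python
import math

R = 8.314e-3  # gas constant in kJ mol^-1 K^-1


def free_energy_from_ratio(ratio, temperature=298.15):
    """Change in standard free energy (kJ/mol) for a given ratio of
    equilibrium constants (or rate constants): -RT ln(ratio)."""
    return -R * temperature * math.log(ratio)


# A twofold increase in the equilibrium constant for association.
delta_g = free_energy_from_ratio(2.0)
print(f"factor of 2 corresponds to {delta_g:.2f} kJ/mol")
```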

The acid dissociation constants of the individual 
acid-bases of the side chains on the surface of a protein 
are shifted by the elementary charges on the amino acids 
in their vicinity. For example, it has been shown that the 
pKa of Histidine 64 in subtilisin BPN' decreases by 0.26 
unit when Aspartate 99 is mutated to a serine because 
when the elementary negative charge of Aspartate 99 is 
no longer in its vicinity, the stability of the histidinium 
ion decreases relative to that of the neutral histidine. All 
such pairwise interactions are tautomeric because if one 
side chain shifts the pKa of another, then the other side 
chain must shift the pKa of the first. If the negative charge 
of Aspartate 99 shifts the pKa of Histidine 64, the positive 
charge of Histidine 64 must shift the pKa of Aspartate 99. 
A large constellation of such tautomeric interactions 
determines the individual acid titrations of the side 
chains on the surface of a protein. 
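
The reciprocity of such a pairwise interaction can be made 
explicit with a thermodynamic cycle connecting the four states 
of protonation of the two side chains. In the sketch below, only 
the shift of 0.26 unit comes from the text; the absolute values 
of pKa are hypothetical, the temperature is assumed to be 25 °C, 
and the shift observed upon mutation to serine is assumed, for 
illustration only, to equal the shift that would accompany 
neutralization of the aspartate. 

```python
import math

R = 8.314e-3  # gas constant in kJ mol^-1 K^-1


def coupling_free_energy(delta_pKa, temperature=298.15):
    """Free energy of interaction (kJ/mol) equivalent to a shift in pKa."""
    return math.log(10.0) * R * temperature * delta_pKa


def closing_pKa(pKa_his_asp_ionized, pKa_his_asp_neutral, pKa_asp_his_neutral):
    """pKa of the aspartate next to the histidinium cation, obtained by
    requiring that the free energies around the cycle sum to zero."""
    shift = pKa_his_asp_ionized - pKa_his_asp_neutral
    return pKa_asp_his_neutral - shift


# Hypothetical pKa values; only their difference (0.26) is from the text.
pKa_asp_next_to_histidinium = closing_pKa(7.00, 6.74, 4.00)
energy = coupling_free_energy(0.26)
print(f"pKa of the aspartate is lowered to {pKa_asp_next_to_histidinium:.2f}")
print(f"interaction is worth {energy:.2f} kJ/mol")
```

The carboxylate raises the pKa of the histidine by exactly the 
amount by which the histidinium lowers the pKa of the carboxylic 
acid, because both shifts measure the same free energy of 
interaction. 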

The acid-base titration curve of a native protein 
(Figure 1-11) is the summation of the individual titrations 
of the accessible acid-bases on its surface. For 
every 100 amino acids, a normal protein contains about 
five aspartic acids, six glutamic acids, two histidines, 
three tyrosines, and six lysines. The aspartic acids (pKa 
= 4.0) and glutamic acids (pKa = 4.4) account for most of 
the dissociation of protons between pH 2 and 5.5. The 
lysines (pKa = 10.4) and tyrosines (pKa = 9.8) account for 
most of the dissociation of protons between pH 8 and 12. 
These are the two major features of the titration curve of 
a protein because these four amino acids account for the 
majority (90%) of the acid-bases present in the protein. 
The histidines account for most of the small amount of 
dissociation that occurs between pH 5.5 and 8. 
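
The summation just described can be illustrated with a short 
calculation in which each acid-base titrates independently 
according to a single value of pKa. This is a minimal sketch 
rather than a model of any real protein: it ignores all 
electrostatic interactions, it omits the termini, arginine, and 
cysteine, and the pKa of 6.5 used for histidine is an assumption 
because the text gives no value for it. 

```python
# Number per 100 amino acids and pKa of each type of acid-base.
GROUPS = {
    "aspartate": (5, 4.0),
    "glutamate": (6, 4.4),
    "histidine": (2, 6.5),   # assumed pKa, not given in the text
    "tyrosine": (3, 9.8),
    "lysine": (6, 10.4),
}


def protons_dissociated(pH):
    """Moles of protons dissociated per 100 amino acids at a given pH,
    summed over independent Henderson-Hasselbalch titrations."""
    return sum(n / (1.0 + 10.0 ** (pKa - pH)) for n, pKa in GROUPS.values())


def protons_released_between(low_pH, high_pH):
    """Protons released as the pH is raised from low_pH to high_pH."""
    return protons_dissociated(high_pH) - protons_dissociated(low_pH)


acidic = protons_released_between(2.0, 5.5)
neutral = protons_released_between(5.5, 8.0)
basic = protons_released_between(8.0, 12.0)
for label, value in (("pH 2-5.5", acidic), ("pH 5.5-8", neutral), ("pH 8-12", basic)):
    print(f"{label}: {value:.1f} protons per 100 amino acids")
```

The two large steps arise from the carboxylic acids and from the 
tyrosines and lysines, and the small step between them arises 
almost entirely from the histidines. 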

As the pH is decreased below the isoelectric point, 
a protein gains net positive charge number as each 
proton associates, and as the pH is increased above the 
isoelectric point, the protein gains net negative charge 
number as each proton dissociates. This change of net 
charge number with decreases and increases of pH 
causes the addition of each successive proton or the 
removal of each successive proton, respectively, to be 
more difficult. The reason for this is that the gathering of 
net charge number on a molecule, even one as large as a 
protein, is an unfavorable reaction relative to dispersing 
those elementary charges evenly throughout the solution. 
Because tautomeric interactions are themselves 
electrostatic, the effect of the resulting charge on the 
overall acid dissociation of the molecule of protein is 
simply the summation of all the tautomeric interactions 
among all the side chains, the individual titrations of 
which produce the complete titration curve. 

That the electrostatic work of creating this charge 
shifts the observed titration curve is easily demonstrated 
by changing the ionic strength (Figure 1-11). An increase 
in ionic strength shrinks the layer of counterions around 
each individual, charged amino acid in the protein 
(Equation 1-71) and causes them to exert a decreased 
effective electrostatic charge in their influence on neigh- 
boring acid-bases undergoing titration. This in turn 
decreases the electrostatic work that must be performed 
to create charge on the neighboring acid-bases and shifts 
the titration curve closer to the curve that would have 
been seen if each acid-base titrated only according to its 
intrinsic pKa. This electrostatic shielding due to 
increased ionic strength produces a steepening of the 
titration curve for the protein both below and above its 
isoelectric point (Figure 1-11). 
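
The shrinking of the layer of counterions can be put in numbers 
if it is assumed that Equation 1-71, which is not reproduced 
here, is of the Debye-Hückel type, so that the thickness of the 
layer is the Debye length. The sketch below makes that 
assumption and further assumes water at 25 °C with a relative 
permittivity of 78.4 and an electrolyte of monovalent ions. 

```python
import math

EPSILON_0 = 8.8541878128e-12   # permittivity of vacuum, F m^-1
BOLTZMANN = 1.380649e-23       # J K^-1
AVOGADRO = 6.02214076e23       # mol^-1
ELEMENTARY_CHARGE = 1.602176634e-19  # C


def debye_length_nm(ionic_strength, temperature=298.15, relative_permittivity=78.4):
    """Debye length (nm) for an ionic strength given in mol per liter."""
    ions_per_m3 = ionic_strength * 1000.0 * AVOGADRO
    kappa_squared = (2.0 * ions_per_m3 * ELEMENTARY_CHARGE ** 2) / (
        EPSILON_0 * relative_permittivity * BOLTZMANN * temperature
    )
    return 1.0e9 / math.sqrt(kappa_squared)


for strength in (0.05, 0.15):
    print(f"I = {strength:.2f} M: Debye length = {debye_length_nm(strength):.2f} nm")
```

Under these assumptions the layer of counterions thins by almost 
half between the ionic strength of the experiments described 
earlier (0.05 M) and physiological ionic strength (0.15 M). 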

It is possible to correct roughly for the electrostatic 
work involved in creating charge on the molecule of protein 
by assuming that in a given region of the titration 
curve, for example, between pH 2 and 5.5, only one type 
of acid-base is titrating and all of the members of this set 
have the same intrinsic pKa, pKa,int. Then it is assumed that 
the charge on the molecule of protein, Q_P, is proportional 
to the mean net proton charge number, Z_H, and that 
pKa,app, which is proportional to a free energy, is shifted 
arithmetically by the electrostatic work, which is a free 
energy and which should be proportional to Z_H. 
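
A correction of this kind can be sketched as a straight-line fit. 
The example below assumes the form usually attributed to 
Linderstrøm-Lang, pKa,app = pKa,int - 0.868 w Z, in which w is an 
empirical electrostatic factor; the text above does not give the 
equation, so this form, the names of the functions, and the 
synthetic data are all assumptions made for illustration. 

```python
import math


def apparent_pKa(pH, fraction_dissociated):
    """Apparent pKa of a set of identical acid-bases at one point on the curve."""
    return pH - math.log10(fraction_dissociated / (1.0 - fraction_dissociated))


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def intrinsic_pKa_and_w(pHs, fractions, charges):
    """Extrapolate apparent pKa to zero net charge to obtain the intrinsic
    pKa; the slope against charge number gives the factor w."""
    pk_app = [apparent_pKa(p, f) for p, f in zip(pHs, fractions)]
    slope, intercept = fit_line(charges, pk_app)
    return intercept, -slope / 0.868


# Synthetic titration of carboxyl groups (hypothetical values).
true_pk, true_w = 4.4, 0.05
charges = [12.0, 9.0, 6.0, 3.0]      # mean net charge number at each point
fractions = [0.2, 0.4, 0.6, 0.8]     # fraction of carboxyl groups dissociated
pHs = [true_pk - 0.868 * true_w * z + math.log10(f / (1.0 - f))
       for z, f in zip(charges, fractions)]
pk_int, w = intrinsic_pKa_and_w(pHs, fractions, charges)
print(f"intrinsic pKa = {pk_int:.2f}, w = {w:.3f}")
```

The positive charge on the protein at low pH lowers the apparent 
pKa, so the intrinsic value is recovered as the intercept at a 
net charge number of zero. 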

The values of the intrinsic acid dissociation constants 
obtained by these corrections for electrostatic 
work agree with expectation (Table 2-2) to a certain 
extent. The values of pKa,int for the carboxyl groups in several 
proteins the titration curves of which between pH 2 
and 5.5 have been analyzed in this way are between 4.0 
and 4.8. The titration of tyrosine side chains in a native 
protein can be followed independently by using the large 
difference in ultraviolet absorbance between the phenol 
and the phenolate anion to calculate E and fi.” The 
values of pKa,int, corrected for electrostatic work, for 
tyrosines in several proteins are between 9.4 and 10.8. 
The contribution of tyrosine to the titration curve 
between pH 8 and 12 can then be deducted from the overall 
curve, and values of pKa,int for the lysines in these same 
proteins can be calculated. They lie between 9.8 and 10.4. 

The titration curves of proteins usually fail to meet 
expectations in one key aspect. There are usually too few 
acid-bases contributing to the titration. The deficit is 
most easily noticed in the case of histidine and tyrosine. 
The number of moles of protons dissociating from a 
mole of protein between pH 5.5 and 8 is often less than 
the moles of histidine in a mole of that protein. This 
deficit can be explained by assuming that the values of 
pKa for one or more of the histidines have been lowered 
and that their titrations have become buried in those of 
the large number of carboxylates. The moles of tyrosine 
the ultraviolet absorption of which displays the expected 
shift between pH 9 and 11 upon formation of the phenolate 
anion is often less than the total moles of tyrosine 
present in a mole of the protein. For example, only four 
of the six tyrosines in ribonuclease can be titrated 
and only two of the four tyrosine side chains of 
chymotrypsinogen can be titrated within accessible 
ranges of pH. With most proteins, values of pH greater 
than 11 are inaccessible, so it can be said only that each 
of these missing tyrosines has a pKa greater than 11. 

Both the decreases in the values of pKa for the histidines 
and the increases in the values of pKa for the 
tyrosines implied or demonstrated by these results are 
reasonable. If these shifts in pKa are due to burying the 
side chains in the interior of the protein, even though 
they remain accessible to the solvent and capable of 
acid-base reactions, their neutral forms should become 
more stable relative to their charged forms. In most 
cases, the missing acid-bases in a titration curve are 
assumed to be buried in the interior of the folded 
polypeptide. Such buried acid-bases can be seen in the 
crystallographic molecular models of proteins. For 
example, in the crystallographic molecular model of 
ribonuclease, Tyrosine 25 is almost completely buried 
(the solvent accessibility of its phenolic oxygen is only 
0.02) and Tyrosine 97 is completely buried. It has been 
assumed that these are the two tyrosines in the native 
protein that do not participate in acid-base titrations. 

These examples of buried tyrosines or histidines are 
special cases of the fact that polar amino acids are found 
in the interior of a protein, even ones that are normally 
charged. For example, Arginine 30 in the crystallographic 
molecular model of xylose isomerase from Arthrobacter 
strain B3728 is completely surrounded by both the backbone 
and mostly carbon-hydrogens of other side chains 
(Figure 6-41). It does not form an ion pair with any 
anionic side chain. Instead, its five donors form hydrogen 
bonds with four acyl oxygens from the backbone and 
a molecule of water. Because one of the hydrogen bonds 
is to a molecule of water, it cannot be determined 
whether or not the guanidino group is positively charged. 
Aspartate 76 is buried in the interior of ribonuclease T1 of 
A. oryzae, and a cluster of three glutamates, two histidines, 
and an aspartate is buried in the interior of the 
iron-free form of the R2 protein of ribonucleoside-
diphosphate reductase. 

The amino acids that are charged at neutral pH are 
either the anionic conjugate bases of neutral acids, for 
example, carboxylates, or the cationic conjugate acids of 
neutral bases, for example, ammonium cations. It may be 
the case that when such an amino acid is buried, it is 
buried as the neutral acid or the neutral base, respectively. 
This conclusion follows from the fact that the farther 
the pKa of a normally charged amino acid is from 7.0, 
the less likely it is to be buried (Table 6-2). Such a buried 
amino acid usually participates in a set of hydrogen 
bonds that fit its donors or its acceptors (Figure 6-41), so 
it is neither surprising nor informative that replacing it by 
site-directed mutation with an amino acid that is isosteric 
but has a different pattern of donors and acceptors, 
such as an asparagine for an aspartate, usually produces 
a less stable protein. From examining closely the 
constellation of donors and acceptors around such a buried 
amino acid, however (see Problem 6-8), one often comes 
to the conclusion that it is buried in its charged form. If 
this is the case, the protein itself must somehow replace 
most of the enthalpy of hydration that is lost upon removing 
the charge from the water. In such situations, it is the 
large number of complementary donors and acceptors of 
hydrogen bonds that seems to accomplish this feat in the 
absence of any compensation of charge. Most of the 
time, however, when a charged amino acid is found in a 
crystallographic molecular model at a location removed 
from the water, it is found as a partner in an ion pair. 

Ion pairs are actually ionized hydrogen bonds. An 
ionized hydrogen bond is a hydrogen bond between a 
positively charged donor and a negatively charged acceptor. 
An example would be the hydrogen bonds between 
an argininium cation and an aspartate anion (Figure 
6-42) or an argininium cation and a glutamate anion 
(Figure 6-58). That almost all ion pairs are ionized 
hydrogen bonds follows from the fact that the distance 
between the two heteroatoms, one from the cation and 
one from the anion, that make the closest contact in an 
ion pair is usually that of a hydrogen bond 
(0.3 nm); the fact that the angles at this point of 
contact are usually those expected of a hydrogen bond; 
and the fact that a deuteron can be found between these 
two heteroatoms in crystallographic molecular models 
from neutron diffraction of deuterated proteins. 

Although occasionally an ion pair will be as ideal as 
the one illustrated in Figure 6-42, in which the two 
acceptors of the carboxylate are respectively occupied by 
two of the donors of the guanidinium ion to form a 
six-membered ring, usually it is more peculiar because 
molecules of protein are the products of evolution by natural 
selection. For example, a positive side chain and a negative 
side chain will not be directly hydrogen-bonded to 
each other but linked through an intermediate, as 
Arginine 102 and Aspartate 142 are linked by Threonine 
107 in α-lytic endopeptidase (Figure 6-43). As if to illustrate 
the irrelevance of placing positive next to negative, 
Arginine 8 and Arginine 366 in pepsinogen, albeit each 
hydrogen-bonded to carboxylates, are nevertheless 
stacked on top of each other, their π molecular orbital 
systems parallel to each other, within a buried 
hydrogen-bonded cluster. 

Whether the hydrogen bond between a carboxylate 
anion on the one hand and a histidinium, ammonium, or 
guanidinium cation on the other is ionized or not 
depends formally on whether the proton is on the 
stronger base, in which case the bond is ionized, or on 
the weaker base, in which case it is neutral. In almost 
every instance in which a potentially ionized hydrogen 
bond is found in a crystallographic molecular model, it is 
unknown on which atom the proton resides. An estimate 
of the effect of the relative permittivity on the location 
of the proton in an ionized hydrogen bond has been 
made, and it was concluded that the relative permittivity 
of the surroundings would have to be less than that 
of CCl4 (εr = 2.2) before the shift of the proton from an 
ammonium cation to a carboxylate anion in the ionized 
hydrogen bond between them 
would be favored. If this is the case, such ionized hydrogen 
bonds are probably ionized even when surrounded 
by protein because the relative permittivity of protein is 
thought to be between 3 and 6. Certainly, in the 
crystallographic studies of proteins by neutron diffraction, in 
which the location of the proton has been observed 
directly, it usually forms a normal σ bond with the 
stronger base. 

The energy required to transfer an ionized hydrogen 
bond from water to a region of low relative permittivity is 
almost as large as the energy required to transfer a 
monovalent ion, so a buried ion pair is only marginally less 
unstable than a buried, uncompensated arginine, lysine, 
aspartate, or glutamate. Consequently, when one 
member of an ion pair is replaced by site-directed 
mutation, the protein becomes less stable, but only by 
10-15 kJ mol⁻¹. Such observations are, however, 
ambiguous because the amino acids in the vicinity of the 
mutation can rearrange to take advantage of alternative 
compensations. For example, when Aspartate 193 
represented in Figure 6-42 is mutated to an asparagine, this 
asparagine swings away to form hydrogen bonds with the 
backbone amido nitrogen-hydrogen of Histidine 196 and 
the backbone acyl oxygen of Glutamate 95', and Arginine 
13 swings away to form a new ionized hydrogen bond with 
the side chain of Glutamate 95' as well as a hydrogen bond 
with the acyl oxygen of Threonine 94. 

Figure 6-43: Hydrogen-bonded network involving 
Arginine 102 and Aspartate 142 in the crystallographic 
molecular model (Bragg spacing > 0.17 nm) 
of α-lytic endopeptidase from L. enzymogenes. Two ... 
interior of the protein are displayed as well as a molecule 
of water (open circle). The interaction between 
the two oppositely charged side chains, the arginine 
and the aspartate, is mediated by their respective 
hydrogen bonds to Threonine 107. This drawing was 
produced with MolScript. 

A buried ionized hydrogen bond is less stable than 
a buried neutral hydrogen bond and certainly less 
stable than an isochoric pair of hydrophobic side chains. 
When the partially buried ionized hydrogen bonds 
among Arginine 31, Glutamate 36, and Arginine 40 in the 
Arc repressor were replaced with hydrophobic interactions 
among a methionine, a tyrosine, and a leucine, 
respectively, the mutant that resulted was -16 kJ mol⁻¹ 
more stable than the wild-type protein. Why buried 
ionized hydrogen bonds uninvolved in the function of the 
protein have not been eliminated in this way by evolution 
by natural selection is unknown. Buried, ionized hydrogen 
bonds, however, are rare; most ionized hydrogen 
bonds are found on the surfaces of crystallographic 
molecular models of proteins where they can be stabilized 
by the solvation of the water. Even then they represent 
a minority of the ionized hydrogen bonds that 
potentially could form. Most of the fortuitously juxtaposed, 
oppositely charged side chains on the surface of a 
crystallographic molecular model do not participate in 
hydrogen bonds "even though in most cases there is no 
steric reason why they cannot." It is the competition of 
the donors and acceptors of the water that prevents it. 

The frequency with which ionized hydrogen bonds 
are observed in crystallographic molecular models (Table 
6-5) is no greater than the probability that they would 
occur at random. Only hydrogen bonds between two side 
chains are considered in the tabulation, and the proba- 
bility that a certain hydrogen bond will form at random 
is calculated from the frequency with which the amino 
acids occur in the usual protein and the number of 
equivalent donors or equivalent acceptors present on each 
amino acid. If anything, ionized hydrogen bonds are 
observed less frequently than predicted by this calcula- 
tion of probability. This may be due to the fact that both 
charged donors and charged acceptors will tend to be 
more exposed to the water and less likely to form hydro- 
gen bonds. A reciprocal argument could be invoked to 
explain the fact that hydrogen bonds between hydroxyl 
groups are more frequent than expected (Table 6-5), 
because amino acids bearing hydroxyl groups are often 
buried (Table 6-2). Nevertheless, with few exceptions, the 
frequencies with which each of the particular hydrogen 
bonds are observed are about those expected from the 
probability that the respective donor and acceptor would 
encounter each other at random, regardless of charge. 

An example of the interchangeability of charged 
and uncharged donors and acceptors of hydrogen bonds 
occurs in phycobiliproteins, where an ionized hydrogen 
bond between an arginine and an aspartic acid in one 
species is replaced isochorically by hydrogen bonds 
between two glutamines in another. 
This, however, may not be very common because the 
amino acids surrounding an ionized hydrogen bond 
have been selected for their ability to solvate the charges 
and such a tailored environment should resist the 
neutralization of the bond. Nevertheless, all of these 
considerations suggest that an ionized hydrogen bond is no 
different from an un-ionized hydrogen bond except that 
it should be less stable when exposed to solvent and 
present greater problems of solvation when it is buried. 


Table 6-5: Frequency of Hydrogen Bonds between Side Chains 

(a) From tables in refs 8, 22, 97, and 98. (b) Probability that 
the hydrogen bond would occur at random, calculated only from 
frequencies of functional groups in proteins and their 
respective number of donors or acceptors, assuming no 
preferences for type of hydrogen bond. (c) Probability on the 
same scale as the others but not included for normalization. 


Problem 6-8: Examine the stereoscopic presentation to 
the left of the refined map of electron density in the 
middle of a molecule of protein with the final 
crystallographic molecular model inserted into it: 

(A) The side chain of which of the 20 amino acids 
descends from the top left of the figure into its 
center? 

(B) Draw the structures of all of the hydrogen bonds 
made by the donors and acceptors on this side 
chain. In your drawing include the σ lone pairs of 
the acceptors and the hydrogens of the donors as 
in the drawings in Table 6-5. Indicate clearly the 
chemical identity of each donor and acceptor in 
your drawing by including enough of its structure 
that there is no doubt as to what functional group 
it is and by labeling it. 

(C) Is the side chain charged or neutral? How can you 
be sure? 

(D) In a solution of protein, are there an excess of 
donors or acceptors for hydrogen bonds? On the 
basis of this consideration, why should all donors 
of hydrogen bonds find a partner? Do all of the 
donors on the amino acid side chain in the center 
of the figure find acceptors? 

(E) Because the figure is for a region of electron density 
from the center of the molecule of protein, 
what is the most unexpected feature of the 
arrangement? What seems to permit this unexpected 
arrangement? 


Hydrogen Bonds 


Although they are less frequent than the hydrogen 
bonds between the amido nitrogen-hydrogens and the 
acyl oxygens of the backbone producing the secondary 
structure of a protein, hydrogen bonds between the 
donors and acceptors on the side chains of the amino 
acids are common features of crystallographic molecular 
models. The stereochemistry of such hydrogen 
bonds is as expected. The various acceptors to the 
nitrogen-hydrogen bonds that are donors on the side 
chains of glutamine, asparagine, arginine, histidine, and 
tryptophan are located preferentially at positions to 
which the sp² nitrogen-hydrogen bonds of the donors 
are pointed. The donors to the oxygens that are acceptors 
on glutamate, aspartate, asparagine, and glutamine 
show some preference for the positions at 120° to the 
carbon-oxygen double bond to which the sp² lone pairs 
of electrons on the oxygens are pointed, but there is 
much more flexibility to their locations as they pivot 
around these lone pairs (Figure 5-10D). The distribution 
of hydrogen-bond donors and acceptors around 
the hydroxyl oxygens of serines and threonines is even 
more flexible, but there are noticeable preferences for 
the two positions at dihedral angles χ of 80° and 280°. 
The donors and acceptors to the phenolic oxygen of 
tyrosine, however, have a strong preference to be in the 
plane of the ring at angles of 120° to the carbon-oxygen 
bond, as expected from the sp² hybridization of the 
oxygen. The three nitrogen-hydrogen bonds on lysine 
are almost always occupied by three respective acceptors 
arranged around the ammonium ion at angles near 
the 109° expected from its sp³ hybridization but 
not located at any preferred dihedral angles. 

It is fairly common (17% of the tryptophans, 9% of 
the tyrosines, 6% of the phenylalanines, and 1% of the 
histidines in crystallographic molecular models from 
data sets with minimum Bragg spacings less than 
0.17 nm) for a nitrogen-hydrogen, an oxygen-hydrogen, 
or a sulfur-hydrogen bond to be directed towards the 
π cloud of an aromatic side chain with its hydrogen 
close enough (< 0.3 nm) to conclude that a hydrogen 
bond has been formed. Most frequently, these hydrogen 
bonds are between the π clouds of tyrosine and 
tryptophan acting as acceptors and the ammonium 
nitrogen-hydrogens of lysines. When the 
nitrogen-hydrogen bonds are themselves attached to a 
π system, however, as with glutamine, asparagine, arginine, 
and histidine, their π cloud is often stacked on the 
π cloud of the aromatic side chain. In such instances, 
the nitrogen-hydrogen bonds point away from the aromatic 
ring, and none can form a hydrogen bond 
with it. There are exceptions, however, such as 
Glutamine 96 in human HLA class I histocompatibility 
antigen A-2 (Figure 6-22) and Glutamine 399 in 
ribulose-bisphosphate carboxylase (Figure 6-44B). 

As is the case with the backbone of the polypeptide, 
when the donors of hydrogen bonds on the side chains of 
the amino acids are removed from the water and 
stripped of their hydrogen bonds with the solvent, there 
would be a considerable increase of enthalpy if they did 
not find new partners in the interior of the protein. Most 
if not all of them do. 

One of the remarkable features of the buried 
hydrogen bonds that result from this energetic imperative 
is that they tend to be clustered. For example, of the 
54 side chains in myoglobin that form hydrogen bonds 
with atoms in the protein other than bound water, 16 
participate in eight closed pairs and nine participate in 
three closed triplets, but 29 participate in larger clusters. 
These clusters often incorporate buried water. 
Examples of such clusters occur in deoxyribonuclease I 
(Figure 6-44A) and in ribulose-bisphosphate carboxylase 
(Figure 6-44B). In these clusters, charged amino 
acids participate as donors and acceptors of hydrogen 
bonds as readily as uncharged amino acids and there is 
no obvious balancing between positive and negative 
charges (Figure 6-44A). 

Clusters of hydrogen bonds serve to orient functionally 
important amino acids. For example, a "complex 
network of hydrogen bonds" serves to orient the six histidines 
responsible for chelating the copper and the zinc 
in superoxide dismutase. Histidine 57 in the active site 
of chymotrypsin is oriented by a hydrogen bond to 
Aspartate 102, which in turn is oriented by three other 
hydrogen bonds, one to each of its three remaining 
acceptors. Histidine 31 in deoxyribonuclease I is functionally 
important and is held in position by the cluster 
in which it participates (Figure 6-44A), as is Histidine 325 
in ribulose-bisphosphate carboxylase (Figure 6-44B). 
The hydrogen bond in the case of deoxyribonuclease I 
forces the dihedral angle χ of Histidine 31 to assume an 
unfavorable value when it is positioned properly. 
Carboxylic acids, histidines, and arginines are most susceptible 
to such pinning because they have donors and 
acceptors at two or more separate locations on their side 
chains, and they are rigid structures because of their 
π molecular orbital systems. These features make them 
easily immobilized. 

Just as an accounting of the concentrations of all of 
... for example, the cytoplasm of a cell, the concentration of 
acceptors of hydrogen bonds exceeds the concentration 
of donors. There are 300 acceptors but only 180 donors 
for every 100 amino acids in a protein. The presence 
of nucleic acid and carbohydrate only increases this disparity. 
When a polypeptide is unfolded, its donors and 
acceptors are freely accessible to the solution and participating 
in rapidly changing hydrogen bonds with molecules 
of water, but because at any given instant only one 
acceptor can pair with each donor, there will be, at all 
times, a concentration of unoccupied acceptors at least 
as large as the concentration of this inescapable excess of 
acceptors over donors. The concentration of unoccupied 
donors at a given instant, however, will be small if not 
insignificant. 


Whenever the donor of a hydrogen bond is 
removed from water during the folding of a protein but 
paired with an acceptor from the protein, the total 
number of hydrogen bonds in the solution does not 
change. Almost every one of these new hydrogen bonds 
will have the same enthalpy of formation as the old 
hydrogen bond between that donor and water because 
almost all of the acceptors within a molecule of protein 
have acid dissociation constants associated with their 
σ lone pairs of electrons that are not appreciably different 
from that of the lone pairs of electrons on water 
(pKa = -1.7). Consequently, the enthalpy of formation of 
that hydrogen bond will be close to 0 (Equation 5-49). 
Because both the competition of the water for this donor 
(Equation 5-45) and the entropies of approximation 
involved in forming the regular structures of the 
polypeptide backbone (Figure 5-19) or a hydrogen bond 
between two side chains are entropic terms, they can be 
combined into the larger question of the change in standard 
entropy accompanying the folding of the polypeptide. 

If upon folding, however, a donor for a hydrogen 
bond, such as the nitrogen-hydrogen of an amide or the 
oxygen-hydrogen of a hydroxyl group, finds itself 
sequestered within the structure without an acceptor, 
the number of hydrogen bonds in the solution decreases 
by one. This unsatisfactory sequestration would produce 
a change in standard enthalpy* of +15 to +20 kJ mol” 
(Table 5-2) and would consequently squander a consid- 
erable portion of the net free energy available for folding. 
Consequently, such a loss must be avoided, and it is 
likely that every nitrogen-hydrogen bond and 
oxygen-hydrogen bond of a folded polypeptide partici- 
pates as a donor in a hydrogen bond, either with water, 
with an acyl oxygen of a peptide bond, or with a lone pair 
of electrons on a side chain. It comes as no surprise that 
there are few” if any'"”*® unoccupied donors of hydro- 
gen bonds in crystallographic molecular models. 

If, upon folding, an acceptor such as a lone pair of 
electrons on the acyl oxygen of an amide or on the 
oxygen of a phenol or alcohol finds itself without a donor, 
there is not much ofa penalty. For example, if as many as 
half of the excess of acceptors over donors in the 
polypeptide were to become sequestered unoccupied, 
the increase in the free energy of formation of the hydro- 
gen bonds in the solution would be less than -RT In 0.5 
or 1.7 kJ (mol of folded polypeptide)’. It comes as no sur- 
prise that there are quite a few unoccupied acceptors of 
hydrogen bonds in crystallographic molecular models. 
The most obvious examples of unoccupied acceptors are 
the second lone pairs of electrons on the acyl oxygens in 
a Bsheet buried in the center of a molecule of protein 
(Figure 6-9). The fact that only a fraction of the acyl oxy- 
gens on either the backbone or on the side chains end up 
with two donors (Figure 6-7) is inconsequential because 
many acceptors were vacant before folding occurred
anyway. All of these considerations should be kept in
mind when the standard free energy of formation for a
hydrogen bond is being assessed experimentally,
because it is often the case that these differences in
importance between donor and acceptor affect the
results of the experiment.


* This change in standard enthalpy is not to be confused with that for
the dissociation of a hydrogen bond in the reaction described in
Equation 5-22. In that situation, in which the dissociated donor and
acceptor are not sequestered from the solvent, equiergonic hydrogen
bonds are formed between the dissociated donor and the dissociated
acceptor with surrounding molecules of water, there is no
net decrease in the concentration of hydrogen bonds in the solution,
and the change in standard enthalpy is 0.
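
The upper limit of 1.7 kJ quoted above for leaving half of the excess acceptors unoccupied can be checked numerically. The sketch below simply evaluates -RT ln 0.5. The temperature of 298.15 K is an assumption, because the text does not state one, and the function name `vacancy_penalty_kj` is illustrative only.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def vacancy_penalty_kj(fraction_remaining, temperature_k=298.15):
    """Return -RT ln(fraction_remaining) in kJ mol^-1.

    fraction_remaining is the fraction of the excess acceptors that is
    still available after folding (0.5 in the example in the text).
    The default temperature is an assumption, not a value from the text.
    """
    if not 0.0 < fraction_remaining <= 1.0:
        raise ValueError("fraction_remaining must lie in (0, 1]")
    return -R_KJ * temperature_k * math.log(fraction_remaining)


penalty = vacancy_penalty_kj(0.5)
print(f"-RT ln 0.5 = {penalty:.2f} kJ per mole of folded polypeptide")
```

At this assumed temperature the value rounds to the 1.7 kJ given in the text, which is small compared with the +15 to +20 kJ mol⁻¹ cited for a single sequestered donor.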

The necessity that the donor of a hydrogen bond 
retain an acceptor is particularly relevant when the 
indole of tryptophan is considered. The side chain of 
tryptophan is remarkably soluble in ethanol, which
has twice as many acceptors of hydrogen bonds as 
donors. Likewise, during partition between water and 
1-octanol, the side chain of tryptophan has the greatest 
preference for 1-octanol of all the amino acids (Figure 
5-24). As the indole contains only a donor, a net of one 
hydrogen bond is created every time it is dissolved in 
ethanol. When it is transferred from water to 1-octanol, a 
net of one hydrogen bond is also created because a solu- 
tion of indole in water has more donors than acceptors 
and 1-octanol has more acceptors than donors. When
indole is transferred, empty donors disappear from water 
and empty acceptors disappear in the alcohol. 
Consequently, the side chain of tryptophan is significantly
more hydrophilic than is indicated by its solubility
in ethanol or its transfer between water and
1-octanol, two proposed measurements of its hydrophobicity.
Because a solution of protein has, as does
ethanol or 1-octanol, more acceptors than donors, simi- 
lar imbalances of donors and acceptors have a major 
effect on the distribution of amino acids between the sur- 
face and the interior of a molecule of protein or the cou- 
pling of the donors and acceptors of hydrogen bonds 
withdrawn from water during the process of folding the 
polypeptide. 

One example of the strong tendency of tryptophan
to retain its hydrogen bond with water occurs in the
structure of the Bence-Jones protein Rhe. A tryptophan
in the center of the crystallographic molecular model of
this protein, though completely buried, is engaged in a
hydrogen bond with a buried molecule of water sitting
next to its indole nitrogen (Figure 6-39). This molecule
of water is trapped in the interior during the folding of
the polypeptide, and its two donors are hydrogen-bonded
in turn to two acyl oxygens from the backbone.
In γ-II crystallin, two of the tryptophans are also
hydrogen-bonded to buried molecules of water. Usually,
however, tryptophan retains the hydrogen bond to the
nitrogen-hydrogen bond of its indole in less dramatic
ways. For example, all of the donors in the indoles of the
tryptophans of chymotrypsin retain hydrogen bonds
with the solvent or another acceptor in the interior. The
other two tryptophans in γ-II crystallin form hydrogen
bonds with acyl oxygens. In deoxyribonuclease I, all of
the tryptophans, though mostly buried, retain contact
with the solvent at their nitrogen-hydrogen bonds. The
indole nitrogen-hydrogen bond of Tryptophan 21 in the
lipoyl domain of dihydrolipoyllysine-residue acetyltransferase
from B. stearothermophilus, which does not
fully exchange with ²H₂O in the solvent over 3 years, is
well buried in the core of the protein but hydrogen-bonded
to the acyl oxygen of Proline 61.

There are also other anecdotal instances in which
the requirement that donors must be occupied seems to
be expressed. Arginine is one of the best examples. When
it is partially buried, all of the five hydrogen-bond donors
on the guanidinium are provided with acceptors (Figure
6-41). In the binding site on trypsin with which both
arginine and lysine associate normally, there is a constellation
of acceptors that can occupy in turn the five
donors on the former and the three donors on the latter
even though the dispositions of those donors do not
overlap. Consequently, there are empty acceptors in
each complex but never empty donors. When Tyrosine
385 in 4-hydroxybenzoate 3-monooxygenase is mutated
to a phenylalanine, it creates an empty acceptor on
Tyrosine 201 and nothing happens, but when Tyrosine
201 is mutated to phenylalanine, it creates an empty
donor on Tyrosine 385 and a molecule of water is found
in the crystallographic molecular model sitting where the
hydroxyl of Tyrosine 201 used to be and occupying that
donor. When an unoccupied hydrogen-bond donor in
the complex between a peptide and penicillopepsin is
replaced with a methylene group, the inhibitor binds 400
times more tightly. “The pH dependence of chromate
binding and the extremely low affinity of phosphate are
attributable mainly to the lack of hydrogen bond acceptors
in the binding site” of sulfate-binding protein from
Salmonella typhimurium.

The difference in the importance of a donor and 
that of an acceptor affects the magnitude of the free 
energy of formation of a hydrogen bond in a protein, but 
so does its location in the structure. One of the unex- 
pected observations resulting from an examination of 
crystallographic molecular models is the high frequency 
with which hydrogen bonds between donors and accep- 
tors, each from the protein itself, occur on the surface of 
the folded polypeptide. Because of the strong hydration
of ions or the high relative permittivity of liquid
water or both of these factors, ionized hydrogen bonds 
between monovalent anions such as formates or acetates 
and monovalent cations such as alkyl ammoniums, imi- 
dazoliums, or guanidiniums have negligible standard 
free energies of formation in aqueous solution.
Consequently, ionized hydrogen bonds on the surface of 
a molecule of protein should be unstable, but so should 
neutral hydrogen bonds because the competition for the 
donors and acceptors by the water surrounding them 
should prevent them from forming. 

Many of the hydrogen bonds found on the surface
of a crystallographic molecular model are artifacts of the
constraints applied during refinement. If potential energies
that favor rather than disfavor the formation of an ion
pair or a hydrogen bond are incorporated advertently or
inadvertently into the procedure for refinement, ionized
and un-ionized hydrogen bonds will form during the
refinement rather than in the actual protein. Such fantastical
hydrogen bonds on the surface of a crystallographic
molecular model tend to appear and disappear
as further refinement is performed and as the data set is
extended to narrower Bragg spacing. For example, a set
of 12 hydrogen bonds on the surface of myoglobin
between pairs of amino acid side chains in which both of
the partners have been conserved by natural selection
throughout all myoglobin sequences had been identified
in a refined molecular model of the protein. When the
Bragg spacing of the data set was decreased and the
refinement significantly improved, seven of these
hydrogen bonds, four of which had been between oppositely
charged side chains, were no longer present in the
crystallographic molecular model.* All assignments of
hydrogen bonds between two amino acid side chains on
the surface of a protein should be regarded with skepticism
unless properly calculated omit maps clearly indicate
their existence.†

Nevertheless, hydrogen bonds, both ionized and 
un-ionized, probably do exist on the surface of a protein. 
When they do, there are probably particular reasons for 
their existence. Steric effects of neighboring amino acids 
and the backbone of the polypeptide can bring a donor 
and acceptor together in an orientation such that 
entropy of approximation sufficient to overcome solva- 
tion of ions and competition by water is realized. It is also 
possible that these hydrogen bonds are simply the 
random result of the participation of all of the donors 
and acceptors on the surface of the molecule of protein 
in the hydrogen-bonded network of the water surround- 
ing it (Figure 6-38). In this case, these hydrogen bonds 
would be only the fortuitous outcome of the fact that the 
positions of these donors and acceptors in the larger lat- 
tice happen to be adjacent to each other. This hydrogen- 
bonded network of waters and donors and acceptors 
from the protein itself should be a rather fluid structure. 
The crystallographic molecular model represents only 
the structure of lowest energy in a constantly fluctuating 
environment. One observation, however, suggesting that 
some of these hydrogen bonds on the surface of the crys- 
tallographic molecular model are real is that they have 
negative standard free energies of formation. 

* C. Chothia and A.M. Lesk, personal communication.

† This is yet another instance in which omit maps must be used to
position atoms correctly and eliminate the artifacts inherent in the
constraints applied during refinement by simulated annealing.


The standard free energies of formation for hydrogen
bonds seen in the crystallographic molecular models
of proteins have been estimated by site-directed mutation.
It is not possible to make an accurate estimate of the
standard free energy of formation of such a hydrogen
bond by mutating only one member of the pair. A single
mutation will always have steric, hydrophobic, and electrostatic
effects associated with it that are unrelated to the
loss of the hydrogen bond itself but that cannot be separated
from the change in standard free energy for only the
loss of the hydrogen bond. To correct for these effects, a
double-mutant cycle is performed. A double-mutant
cycle is a set of three site-directed mutations: the single
mutation of the donor in the hydrogen bond, the single
mutation of the acceptor in the hydrogen bond, and the
double mutation of both. The four standard free energies
of folding of these three mutants and the unmutated wild-type
protein are then measured, and they are used to
create a linkage relationship:


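\[
\begin{array}{ccc}
\mathrm{DA} & \xrightarrow{\;\Delta G^{\circ}_{\mathrm{DA}}\;} & \mathrm{MA} \\
\Big\downarrow {\scriptstyle \Delta G^{\circ}_{\mathrm{AD}}} & & \Big\downarrow {\scriptstyle \Delta G^{\circ}_{\mathrm{AM}}} \\
\mathrm{DM} & \xrightarrow{\;\Delta G^{\circ}_{\mathrm{DM}}\;} & \mathrm{MM}
\end{array}
\tag{6-7}
\]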


where DA is the wild-type protein, MA is the single
mutant of the donor, DM is the single mutant of the
acceptor, and MM is the double mutant. The change in
standard free energy ΔG°_DA is the change in free energy of
folding when the donor is mutated in the presence of the
acceptor; ΔG°_DM, when the donor is mutated in the
absence of the acceptor; ΔG°_AD, when the acceptor is
mutated in the presence of the donor; and ΔG°_AM, when
the acceptor is mutated in the absence of the donor. By
definition 


\[
\Delta G^{\circ}_{\mathrm{DA}} + \Delta G^{\circ}_{\mathrm{AM}} = \Delta G^{\circ}_{\mathrm{AD}} + \Delta G^{\circ}_{\mathrm{DM}} \tag{6-8}
\]


Each of these changes in standard free energy is the dif- 
ference* in the standard free energies of folding between 
the two proteins connected by the respective arrows 
defining the mutation. A positive value for the difference 
states that the mutant version is less stable than the 
unmutated version. Conveniently, although the actual 
standard free energy of folding of a protein cannot be 
estimated accurately, differences in standard free energy 
of folding can.

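The free energy of formation of the hydrogen
bond ΔG°_AH⊙B (Table 6-6) should be


\[
\Delta G^{\circ}_{\mathrm{AH \odot B}} = \Delta G^{\circ}_{\mathrm{DM}} - \Delta G^{\circ}_{\mathrm{DA}} = \Delta G^{\circ}_{\mathrm{AM}} - \Delta G^{\circ}_{\mathrm{AD}} \tag{6-9}
\]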


For example, in the crystallographic molecular model of
ribonuclease from Bacillus amyloliquefaciens, there is an
ionized hydrogen bond between Arginine 110 and
Aspartate 8. The difference in free energy of folding
between the protein with both the arginine and the
aspartate and a mutant in which Arginine 110 is replaced
with alanine, ΔG°_DA, is -3.3 kJ mol⁻¹; that between the
mutant in which Aspartate 8 is replaced with alanine and
the double mutant, ΔG°_DM, is -4.2 kJ mol⁻¹; that between
the protein with both the arginine and the aspartate and
the mutant in which Aspartate 8 is replaced with alanine,
ΔG°_AD, is +2.3 kJ mol⁻¹; and that between the mutant in
which Arginine 110 is replaced with alanine and the
double mutant, ΔG°_AM, is +1.4 kJ mol⁻¹. Therefore, the
free energy of formation of the hydrogen bond is
-0.9 kJ mol⁻¹. The single mutations of each member of
this particular pair would have led to contradictory conclusions
concerning the strength of the hydrogen bond:
on the one hand that it was endergonic and on the other
that it was exergonic.


* As is usually the case in physical chemistry, the difference is the
standard free energy of folding for the product of the mutation
minus that for the unmutated protein, the reactant.


Table 6-6: Standard Free Energy of Formation of
Hydrogen Bonds in Proteins Estimated from Double-Mutant Cyclesᵃ

donor/acceptor                       locationᵇ    ΔG°_AH⊙Bᶜ (kJ mol⁻¹)
lysinium/glutamate                   surface       -2.3
argininium/aspartate                 surface        0.9
argininium/aspartate                 surface       -2.0
aspartate/argininium/aspartateᵈ      surface       -3.3
amino terminus/glutamate             surface       -6.3
lysinium/glutamate                   surface        0.0
serine/aspartate                     buried        -5.7
lysine/threonine                     buried        -6.3ᵉ
arginine/glutamate                   buried        -7.1ᵉ
arginine/aspartate                   buried        -27ᵉ,ᶠ
histidine/aspartate                  buried        -20ᵉ

ᵃ Where available, values were for measurements derived from the standard free
energies of folding or standard free energies of association at an ionic strength
close to that encountered in cytoplasm (0.15-0.2 M). ᵇ Based on a crystallographic
molecular model of the protein. ᶜ Standard free energy of formation calculated
from double-mutant cycles by Equation 6-9. ᵈ Triple-mutant cycle; standard free
energy is for the mean of the two hydrogen bonds. ᵉ Estimated from free energies
of association rather than free energies of folding. ᶠ Mean of two hydrogen bonds.
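
The arithmetic of the double-mutant cycle for Arginine 110 and Aspartate 8 can be sketched in a few lines. The example below takes the free energy of formation of the hydrogen bond as ΔG°_DM - ΔG°_DA, which around a closed cycle must equal ΔG°_AM - ΔG°_AD, and uses the four values quoted in the text. The function name `double_mutant_cycle` and the closure tolerance of 0.05 kJ mol⁻¹ are assumptions made for illustration.

```python
def double_mutant_cycle(dg_da, dg_dm, dg_ad, dg_am, tol=0.05):
    """Estimate the free energy of formation of a hydrogen bond (kJ mol^-1).

    dg_da: donor mutated in the presence of the acceptor (DA -> MA)
    dg_dm: donor mutated in the absence of the acceptor  (DM -> MM)
    dg_ad: acceptor mutated in the presence of the donor (DA -> DM)
    dg_am: acceptor mutated in the absence of the donor  (MA -> MM)
    tol:   hypothetical tolerance for closure of the cycle
    """
    via_donor = dg_dm - dg_da
    via_acceptor = dg_am - dg_ad
    # Both routes around a thermodynamic cycle must give the same answer.
    if abs(via_donor - via_acceptor) > tol:
        raise ValueError("the four values do not form a closed cycle")
    return (via_donor + via_acceptor) / 2.0


# Values quoted in the text for Arginine 110 and Aspartate 8.
dg_hb = double_mutant_cycle(dg_da=-3.3, dg_dm=-4.2, dg_ad=2.3, dg_am=1.4)
print(f"free energy of formation = {dg_hb:.1f} kJ/mol")
```

Either single mutation taken alone would have suggested a quite different value (-3.3 or +2.3 kJ mol⁻¹ with the opposite sign convention applied), which is why the check for closure and the use of all four values matter.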

As expected, the hydrogen bonds between donors 
and acceptors on the surface of a protein have marginal 
stability (Table 6-6) and are in the range observed for 
similar hydrogen bonds on an isolated α helix (Table
5-7). The standard free energies of formation for buried 
hydrogen bonds, however, are significantly more favor- 
able, again as expected. Hydrogen bonds found in buried 
locations of a molecule of protein are removed from 
competition with water, found in a region of lower rela- 
tive permittivity, and fixed more rigidly in their orienta- 
tions than hydrogen bonds on the surface. All three of 
these circumstances should significantly increase their 
stability relative to those on the surface, so it is surprising 
that the differences observed are as small as they are. 

Even the value for the standard free energy of formation
of a particular hydrogen bond in a protein
obtained by a double-mutant cycle is not completely free
of contributions from interactions with neighboring
amino acids. When all of the amino acids in a cluster of
hydrogen bonds surrounding the hydrogen bond
between Arginine 218 (TEM) and Aspartate 49 (BLIP) in
the complex between β-lactamase TEM-1 from E. coli
and its inhibitor protein BLIP were mutated to alanine,
the standard free energy of formation of that hydrogen
bond increased from -9 to +1 kJ mol⁻¹. Similar
increases of 4-6 kJ mol⁻¹ were observed when the amino
acids surrounding three other hydrogen bonds in the
same cluster were mutated to alanine. Consequently, the
standard free energies of formation listed in Table 6-6
may be only lower limits of the value for that hydrogen
bond in the absence of assistance from its surroundings.

Approximation is probably the greatest contribu- 
tor to the stability of a buried hydrogen bond between 
two side chains. Following formation of the secondary 
structure and the alignment of secondary structures by 
packing, the donor and acceptor of a buried hydrogen 
bond should be efficiently aligned and a considerable 
amount of entropy of approximation should have been 
realized, yet the free energies of formation of such buried 
hydrogen bonds are no more favorable than -30 kJ mol⁻¹ in the most
advantageous circumstances (Table 6-6). The wide vari- 
ability in the standard free energies of formation could 
reflect wide differences in the success with which donor 
and acceptor are aligned given all of the steric problems 
of the interior of a protein. 

There are other experimental observations suggesting
that approximation is not so successful as it should
be. In a series of tight complexes (dissociation constants
less than 750 nM) between thermolysin and a set of ligands
that bind to its active site, when the respective
nitrogen-hydrogens of the phosphonamidates in the ligands,
which each form a hydrogen bond with the acyl
oxygen of Alanine 113 in the crystallographic molecular
model of the complex, were replaced with methylenes,
the association constants for the ligands remained the
same. When corrected for the removal of the two
hydrogen-carbon bonds of the methylene from water,
the standard free energy of formation for this buried,
rigidly aligned hydrogen bond is -6 kJ mol⁻¹, well within
the range of those for buried hydrogen bonds in Table
6-6 but not anywhere near the value predicted from the
entropy of approximation that must be realized. Even
more surprising is that when the amido nitrogen-hydrogen
of a hydrogen bond in the middle of an α helix of
T4 lysozyme was replaced with an ester oxygen, the free
energy of folding of the protein increased by 7 kJ mol⁻¹,
but that increase was indistinguishable from the increase
expected for the enthalpy of formation of the hydrogen
bond between the acyl oxygen of the ester, relative to the
acyl oxygen of the unmutated amide, and the
nitrogen-hydrogen with which it forms a hydrogen bond. In
other words, these latter experiments suggest that the
free energy of formation of a hydrogen bond in the
middle of an α helix in a protein is indistinguishable from
0, even though considerable entropy of approximation is
realized. All of these results emphasize that it is difficult
to form a hydrogen bond in an aqueous solution.

Even though there are hydrogen bonds in a mole- 
cule of protein that do have negative standard free ener- 
gies of formation, it is the structure of the protein that 
approximates the donor and the acceptor, causing their 
hydrogen bond to become stable. It is this approxima- 
tion that overcomes, in many cases meagerly, the other- 
wise overwhelming competition of the water for the 
donors and acceptors. The folding of the protein that 
approximates the donor and acceptor in such a hydrogen 
bond is driven entirely by the hydrophobic effect. It is 
only after the hydrophobic effect has collapsed the 
random coil, withdrawn the donors and acceptors from 
water, and excluded water from the interior that the 
hydrogen bonds of the α helices and β structure are able
to form. It is only when the hydrophobic effect, expressed 
as the minimization of the internal volume of the pro- 
tein, has locked the secondary structures into the tertiary 
structure, that donors and acceptors of hydrogen bonds 
between side chains are brought close enough together 
and are constrained sufficiently that they can form oth- 
erwise unfavorable hydrogen bonds. It is only after all of 
this prelude that the observed hydrogen bond has a 
lower standard free energy of formation than the 
hydrated, separated donor and acceptor had in the 
unfolded polypeptide. 

It is the case that such favorable free energy of for- 
mation adds to the stability of the protein, but this is an 
illusory contribution. The amino acid sequence of the 
protein and hence the location and identity of each side 
chain in its structure is the result of evolution by natural 
selection. The hydrogen-bonded pair of side chains that 
currently occupies a particular location in the structure 
could have been chosen because it was the constellation 
of atoms that sterically filled that particular location in 
the structure most effectively relative to all of the other 
possibilities that were tried, not because it is a hydrogen 
bond. It has a favorable free energy of formation because 
the two side chains that were chosen for these other reasons
happened to end up with a donor and an acceptor
adjacent to each other. The hydrogen-bonded pair is not 
necessarily the most energetically favorable pair of side 
chains that could have occupied that position. In fact, 
even though it was not so astute a process as evolution by 
natural selection that determined the choice of the 
replacements, it is sometimes the case that the double 
mutant in a double-mutant cycle or even one of the 
single mutants is as stable as the wild type containing the 
hydrogen bond. 

The relationship between the strength of a hydrogen
bond and the difference in pKₐ between donor and
acceptor (Figure 5-14) has been verified in the context of
a molecule of protein. As the pKₐ of Tyrosine 27 in micrococcal
nuclease from Staphylococcus aureus, which forms
a hydrogen bond with Glutamate 10, was lowered by substituting
various fluorinated tyrosines, the free energy of
folding of the protein, presumably reflecting the
decreases in the free energy of formation of the hydrogen
bond, decreased by 2.0 kJ mol⁻¹ (unit of pKₐ)⁻¹. This
value for the Brønsted coefficient is near that observed
(Equation 5-37) for a hydrogen bond in CCl₄ [1.3 kJ mol⁻¹
(unit of pKₐ)⁻¹]. This increase in strength, as the increase
in the acidity of the phenolic side chain matches its pKₐ
more closely with that of the glutamate, suggests that, as
in other situations, the hydrogen bond between an acid
and its conjugate base should be a strong one.

In crystallographic molecular models there are
examples of hydrogen bonds between an acid and its
conjugate base. In turkey troponin C, Glutamate 57
forms a geometrically ideal hydrogen bond with
Glutamate 88 in which it is unknown on which carboxylate
the proton resides. There is no experimental indication,
however, that such hydrogen bonds are
unusually stable. The histidinium ion in the hydrogen
bond between Histidine 24 and Histidine 119 in sperm
whale myoglobin has a pKₐ of 6.0. If this were a particularly
stable interaction, the pKₐ of the acid dissociation
that eliminates it should have been much higher (Table
2-2). The hydrogen bond* between Lysine 206 and
Lysine 296 of human transferrin, although necessarily
lowering the values of pKₐ for the lysines participating in
it, has been shown to destabilize the protein.

No evidence has been presented that hydrogen
bonds between acids and their conjugate bases in proteins
display properties associated with low-barrier
hydrogen bonds, but hydrogen bonds displaying one
such property, a low fractionation factor (Equation
5-31), have been identified in proteins. The fractionation
factor φ for a proton in a hydrogen bond in a protein is
measured by following the fraction, f_AH⊙B, of the hydrogen
bond of interest that remains undeuterated, AH⊙B, as a
function of the mole fraction x_H₂O of undeuterated water
in a series of mixtures of H₂O and D₂O


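\[
x_{\mathrm{H_2O}} = \frac{[\mathrm{H_2O}]}{[\mathrm{H_2O}] + [\mathrm{D_2O}]} = \frac{[\mathrm{L_2O \odot HOL}]}{[\mathrm{L_2O \odot HOL}] + [\mathrm{L_2O \odot DOL}]} \tag{6-10}
\]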


where L again stands for either H or D. 

A physical property that reports the concentration
of the undeuterated hydrogen bond, such as the
intensity (i_AH⊙B) of the absorption of its proton in a nuclear
magnetic resonance spectrum, is monitored.
Equations 5-31 and 6-10 can be combined to give


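\[
\frac{i_{\mathrm{AH \odot B}}}{i_{0,\mathrm{AH \odot B}}} = \frac{[\mathrm{AH \odot B}]}{[\mathrm{AL \odot B}]} = f_{\mathrm{AH \odot B}} = \frac{x_{\mathrm{H_2O}}}{\phi\,(1 - x_{\mathrm{H_2O}}) + x_{\mathrm{H_2O}}} \tag{6-11}
\]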


where [AL⊙B] is the total concentration of hydrogen
bonds, both deuterated and protonated, and i_0,AH⊙B is the
intensity of the absorption in H₂O. The normalized
intensity of the absorption of the proton in the hydrogen
bond as a function of x_H₂O is fit by nonlinear least squares
to Equation 6-11 to obtain φ.
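
The fitting step can be illustrated with a minimal sketch. It assumes that Equation 6-11 has the form f = x/[φ(1 - x) + x], where f is the normalized intensity and x is the mole fraction of H₂O, and it minimizes the sum of squared residuals over φ by a golden-section search, a simple one-parameter stand-in for a general nonlinear least-squares routine. The data are synthetic, generated from a fractionation factor of 0.8 rather than taken from any experiment, and the function names and search bounds are hypothetical.

```python
def fraction_protonated(x_h2o, phi):
    """Normalized intensity predicted for mole fraction x_h2o of H2O."""
    return x_h2o / (phi * (1.0 - x_h2o) + x_h2o)


def fit_phi(x_values, f_values, lo=0.01, hi=5.0, tol=1e-8):
    """Least-squares estimate of phi by golden-section search on [lo, hi].

    The bounds lo and hi are assumptions chosen to bracket any plausible
    fractionation factor.
    """
    def sse(phi):
        return sum((fraction_protonated(x, phi) - f) ** 2
                   for x, f in zip(x_values, f_values))

    ratio = (5 ** 0.5 - 1) / 2  # inverse of the golden ratio
    a, b = lo, hi
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if sse(c) < sse(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return (a + b) / 2.0


# Synthetic, noise-free data generated with phi = 0.8 (not experimental).
x_obs = [0.1, 0.3, 0.5, 0.7, 0.9]
f_obs = [fraction_protonated(x, 0.8) for x in x_obs]
phi_fit = fit_phi(x_obs, f_obs)
print(f"fitted fractionation factor = {phi_fit:.3f}")
```

In this form, a position that fails to equilibrate keeps a normalized intensity near 1 at every mole fraction, and the fit then returns a small value of φ, which is the artifact described in the text.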

In this way, the fractionation factors for the protons
within the hydrogen bonds of the secondary structure of
a protein can be measured. There are results suggesting
that a significant portion of these protons have fractionation
factors less than 1. For example, 13 of the 87 amino
acids in the phosphocarrier protein HPr from Bacillus
subtilis and 36 of the 231 amino acids in micrococcal
nuclease from S. aureus have been reported to have
amido protons with fractionation factors less than 0.80,
and six of the 76 amino acids in ubiquitin have fractionation
factors less than 0.90. There is some uncertainty
in these measurements because it is quite difficult to
equilibrate all of the protons in the hydrogen bonds of
the secondary structure of a protein with deuterons in
the solution, and an unequilibrated position would
appear artifactually to have a low fractionation factor
(Equation 5-31). In more recent studies of the fractionation
factors of protons in streptococcal protein G and
the SH3 domain of proto-oncogene protein-tyrosine
kinase from Gallus gallus, none of the protons in the
nitrogen-hydrogens of the backbone had fractionation
factors less than 0.9. Nevertheless, it is thought to be the
case that some of the protons in the hydrogen bonds of
the secondary structure of many proteins have abnormally
low fractionation factors.

There also seems to be a correlation between the
fractionation factor of a proton and the length of the
hydrogen bond that it occupies in a crystallographic
molecular model of a protein. In crystallographic
molecular models built from data sets to Bragg spacing of
less than 0.1 nm, the maps of electron density are accurate
enough that the bond lengths of the hydrogen bonds
are of sufficient reliability to identify those that are
abnormally short, and there are usually a few
abnormally short hydrogen bonds (0.26-0.28 nm) among
those between amido nitrogen-hydrogens and acyl oxygens
of the backbone. It is thought that such shortened
hydrogen bonds are the ones that display low fractionation
factors and therefore are low-barrier hydrogen
bonds.

It is not possible, however, for these short low-barrier
hydrogen bonds to be strong hydrogen bonds
because the difference in pKₐ between the nitrogen-hydrogen
(pKₐ = 16) and the oxygen (pKₐ = -0.5) is so large
and any decrease in polarity would only widen the dif- 
ference. Whenever a polymer as long and heterogeneous 
as a molecule of protein is folded into a unique confor- 
mation, it is hard to believe, in spite of evolution by nat- 
ural selection, that all of the steric problems can be 
solved. There must be some places in the structure that 
are tight fits. When such a tight fit occurs at a hydrogen
bond between an amido nitrogen-hydrogen and acyl
oxygen of the backbone, the hydrogen bond shortens to 
relieve the strain, much as the hydrogen bond in hydro- 
gen maleate monoanion shortens in response to the 
steric compression. This shortened hydrogen bond must 
be weaker than the unshortened bond because there is 
repulsion energy in the compressed case that would be 
relieved on relaxation to the normal distance. This 
shorter but weaker hydrogen bond has a smaller frac- 
tionation factor because this property is determined only 
by the degree of overlap of the wells of potential energy 
confining the proton on the donor and the acceptor. It is 
a low-barrier hydrogen bond not because the strength of 
the bond has brought donor and acceptor together but 
because the contraction of the distance is imposed by the 
rest of the framework. It has also been concluded from 
studies of complexes between proteins and small ligands 
that there is no correlation between the length of a 
hydrogen bond and its strength.


Suggested Reading 


Horovitz, A., Serrano, L., Avron, B., Bycroft, M., & Fersht, A. (1990) 
Strength and co-operativity of contributions of surface salt 
bridges to protein stability, J. Mol. Biol. 216, 1031-1044. 


Problem 6-9: Aspartate 12 and Arginine 16 are located in
an α helix on the surface of a mutant of the ribonuclease
from B. amyloliquefaciens. The arginine and the aspartate
do not form an ionized hydrogen bond in the crystallographic
molecular model of the enzyme from the
closely related species, Bacillus intermedius, even
though they are close enough to each other to do so.
Three mutants were produced in the ribonuclease from
B. amyloliquefaciens: Arginine 16 → threonine, Aspartate
12 → alanine, and the corresponding double mutant. The
differences in standard free energies of folding for the
three mutants and the original protein were as follows:


            difference in standard free energy of folding (kJ mol⁻¹)

            ionic strength = 0.1 M      ionic strength = 0.55 M

ΔG°_RD      2.1                         1.8
ΔG°_RA      1.7                         2.0
ΔG°_DR      1.8                         1.0
ΔG°_DT      1.4                         1.2


where ΔG°_RD is for mutation of Arginine 16 in the presence
of aspartate at position 12; ΔG°_RA, for mutation of
Arginine 16 in the presence of alanine; ΔG°_DR, for mutation
of Aspartate 12 in the presence of arginine at position
16; and ΔG°_DT, for mutation of Aspartate 12 in the
presence of threonine.


(A) Estimate the interaction between Arginine 16 and 
Aspartate 12 at the two ionic strengths. 


(B) The uncertainty in your calculated values was
estimated by the authors to be ±0.2 kJ mol⁻¹. Is the
electrostatic interaction significantly different 
from zero at physiological ionic strength? Is this 
surprising? 


(C) What conclusion would you have reached had 
only Arginine 16 been mutated? 


Problem 6-10: 


(A) Write out the amino acid sequence of the protein 
in the drawing below of a crystallographic molec- 
ular model. This drawing was produced with 
MolScript.


(B) List the pairs of cysteines that participate in the 
cystines. 


(C) What structural feature of a cystine is illustrated 
by the model? 


(D) Identify the participants in a small hydrophobic 
cluster. 


(E) List as a pair the donor and the acceptor of each of 
the hydrogen bonds in the model by the letter and 
number of its respective amino acid and by the 
respective designation defined in Figure 4-14 for 
the atom participating in the hydrogen bond. 


(F) Which hydrogen bonds are probably artifacts of 
the procedure used to refine the molecular model? 


Problem 6-11: The two stereo drawings to the right 
represent portions of the crystallographic molecular 
models of two proteins. These drawings were produced
with MolScript.

Each of the hydrogen bonds in each stereo drawing 
is numbered in the figure. Draw the chemical structure of 
each hydrogen bond in the two stereo drawings. Number 
each of your drawings with the number for the hydrogen 
bond assigned in the figure. There are two hydrogen 
bonds numbered 5 for the same acceptor. Draw all of the 
lone pairs and all of the hydrogens on each functional 
group providing the donor and on each functional group 
providing the acceptor. For example, the correct chemi- 
cal structure of hydrogen bond number 1 is 


[chemical structure drawing not reproduced]


Association of Proteins with Nucleic Acid


It is during the association of proteins with nucleic acids 
that the importance of packing, the existence of fixed 
positions for molecules of water, the irrelevance of direct 
compensation of charge, and the stereochemical role of 
hydrogen bonding are all manifest. A double helix of DNA 
(Figure 3-9)* presents to the protein designed to associ- 
ate with it a regular structure that can be recognized by 
its peculiar shape, its pattern of hydration, its high den- 
sity of negative charge, and its array of donors and accep- 
tors of hydrogen bonds. 

Along each of the two phosphodiester backbones, 
there are regularly spaced pairs of phosphoryl oxygens on 
each phosphorus in each phosphodiester linkage. Within 
each pair, the two oxygens share between themselves the 
single negative elementary charge of the phosphoryl group 
and are directed outward at the sp³ angle of 109.5°, and
each oxygen has the equivalent of two acceptors for hydro- 
gen bonds. 

The pairs of bases contain their own internal
hydrogen bonds, two for the pair between adenine and
thymine and three for the pair between guanine and
cytosine, and these are located in the core of the
structure. Although there are a small number of proteins
that can induce a single base to swing out of the stack,
thus exposing both itself and its interior donors and
acceptors, this conformational change is a difficult one.
Most proteins bind to DNA in its normal conformation
and never see these hydrogen bonds in the core. The
pairs of bases are stacked one upon the other, but they
do not overlap entirely. Consequently, they form a helical
staircase, but one with narrow treads.

* You should trace the two polynucleotide backbones
(O-C-furanose-O-P-O-C-furanose-O-P-...) through the double helix.

There are two helical grooves, the major groove 
and the minor groove (facing the viewer in the upper half 
and lower half, respectively, of Figure 3-9), the former
wider than the latter. It is in these grooves that the 
narrow treads of the stairs are found. Each pair of bases 
projects a characteristic pattern of donors and acceptors 
into each groove. The pair of adenine and thymine proj- 
ects a methyl group, an acceptor, a donor, and an accep- 
tor into the major groove and two acceptors into the 
minor groove; and the pair of guanine and cytosine proj- 
ects an acceptor, an acceptor, and a donor into the major 
groove and an acceptor, a donor, and an acceptor into 
the minor groove. The order and orientation of these
donors and acceptors differ between a guanine-cytosine
pair (6-13) and a cytosine-guanine pair (6-14) and
between an adenine-thymine pair (6-15) and a
thymine-adenine pair (6-16):

[Structures 6-13, 6-14, 6-15, and 6-16: chemical drawings of the guanine-cytosine, cytosine-guanine, adenine-thymine, and thymine-adenine pairs, not reproduced.]

so that a sequence containing a G can be distinguished
from one containing a C; and a sequence containing an
A, from one containing a T.
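
The patterns just listed can be written down as short strings and compared. The sketch below is only an illustration: the one-letter code (A for acceptor, D for donor, M for methyl group) is a hypothetical shorthand, and it assumes that exchanging the two bases of a pair simply reverses the order in which the groups are met across the groove. On this coarse code the major groove tells the two orientations of each pair apart, while the minor-groove strings read the same in both directions; that is a property of the simplified code and says nothing about the actual positions of the atoms.

```python
# Donor/acceptor patterns projected into each groove, transcribed from the
# text. Hypothetical shorthand: A = acceptor, D = donor, M = methyl group.
MAJOR = {"A-T": "MADA", "G-C": "AAD"}
MINOR = {"A-T": "AA", "G-C": "ADA"}


def flipped(pair: str) -> str:
    """Name of the same pair with its two bases exchanged (G-C -> C-G)."""
    left, right = pair.split("-")
    return f"{right}-{left}"


def with_flipped(patterns: dict) -> dict:
    """Add the exchanged pairs, assuming that exchanging the bases simply
    reverses the order in which the groups are met across the groove."""
    full = dict(patterns)
    for pair, pattern in patterns.items():
        full[flipped(pair)] = pattern[::-1]
    return full


def distinguishable(patterns: dict, a: str, b: str) -> bool:
    """True if the two pairs present different strings in this groove."""
    return patterns[a] != patterns[b]


major = with_flipped(MAJOR)
minor = with_flipped(MINOR)

for a, b in [("G-C", "C-G"), ("A-T", "T-A"), ("A-T", "G-C")]:
    print(a, "vs", b,
          "| major groove:", distinguishable(major, a, b),
          "| minor groove:", distinguishable(minor, a, b))
```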

All of these donors and acceptors of hydrogen 
bonds provide fixed positions for occupation by mole- 
cules of water, but not all of them are firmly occupied. 
Each segment of DNA has its own characteristic pattern 
of fixed positions for molecules of water (open circles in 
Figure 3-9), and it has been observed that when a mole- 
cule of protein binds to DNA, some of these positions 
remain occupied within the complex. Consequently,
the donors and acceptors on these molecules of water 
that are incorporated into the complex, as well as those 
provided by the bases in the major groove and the minor 
groove and those along the backbone, all provide keys for
the recognition of the double-helical DNA by the protein. 

There are two levels of recognition on which pro- 
teins operate in binding to DNA. Certain proteins are 
required by their function to recognize any segment of 
double-helical DNA regardless of its sequence. Examples 
of such proteins are histones that form chromatin from 
DNA, the RecA protein that catalyzes recombination, 
helicase and DNA-directed DNA polymerase that are 
components of the system replicating DNA, and DNA 
topoisomerase that passes one segment of DNA through 
another. These proteins recognize only the overall shape 
of a molecule of DNA and the acceptors along its phos- 
phodiester backbone. Other proteins are required by 
their function to recognize specific sequences of double- 
stranded DNA and bind tightly to them. Examples of 
such proteins are repressors that shut off certain genes, 
transcription factors that initiate transcription at certain 
genes, and activators that increase the rates of transcrip- 
tion of certain genes. Many of these latter proteins are 
able to bind to any segment of a double helix of DNA and 
then run along the double helix until they reach their tar- 
gets, and proteins of this type must perform both levels 
of recognition. Such proteins demonstrate that the abil- 
ity to recognize specific sequences is a special case of the 
ability to recognize DNA in general. 

One property of the proteins that recognize DNA is 
that their composition is biased against negatively 
charged amino acids and in favor of positively charged 
amino acids. On the open surfaces of these proteins
that do not participate in the complexes with DNA, the 
density of glutamates and aspartates is the same as that 
on the open surface of any other protein, but in the inter- 
faces between these proteins and DNA, the density of 
glutamates and aspartates is about 40% of that found in 
the interfaces between two molecules of protein. On the 
open surfaces, the density of arginine and lysine in these 
proteins is 30% greater than that on the open surfaces of 
other proteins, but at the interfaces between these pro- 
teins and DNA, the density of lysines and arginines is 2.5 
times that in the interfaces between two molecules of 
protein. These equivalent biases against electrostatic 
repulsion and in favor of electrostatic attraction are rea- 
sonable responses to the high density of negative charge 
on the DNA to which these proteins must bind. 
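
The sense in which the two interfacial biases are "equivalent" can be checked with a line of arithmetic: a density that is 40% of the reference is a depletion of 1/0.40, the same factor as the enrichment quoted for the basic side chains. The short sketch below only restates the two figures given in the text; the function name is a hypothetical helper.

```python
# Relative densities quoted in the text (interface with DNA compared with
# an interface between two molecules of protein).
acidic_ratio = 0.40  # glutamates and aspartates: about 40%
basic_ratio = 2.5    # lysines and arginines: 2.5 times


def fold_change(ratio: float) -> float:
    """Express a ratio as a fold change of at least 1, whatever its direction."""
    return ratio if ratio >= 1.0 else 1.0 / ratio


print("fold depletion of acidic side chains:", fold_change(acidic_ratio))
print("fold enrichment of basic side chains:", fold_change(basic_ratio))
```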

These overall biases, however, are distributed evenly
over the interface between protein and DNA and not
focused only on the phosphodiesters within it. Although
lysines and arginines do provide donors to the acceptors
on the phosphodiester backbone, they do so no more
frequently than other amino acids. For example, in the
crystallographic molecular model of the complex of DNA
with rat DNA polymerase β, only one of the donors to the
phosphodiesters is the guanidinium of an arginine; the
other nine are nitrogen-hydrogens from the polypeptide
backbone and hydroxyls of a tyrosine and a threonine.
In the crystallographic molecular model of the complex
between DNA and the regulatory protein Cro from
bacteriophage 434, many of the donors to the phosphoryl
oxygens are amido nitrogen-hydrogens from the polypeptide
backbone, while in that between DNA and the regulatory
protein Cro from bacteriophage λ, a tyrosine, a threonine,
an asparagine, a glutamine, and two amido
nitrogen-hydrogens from the backbone provide donors to
phosphoryl oxygens (Figure 6-45). In the complex
between topoisomerase I and DNA, hydrogen-bond
donors to the phosphoryl oxygens are provided by an
asparagine and a histidine, among others. And in three
successive phosphodiesters in the complex between
deoxyribonuclease I and DNA, an arginine, two histidines,
an aspartic acid, an asparagine, a tyrosine, and a threonine
provide the donors.

Donors to phosphoryl oxygens are also provided by
molecules of water that are incorporated into the
complex and form bridges to donors and acceptors on the
protein. In the complex between the repressor from
bacteriophage λ and DNA, five amido nitrogen-hydrogens,
two lysines, two tyrosines, five asparagines, two
glutamines, and 11 waters together provide all of the
donors for the 10 phosphodiesters in contact with the
protein.

Some proteins that recognize only DNA and not 
specific sequences within it use the regularity of its 
double-helical structure as a key. For example, each of 
the octameric complexes of histones around which the 
DNA winds in chromatin has a surface with a repeating 
pattern that matches the helical repeat of DNA.

Proteins that are required to associate with specific
sequences in DNA, rather than simply DNA in general,
recognize the patterns of donors and acceptors for
hydrogen bonds in its grooves in addition to providing
donors and acceptors of hydrogen bonds for the
phosphodiester backbone and its associated water. The major
groove in the DNA is the main key used by a protein in
recognizing a specific sequence of base pairs. It is in the
major groove that the patterns of donors and acceptors
projected outward by the pairs of bases (6-13, 6-14,
6-15, and 6-16) are the most legible. The protein usually
inserts one or two of its segments of polypeptide into the
major groove. Often the segment is an α helix. For example,
the c-Jun subunit of transcription factor AP-1 (Figure
6-46) and the ETS-domain protein Elk-1 insert
α helices into the major groove. Such an α helix can
participate in hydrogen bonds with donors and acceptors
from as many as five or six base pairs in the major
groove as well as donors and acceptors on riboses and
phosphoryl oxygens from additional base pairs. In the
case of the c-Jun subunit of transcription factor AP-1, the
α helix inserted into the major groove is the splayed end
of one strand of an α-helical coiled coil (Figure 6-29). In
each of these complexes the α helix runs along the groove.

Other proteins, such as the met repressor from
E. coli, the replication terminator of E. coli, and the
arc repressor from E. coli, insert two strands of β structure
running parallel to the major groove. The α helix or
strands of β structure inserted into the major groove
provide donors and acceptors to the acceptors and donors
projected into the groove by the pairs of bases.

The hydrogen bonds formed during the reading of
these patterns (Figure 6-47) are as varied as one might
expect. A common pair is the double hydrogen bond
between a guanine and an arginine (Figures 6-46 and
6-47), either at the two η nitrogens of the arginine
(Figure 6-46)

[Structure 6-17: chemical drawing not reproduced.]

or at one of its η nitrogens and its ε nitrogen (Figure
6-47). Arginine also can span two bases, offering a
donor to an acceptor on each. Lysine, with its three
donors, is often found occupying a single acceptor on a
base or spanning two bases. In fact, it is in providing
donors to acceptors for the neutral bases, rather than for
the negatively charged phosphoryl oxygens, that arginine
and lysine seem to be most frequently employed.
Another common hydrogen bond is that between a
glutamine or an asparagine and adenine (Figure 6-45)

[Structure 6-18: chemical drawing not reproduced.]

but often glutamine or asparagine (Figure 6-46)
or even aspartate (Figure 6-47) bridges a donor on one
base and an acceptor on its neighbor. Other donors and
acceptors are commonly provided by histidine, serine,
and threonine. Aspartate can provide an acceptor;
tyrosine, a donor and an acceptor; tryptophan, a donor;
and cysteine, a donor and an acceptor. The
nitrogen-hydrogen of a cytosine located in the major groove
(6-14) makes a hydrogen bond with the π system of a
tryptophan in the complex between transcription factor Rob
and its cognate DNA.

Many of the amino acids providing donors and 
acceptors to the bases in the major groove are them- 
selves pinned by hydrogen bonds at one or more of their 
other donors and acceptors to other amino acids in the 
protein (Figure 6-47), and some of these other donors 
and acceptors can make hydrogen bonds to phosphoryl 
oxygens from the backbone of the nucleic acid to buttress
the hydrogen bonds in the center of the groove.


Although there are a few examples in which almost
every donor and acceptor in the major groove forms a
direct hydrogen bond to an acceptor or donor on the
protein (Figure 6-47), usually a significant fraction of the
donors and acceptors from the amino acids on the
protein form hydrogen bonds with waters that were present
at fixed positions in the major groove before the protein
bound to it and were subsequently incorporated into the
complex. These waters then bridge donors and acceptors
on the protein and donors and acceptors on the DNA.
They are as much a key for recognition of the DNA by the
protein as the bases themselves. One dramatic indication
of this fact is that in crystallographic molecular models of
closely related but distinct complexes between related
molecules of DNA and related molecules of protein, many
of the waters occupy the same locations.

Figure 6-46: α Helix of a protein running along the
major groove of a segment of double-helical DNA
and reading the donors and acceptors of its bases.
The protein in the crystals used to gather the data
set for the crystallographic molecular model (Bragg
spacing = 0.3 nm) was only the bZIP region
(Glutamate 256 to Methionine 313) of the human
transcription factor AP-1 expressed in E. coli, and
the portion displayed in the figure is only the α helix
from Arginine 259 to Lysine 279 that fills the major
groove of the DNA. Only the portion of the DNA
containing the sequence d(AGTCATA) and its
complement is displayed. Note the hydrogen bonds
between Lysine 268 and Arginines 259 and 263 and
phosphoryl oxygens. Four base pairs are recognized
as well as phosphodiesters on one base pair above
and two base pairs below those four. This drawing
was produced with MolScript.

Waters bridging donors and acceptors on the DNA
and donors and acceptors on the protein are common
features of the extensive network of hydrogen bonds found
at the interface between protein and DNA in the major
groove (Figure 6-48). On average, within a
crystallographic molecular model of a complex between a protein
and a molecule of double-helical DNA there are 10
molecules of water that form hydrogen bonds with “both the
protein and the DNA simultaneously and thus mediate
recognition directly.” On average, six of these 10 sit between two
acceptors, none between two donors, and four between
an acceptor and a donor too distant to form a direct
hydrogen bond. These 10 include molecules of water between
donors and acceptors on the protein and acceptors on
phosphoryl oxygens as well as donors and acceptors on
the bases themselves. Many of the waters in such
complexes reside between two functional groups with the
same charge number and serve to screen the electrostatic
repulsion (see, for example, Aspartate 193 in Figure 6-47).

In Figure 6-47, every donor and acceptor directed 
into the major groove by the three base pairs is occupied 
by an acceptor or donor from the protein with the excep- 
tion of one, which is occupied by a molecule of water. In 
Figure 6-48, only five of the donors and acceptors 
directed into the major groove by the three base pairs are 
occupied, all five by molecules of water bridging DNA 
and protein. These are the two extremes of a continu- 
ously occupied spectrum of hydrogen bonding in the 
major groove. 

The methyl groups of the thymines also project 
into the major groove (Figure 3-9) and are used as keys 
for recognition. In the crystallographic molecular model 
of a complex between DNA and a protein that recognizes 
a particular sequence, the methyl groups on thymine are 
provided hydrophobic contacts by the protein. For 
example, the propyl group of a valine, the phenyl
group of a phenylalanine, the butyl group of an
isoleucine, the methylenes of an arginine, the
methyl group of an alanine, or the methyl group of an
alanine and the methylene of a serine (Figure 6-46) can 
cradle the methyl group of a thymine in the major 
groove. 

Although it is narrower and more difficult to enter 
(Figure 3-9), the minor groove is also exploited by pro- 
teins recognizing specific sequences in the DNA. Usually, 
because it is so narrow, only a single loop of polypeptide 
is inserted into it, and a protein that inserts a segment
of its polypeptide into the minor groove will also
insert a sizeable segment into the adjacent major
groove. The less formal arrangement in the minor
groove permits even the amido nitrogen-hydrogens of
the backbone to occupy acceptors projecting from
bases. Otherwise, the donors and acceptors from the
protein that occupy acceptors and donors in the minor
groove are the same as those in the major groove.
Lysines and particularly arginines are
common because their side chains are long, thin, and
flexible, but even a short negatively charged aspartate can
provide acceptors for donors in the minor groove.

It was originally believed, before crystallographic
molecular models of these complexes became available,
that the problem of recognizing a specific sequence
would simply require reading enough of the pattern of
donors and acceptors in the major groove and minor
groove and methyl groups from thymine in the major
groove to make an unequivocal identification of the
sequence. Although there are a few instances in which
side chains on the protein are able to form hydrogen
bonds to every donor and acceptor in the major groove
(Figure 6-47), and usually many of these features are
recognized by the protein either directly or through
intervening molecules of water, it is often the case that
fewer are recognized than would be necessary to make a
positive identification. Consequently, other strategies
must be used to make a positive identification.

The most obvious of these is the use of packing to
recognize shape, just as in the center of a molecule of
protein the dense packing of the side chains of the amino
acids is used to position the secondary structures. For the
crystallographic molecular models of complexes
between a protein and its complementary DNA,
calculations of atomic volumes “performed in the presence and
absence of water molecules, showed that protein atoms
buried at the interface with DNA are on average as closely
packed as in the protein interior. Water molecules
contribute to the close packing, thereby mediating shape
complementarity.” This close packing means that the
shape of the surface of the protein fits tightly into the
shape of the surface of the DNA and its water,
particularly in the major groove. As the shape of the surface
of the DNA and water in the major groove represents its
sequence, it is the shape of the surface of the protein as
much as anything else that reads the sequence of the
DNA.

Much of this complementarity in shapes arises from the
networks of hydrogen bonds (Figure 6-47), but the
hydrophobic hydrogen-carbon bonds of the protein also
contribute to its complementary shape. In fact, there are
hydrophobic side chains such as phenylalanines,
leucines, and valines that are found in the interface
between the protein and the major groove, the functions
of which are not just to cradle the methyl groups of the
thymines but to form a mold for the DNA. When one
such side chain, Leucine 22 in the interface between
DNA and transcription factor AREA, was mutated to a
valine, the specificity of the transcription factor changed
dramatically as it recognized a different set of sequences
in the DNA.

The structure of the protein looking for a particular
sequence of DNA also recognizes variation in the overall
shape of a segment of DNA produced by the particular
sequence of bases it contains. For example,
certain sequences of bases cause the minor groove to
become narrower, and this feature of the DNA is
recognized by the protein, often by the insertion of an arginine
into the minor groove to gauge its width.
Overwinding or underwinding of the DNA by particular
sequences is recognized, as well as intrinsic curvature
of the double helix. This strategy of recognizing
preexisting, sequence-dependent peculiarities in the
structure of the DNA, however, is difficult to separate
experimentally from a strategy of recognizing
sequence-dependent differences in the resistance of the DNA to
distortion by the protein because only rarely is the
preexisting structure of the segment of DNA found in the
crystallographic molecular model of the complex known.

When proteins bind specifically to a segment of
DNA with a particular sequence, they often distort its
structure. One of the most obvious examples of this fact
is the complex between the purine repressor from E. coli
and its cognate double-helical DNA. This protein
thrusts two of its leucine side chains that are positioned
within the minor groove of the DNA into the space
between the guanine-cytosine pair and the
cytosine-guanine pair at the center of the sequence recognized by
the protein. This wedge cants these two pairs of bases
and creates an abrupt 45° bend in the DNA centered on
this distortion. In the complex between TATA-box-binding
protein and its cognate DNA, two pairs of phenylalanines
from β strands running across the top of the minor
groove insert between two pairs of bases, the
thymine-adenine and the adenine-thymine at positions 1 and 2
and the thymine-adenine and guanine-cytosine at
positions 7 and 8 of the sequence recognized by the protein,
to cause an overall bend in the DNA of 65-80°.

Figure 6-48: Incorporation of molecules of water
into the interface between protein and DNA. The
portion of the interface between protein and
double-helical DNA displayed in the figure is from
the crystallographic molecular model (Bragg
spacing = 0.19 nm) of the complex between the
trp repressor from E. coli and a segment of
double-stranded DNA with the self-complementary
sequence d(TGTACTAGTTAACTAGTAC). The portion
displayed contains the second d(AGT) and its
complement and Lysine 71 to Threonine 82 from
the trp repressor. The unattached oxygens (open
circles) are locations for molecules of water within
the major groove of the DNA that have been
incorporated into the complex. Only hydrogen bonds
between the bases and hydrogen bonds to the
molecules of water are drawn. This drawing was
produced with MolScript.

Many proteins bend the DNA when they bind to it.
Sometimes they form abrupt kinks (Figure 6-49), but
often the bend induced has a gradual curvature following
the curvature of the surface of the globular protein.
In the complex with the repressor from bacteriophage
434, the DNA is bent in an irregular arc with a
radius of curvature of 6.5 nm closely cleaving to the
surface of the globular protein (Figure 6-50). In human
DNA-(apurinic or apyrimidinic site) lyase, a rigid,
cationic, preformed surface acts as a template upon
which the DNA that the protein recognizes is bent.
Sometimes two successive segments of DNA that contain
sequences recognized by two different proteins, which in
turn form a complex with each other, bend smoothly
around that pair of proteins. The ultimate extrapolation
of such complexes that smoothly bend DNA is that
found in chromatin between the segment of DNA 150
base pairs in length and the complex of eight histones
around which the DNA wraps in a smooth superhelix
with a radius of curvature of 4.3 nm and two almost
complete turns.
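
The figures quoted for chromatin can be given a rough consistency check. The sketch below assumes an axial rise of 0.34 nm per base pair for B-form DNA, a value not stated in this section, and it neglects the pitch of the superhelix, so each turn is treated as a circle of the quoted radius. Under those assumptions, 150 base pairs on a radius of 4.3 nm come to about 1.9 turns, in line with "two almost complete turns". The function name and its parameters are hypothetical.

```python
import math

RISE_PER_BASE_PAIR_NM = 0.34  # assumed axial rise of B-form DNA


def superhelical_turns(base_pairs: int, radius_nm: float,
                       rise_nm: float = RISE_PER_BASE_PAIR_NM) -> float:
    """Turns made by DNA of the given length wrapped on a circle of the
    given radius (the pitch of the superhelix is neglected)."""
    contour_length_nm = base_pairs * rise_nm
    circumference_nm = 2.0 * math.pi * radius_nm
    return contour_length_nm / circumference_nm


turns = superhelical_turns(150, 4.3)
print(f"{turns:.2f} turns")
```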

Often the bending induced by the protein serves a
purpose. The superhelices of double-helical DNA in the
complexes with the octamers of histones that constitute
chromatin store the DNA compactly. The separate
complexes between the two adjacent sites on the DNA and
the homeodomain protein MATα2 and the MADS-box
protein MCM1 bring the two proteins together so they
can interact. It has been proposed, however, that the
bending or distortion of the DNA in other instances
contributes to the recognition of its particular sequence by
the protein.

It is believed that particular sequences of nucleotides
are more prone to distortion than others and that the ease
with which a double helix of DNA distorts can be
recognized by the protein as it bends the DNA during the
formation of the complex. The standard free energy required
to distort a segment of DNA from the linear B form should
be positive and unfavorable. If the standard free energy for
a particular distortion of a particular sequence is
significantly less positive than the standard free energies for the
same distortion of other sequences, then when a complex is
formed that requires this distortion, the free energy of
formation of the complex will be more negative when the
easily distorted sequence is bound.
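
The size of this effect can be illustrated numerically. The sketch below assumes that two complexes differ only in the standard free energy needed to distort the DNA, so that this difference is also the difference in their standard free energies of formation, and it uses the usual relation between the standard free energy of formation and the dissociation constant, ΔG° = RT ln K_d. The penalties in the loop are hypothetical values chosen only for illustration, as is the temperature of 298.15 K.

```python
import math

R = 8.314462618e-3  # gas constant, kJ mol^-1 K^-1


def dissociation_constant_ratio(extra_distortion_kj_per_mol: float,
                                temperature_k: float = 298.15) -> float:
    """K_d (stiff sequence) divided by K_d (easily distorted sequence).

    Assumes the two complexes differ only in the standard free energy
    needed to distort the DNA, so that this difference is also the
    difference in their standard free energies of formation.
    """
    return math.exp(extra_distortion_kj_per_mol / (R * temperature_k))


# Hypothetical penalties, chosen only for illustration.
for penalty in (2.0, 5.7, 11.4):
    ratio = dissociation_constant_ratio(penalty)
    print(f"{penalty:5.1f} kJ/mol -> K_d larger by a factor of {ratio:.1f}")
```

On these assumptions, each additional RT ln 10 (about 5.7 kJ/mol at 298 K) of distortion free energy raises the dissociation constant tenfold.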

There are experimental observations indicating
that a base pair between adenine and thymine is more
flexible than one between guanine and cytosine and that
this susceptibility to distortion can be used to recognize
this base pair. In the complex between the repressor
protein CI of bacteriophage 434 and its complementary
DNA (Figure 6-50), the central six pairs of bases are not
part of the sequences on either side that are
indispensable for the recognition, but their sequence also
determines the magnitude of the dissociation constant
between protein and DNA. When they are
adenine-thymine pairs rather than guanine-cytosine
pairs, the free energy of formation of the complex is more
negative. In the crystallographic molecular model of the
complex, this region of the DNA is significantly distorted
by the protein in a manner that seems as though it should
be more readily tolerated by adenine-thymine pairs than
it would be by guanine-cytosine pairs. The DNA
mismatch repair protein MutS from E. coli seems to take
advantage of the instability of double-helical DNA at a
position where the bases are mismatched to introduce a
kink at such a location. The uncomplexed segment
of DNA recognized by the trp repressor of E. coli is
already distorted in the direction in which it will be
distorted by the complex but is further distorted when the
complex forms. It is thought that the partial distortion of
the uncomplexed DNA demonstrates the susceptibility
of this sequence to the ultimate distortion to which it will
be submitted. It is also thought that the decrease in
free energy of formation observed when the N6 anilino
group of an adenine in the segment of DNA recognized
by EcoRI site-specific deoxyribonuclease is deleted
results from an increase in the ease with which this
segment can be distorted by the protein.

Just as the DNA is often distorted upon forming a 
complex with a protein, the protein often has a different 
conformation in the complex than in solution. Such con- 
formational changes are often significant. For example, 
the carboxy-terminal α helix of BamHI site-specific
deoxyribonuclease unwinds, and the disordered
polypeptide that results turns almost 180° to enter the
minor groove of the DNA. Usually, however, the
conformational change of a protein on binding to DNA is the
establishment of structure from a disordered segment of
the polypeptide or the tying down of flexible segments
of the protein by their association with the DNA.

There are several proteins, such as
topoisomerase I, the human Ku heterodimer, and
protein gp 45 from bacteriophage T4, that have a hole
passing through them large enough to contain a double
helix of DNA. These are proteins that are required to
surround a double-helical molecule of DNA to perform their
functions, and they recognize the double helix in part by
its fit to the hole. In the empty state, the ring of protein
around the hole is continuous but always contains at
least one interface through which the polypeptide does
not pass. It is at such an interface that the ring of protein
splits apart to allow the DNA to enter the hole and then
closes back around it.

There are also proteins that bind to single-stranded
DNA. Unlike double-helical DNA, the structure of
single-stranded DNA is undefined, but when it is bound by one
of these proteins, it assumes a structure dictated by that
protein. In this sense, the binding of single-stranded
DNA is no different from the binding of any other large
flexible ligand by a molecule of protein. Replication
protein A is a protein that recognizes segments of
single-stranded deoxyribonucleic acid with many different
sequences. In the crystallographic molecular model of
the complex between single-stranded octadeoxycytidine
and human replication protein A, the octanucleotide in
the complex is stretched into a linear form that is almost
fully extended. The protein forms a number of hydrogen
bonds with the phosphoryl oxygens of the backbone
similar to those in complexes between double-helical
DNA and proteins that recognize DNA nonspecifically.
The hydrogen bonds to the donors and acceptors of the
individual bases in the particular complex that was
crystallized, however, are thought to arise only from the
requirement that these must be occupied somehow to
avoid losing hydrogen bonds from the solution upon
association of the DNA with the protein. There are
sufficient acceptors and donors of sufficient flexibility on the
protein in these locations to satisfy any sequence of
bases, as replication protein A is required to do.

Figure 6-50: Gradual bend produced in DNA as it
cleaves to the surface of a globular protein. The
complete crystallographic molecular model (Bragg
spacing = 0.25 nm) of the complex between the
DNA-binding portion (Serine 1 to Arginine 69) of the
repressor protein CI from bacteriophage 434 and a
segment of double-helical DNA 20 base pairs in
length containing the sequences recognized by the
repressor is displayed. The protein is formed from
two identical folded polypeptides, each of which
participates in an extensive interface with the
double-helical DNA that produces the adhesion of
the DNA to the surface of the protein. Skeletal
structures of the two subunits (above and below each
other) are presented with both the backbone (thick
lines) and the side chains (thin lines) drawn. Note
the arginines inserted into the minor groove and the
α helices inserted into the major groove. This
drawing was produced with MolScript.

The novel feature of this complex is the interactions
between π systems of amino acid side chains on the
protein and the π systems of the bases that are no longer
enclosed within the core of a double helix. Phenylalanines
238 and 269 sandwich one stacked pair of cytosines, and
Tryptophan 361 and Phenylalanine 386 sandwich another
(Figure 6-51). Both Phenylalanine 238 and Tryptophan
361 have their π aromatic systems normal to those of the
cytosines. An almost identical sandwich occurs between
a phenylalanine and a tryptophan in single-stranded DNA
binding protein from E. coli and a stacked pair of
cytosines. Similar sandwiches occur between tyrosines
in the telomere end-binding protein of Oxytricha nova
and pairs of stacked guanines. A different arrangement
is found in the same complex between single-stranded
binding protein from E. coli and single-stranded DNA, in
which another tryptophan is surrounded by a cluster of
four cytosines.

Just as the paradigm of the structure of DNA is the 
double helix (Figure 3-9), the paradigm of the structure 
of RNA is a molecule of transfer RNA (Figure 6-52).
There are several novel structural features of RNA that 
are not encountered in DNA. 

Although there are two double helices in the crys- 
tallographic molecular model in Figure 6-52, a horizon- 
tal one at the top of the structure containing 13 pairs of 
bases and a vertical one at the bottom containing 5 pairs 
of bases, neither is formed from two separate strands of 
RNA because the entire molecule is formed from only 
one strand of RNA.* The vertical double helix of 5 pairs of 
bases is formed from the two uninterrupted tines of a 
double-helical hairpin of RNA. At the bottom of the hairpin 
there is a loop in this RNA of 7 bases (2′-O-Methylcytosine 
32† to Adenine 38). In transfer RNA, this loop displays the 
anticodon, but in most double-helical hairpins, this loop has 
only a structural function. One of the most common sequences 
in such a loop in which the chain reverses is UUCG, which 
produces a structure referred to as a tetraloop containing five 
hydrogen bonds that efficiently change the direction of the 
RNA. A double-helical hairpin and its loop is one of the basic 
structures formed by RNA. 

* You should trace the polynucleotide backbone (O-P-O-C- 
furanose-O-P-O-C-furanose-O-...) through the whole molecule 
of transfer RNA. 

† Just as some proteins are posttranslationally modified on their 
side chains, all transfer RNAs are posttranscriptionally modified on 
many of their bases. 

Double-helical hairpins of RNA can be as long as 50 
or more pairs of bases, but they are usually interrupted 
one or more times with bulges at which there is a mis- 
match of the bases on the two strands that face each 
other. The mismatch causes an interruption in the 
double helix. A bulge can be as small as one or two extra 
unmatched bases that protrude out of the double helix 
on one of the strands while the strand on the other side 
contains no mismatched base. Uracil 59 and Cytosine 60, 
found between the 12th (Guanine 53 and Cytosine 61) 
and the 13th (5-Methyluracil 54 and 1-Methyladenine 
58) pairs of bases of the horizontal double-helical hairpin 
in Figure 6-52, form such a small bulge immediately 
before the loop of three bases (Pseudouracil 55 to 
Guanine 57) following the 13th base pair. Bulges can also 
occur across from each other on both strands of double- 
helical RNA. The number of bases on one strand of such 
a bulge can be the same as or different from the number 
of bases on the other strand, and the two strands are usu- 
ally independent of each other until they rejoin in the 
double helix at the other end of the bulge. An inconse- 
quential bulge of one mismatched pair of bases occurs at 
Guanine 4 and Uracil 69 in the horizontal double-helical 
hairpin in Figure 6-52. 

The most interesting bulges, however, are the larger 
ones. For example, the entire lower portion (Uracil 8 to 
Cytosine 48) of the transfer RNA in Figure 6-52 is a bulge 
out of the horizontal double-helical hairpin. It protrudes 
between the seventh pair (Uracil 7 and Adenine 66) and 
the eighth pair (5-Methylcytosine 49 and Guanine 65) of 
bases while the opposite strand of the horizontal double 
helix does not skip a beat. The returning strand of this 
bulge picks up the beat at the eighth pair of bases that 
was dropped by the departing strand at the seventh pair. 
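
The numbering of the horizontal double-helical hairpin described above can be checked with a short sketch. The pairs quoted in the text are the 4th (4 and 69), 7th (7 and 66), 8th (49 and 65), 12th (53 and 61), and 13th (54 and 58); the remaining pairs are filled in here on the assumption that they follow the same regular pattern, and the names `horizontal_pairs`, `inclusive_count`, and `skipped_between` are hypothetical helpers introduced only for illustration.

```python
# Base pairs of the horizontal double-helical hairpin, numbered 1 to 13.
# Quoted in the text: (4, 69), (7, 66), (49, 65), (53, 61), (54, 58).
# Assumption: the pairs that are not quoted follow the same regular pattern.
first_seven = [(i, 73 - i) for i in range(1, 8)]   # (1, 72) ... (7, 66)
next_five = [(49 + k, 65 - k) for k in range(5)]   # (49, 65) ... (53, 61)
horizontal_pairs = first_seven + next_five + [(54, 58)]


def inclusive_count(first, last):
    """Number of bases from position first to position last, inclusive."""
    return last - first + 1


def skipped_between(pair_a, pair_b):
    """Unpaired positions skipped on each strand between two successive pairs."""
    (a5, a3), (b5, b3) = pair_a, pair_b
    departing = list(range(a5 + 1, b5))   # strand leaving toward higher numbers
    returning = list(range(b3 + 1, a3))   # strand coming back on the other side
    return departing, returning


if __name__ == "__main__":
    print(len(horizontal_pairs), "pairs in the horizontal hairpin")
    large, opposite = skipped_between(horizontal_pairs[6], horizontal_pairs[7])
    print("large bulge:", large[0], "to", large[-1], "=", len(large), "bases;",
          "opposite strand skips", len(opposite))
    print("small bulge:", skipped_between(horizontal_pairs[11], horizontal_pairs[12])[1])
    print("anticodon loop:", inclusive_count(32, 38), "bases")
```

Under these assumptions the large bulge between the seventh and eighth pairs runs from position 8 to position 48 on one strand while the other strand skips nothing, and the small bulge before the last loop consists of positions 59 and 60, in agreement with the description in the text.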

In the central region of a molecule of transfer RNA 
(Uracil 8 to N²,N²-Dimethylguanine 26 and Adenine 44 to 
Cytosine 48 in Figure 6-52), the polynucleotide strand 
meanders randomly through the region, forming a com- 
plex tertiary structure that rigidifies the molecule and 
holds the two double helices perpendicular to each 
other. In this region, there are numerous intramolecular 
hydrogen bonds orienting the strand of RNA as it passes 
through. It is in this central region that the RNA becomes 
almost reminiscent of a molecule of protein. 

There is, however, one unique characteristic of this 
central region in which the structure is unmistakably 
nucleic acid. Even though the structure has become 
random meander, most of the bases are still stacked one 
upon the other as they are in the double-helical regions. 
Even the last two bases to the right, beyond the end of the 
horizontal double-helical hairpin, are stacked upon 
themselves and the last base of the double helix. 

These novel features (double-helical hairpins, 
bulges, rigid random meander, and stacking of nonhelical 
bases) are regularly found in molecules of RNA. 

Unlike DNA, which is rarely unassociated with protein, 
there are species of RNA such as transfer RNA and 
messenger RNA that spend at least a part of their lives 
free in solution. Unlike DNA, in which the proteins with 
which it is associated change dramatically as it is transferred 
from storage, to transcription, to replication, and 
to recombination, complexes between RNA and protein, 
such as ribosomes and the small nuclear ribonucleopro- 
tein particles that form spliceosomes, often have fixed 
structures that remain essentially unchanged during 
their lifetimes. Such ribonucleoproteins are biologically 
distinguished from proteins only by the fact that they 
almost always operate on other molecules of RNA. 

Figure 6-52: Crystallographic molecular model (Bragg spacing > 
0.19 nm) of phenylalanyl transfer RNA from S. cerevisiae. The 
structure is formed from a single, folded molecule of RNA 76 bases 
long. The last two bases, which are fully extended to the right of the 
upper double helix, have been omitted. Fourteen of the bases in this 
particular transfer RNA have been posttranscriptionally modified by 
reduction, methylation, and isomerization, but the modifications 
are difficult to spot at this magnification. This drawing was produced 
with MolScript. 

Many of the atomic details of the association 
between proteins and RNA are indistinguishable from 
those for the association of proteins and DNA. There are 
hydrogen bonds formed to the acceptors on the phos- 
phoryl oxygens and the donors and acceptors on the 
bases, and molecules of water are participants in these 
networks of hydrogen bonds. The main difference 
is that many of the bases are not in pairs, and in those 
bases all of their donors and acceptors of hydrogen 
bonds are available for recognition by acceptors and 
donors on both the side chains and the polypeptide 
backbone of the protein. Hydrophobic contacts are also 
made, often with the exposed π systems of the bases, 
which are accessible in the regions of the RNA that are 
not double-helical. There are even instances of 
α helices lying in grooves of the RNA. 

When a large globular molecule of RNA such as a 
transfer RNA is bound in a transient complex by a protein 
such as an aminoacyl-tRNA ligase, the complex is remi- 
niscent of one between a protein and double-helical 
DNA in that the surface of the protein and the surface of 
the RNA in the interface fit together as a cast fits its mold. 
In the more permanent complexes, however, the RNA 
and protein are more intimately intertwined. For example, 
in the U1 small nuclear ribonucleoprotein particle, a 
representative component of the spliceosome, 10 separate 
proteins form a complex with one molecule of RNA 
in which some individual proteins and multimeric complexes 
of other proteins associate with different segments 
of the RNA. The RNA contains four 
double-helical hairpins in an open, extended structure, 
and the proteins bind to the ends of individual hairpins 
or to the double-helical portions emerging from 
the center of the molecule of RNA. In such a small 
nuclear ribonucleoprotein particle, only 20% of the mass 
is RNA, and the RNA is a loose scaffold that ties together 
the proteins, which are responsible for most of the struc- 
ture of the particle. 

The ultimate complex between protein and RNA is 
a ribosome. A ribosome is a ribonucleoprotein that is by 
mass about two-thirds RNA and one-third protein. It 
contains three different molecules of RNA, about 3000, 
1500, and 120 bases long, respectively, and about 50 dif- 
ferent molecules of protein, totalling about 7200 aa, the 
largest about 350 aa long, the smallest about 50 aa long.* 
There are two different subunits comprising a ribosome. 
The 50S subunit contains the largest and smallest mole- 
cules of RNA and 30 of the proteins; the 30S subunit con- 
tains the RNA 1500 bases long and 20 of the proteins. 
Yonath and her associates have obtained crystals of the 
50S subunit from Haloarcula marismortui and the 
30S subunit from Thermus thermophilus, and Yusupov 
and his associates have obtained crystals of the intact 
ribosome from T. thermophilus and the 30S subunit 
from T. thermophilus. All of these crystals have 
proven satisfactory for crystallographic studies, and 
crystallographic molecular models have been obtained from 
data sets gathered from them. These crystallographic 
molecular models provide the atomic details of 
the structure of a ribosome as well as insight into its 
ability to translate messenger RNA into protein. 

* The uncertainty reflects both experimental ambiguity and 
differences among species. 

As the distribution of mass suggests, the basic 
structural element of a ribosome is the RNA. The 4600 
bases of the three molecules of RNA form a globular 
structure with which the proteins associate. The RNA, 
although it is 60 times larger, is reminiscent of a molecule 
of tRNA and displays all of its characteristic features: 
double-helical hairpins, loops, bulges, and random 
meander. One of the few novel features is that many of 
the hairpins are so long that they form smoothly curved 
double helices that wrap around other curved double-helical 
hairpins in structures reminiscent of coiled coils. 
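
The figures quoted for the ribosome can be checked against one another with a rough calculation. The sketch below uses the lengths given in the text (RNA of about 3000, 1500, and 120 bases; about 7200 aa in about 50 proteins; a transfer RNA of 76 bases). The average residue masses of 330 Da per nucleotide and 110 Da per amino acid are assumptions not stated in the text, chosen only as common round values, and all variable names are illustrative.

```python
# Lengths quoted in the text.
RNA_LENGTHS = (3000, 1500, 120)   # bases in the three molecules of RNA
TOTAL_AMINO_ACIDS = 7200          # summed over all of the proteins
N_PROTEINS = 50
TRNA_LENGTH = 76                  # bases in the transfer RNA of Figure 6-52

# Assumed average residue masses (not from the text), in daltons.
DA_PER_NUCLEOTIDE = 330.0
DA_PER_AMINO_ACID = 110.0

total_bases = sum(RNA_LENGTHS)
rna_mass = total_bases * DA_PER_NUCLEOTIDE
protein_mass = TOTAL_AMINO_ACIDS * DA_PER_AMINO_ACID

rna_fraction = rna_mass / (rna_mass + protein_mass)
size_ratio = total_bases / TRNA_LENGTH
mean_protein_length = TOTAL_AMINO_ACIDS / N_PROTEINS

if __name__ == "__main__":
    print(f"total bases: {total_bases}")
    print(f"fraction of mass that is RNA: {rna_fraction:.2f}")
    print(f"ribosomal RNA relative to tRNA: {size_ratio:.0f}-fold")
    print(f"mean length of a protein: {mean_protein_length:.0f} aa")
```

With these assumed residue masses the RNA accounts for roughly two-thirds of the mass, and the ribosomal RNA is about 60 times the length of a transfer RNA, so the approximate numbers in the text are mutually consistent.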

For the most part, the various proteins are found 
associated with the outer surface of the much larger 
globular RNA. Many of the proteins are entirely globular, 
but some of them have long segments of polypeptide, 
either interior loops or segments at their carboxy-termi- 
nal or amino-terminal ends, emerging from their globu- 
lar portions and meandering widely through the RNA. 
Some of the globular portions of these proteins sit upon 
the surface of the RNA, others are buried within it, but all 
are subordinate to it both structurally and functionally. 

The RNA is responsible for the ability of a ribosome 
to translate messenger RNA into protein. The RNA of the 
30S subunit aligns the codon of the messenger RNA with 
the anticodon of the transfer RNA, and the RNA of the 
50S subunit appears to catalyze the formation of the 
peptide bond from aminoacyl transfer RNA and the 
peptidyl tRNA. 

There is a set of small modules of protein known as 
zinc fingers that recognize specific sequences of double- 
helical DNA mainly by forming bonds within the major 
groove (Figure 6-53). Each of the many different zinc 
fingers is capable of recognizing the specific sequence of 
a segment of DNA 3-4 bases in length, and each recognizes 
a different sequence. Sets of these modules 
are strung together within the same polypeptide and 
together recognize longer specific sequences in DNA. 
Four of the five zinc fingers in the zinc finger protein 
GLI1 from Homo sapiens (Figure 6-53) together recognize 
a segment of DNA 14 bp long by binding consecutively 
to segments 3-4 bp in length. 

Transcription factor IIIA has nine successive zinc 
fingers sequentially joined together within a segment of 
its overall sequence. These nine zinc fingers 
together associate with a segment of DNA 55 bp long but 
directly recognize sequences only in a segment 11 bp 
long beginning 8 bp from one end of the overall segment, 
a segment 10 bp long beginning 9 bp from the other end, 
and a segment 3 bp long in the middle. The first three 
zinc fingers each associate with overlapping sequences 
4 bp long in the segment 10 bp long, the fifth zinc finger 
associates with the sequence 3 bp long in the middle, 
and the last three zinc fingers associate with the segment 
11 bp long. Side chains from the various fingers associate 
with the phosphoribosyl backbone outside of the 
three segments the sequences of which are recognized. 
The fourth and the sixth zinc fingers do not enter the 
major groove and consequently do not recognize and 
bind to sequences of base pairs. 
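
The layout of the recognized segments within the 55 bp bound by transcription factor IIIA can be made concrete with a small sketch. Positions are numbered 1 to 55 from one end, and "beginning 8 bp from one end" is read here as meaning that 8 bp precede the segment; that reading, the choice of a one-base overlap between neighboring 4 bp sites, and all of the names are assumptions made for illustration rather than statements from the text.

```python
TOTAL_BP = 55  # length of the segment of DNA associated with the nine fingers


def span_of_overlapping_sites(n_sites, site_len, shared):
    """Length covered by n_sites sites of site_len bp when neighbors share `shared` bp."""
    return n_sites * site_len - (n_sites - 1) * shared


# Inclusive 1-indexed positions, assuming the stated offsets count preceding bp.
segment_a = (8 + 1, 8 + 11)                          # 11 bp, 8 bp from one end
segment_b = (TOTAL_BP - 9 - 10 + 1, TOTAL_BP - 9)    # 10 bp, 9 bp from the other end
gap = (segment_a[1] + 1, segment_b[0] - 1)           # region holding the 3 bp site
gap_length = gap[1] - gap[0] + 1

recognized_bp = 11 + 10 + 3
recognized_fraction = recognized_bp / TOTAL_BP

if __name__ == "__main__":
    print("segment of 11 bp:", segment_a)
    print("segment of 10 bp:", segment_b)
    print("gap between them:", gap, "=", gap_length, "bp")
    print("three 4 bp sites sharing 1 bp span",
          span_of_overlapping_sites(3, 4, 1), "bp")
    print(f"fraction of the 55 bp read directly: {recognized_fraction:.2f}")
```

On this reading, three sites of 4 bp that each share one base pair with a neighbor span exactly the 10 bp segment assigned to the first three fingers, and fewer than half of the 55 bp are read directly.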

Each zinc finger is a segment of polypeptide about 
30 amino acids long. Ordinarily a segment of polypeptide 
this short would be unable to form a specific structure 
because the small size would prevent the folded protein 
from removing a sufficient number of hydrogen-carbon 
bonds from contact with the water to provide enough of 
a hydrophobic effect to overcome the change in standard 
entropy required for folding. The most common solution 
to this problem is that a small protein or small 
module of protein will contain several cystines in its core, 
the cross-links of which provide sufficient rigidity to the 
structure to overcome this deficit in standard free energy. 
The zinc finger solves the problem in a similar way, but 
instead of using cystines, it uses a Zn²⁺ cation that forms 
four covalent bonds with two cysteines and two histidines 
in the sequence of the module (Figure 6-53), 
cross-linking the four amino acids together. 
Consequently, a zinc finger is an interesting example of a 
metalloprotein. 

Problem 6-12: What is the sequence of the segment of 
DNA in Figure 3-9? 

Problem 6-13: List the hydrogen bonds between the 
amino acids of the zinc finger in Figure 6-53 and the 
bases in the DNA with which it is associated. Identify the 
amino acids and bases by their respective positions 
in the sequences and the functional groups of each by 
their names and their numbers (2-9, 2-10, 2-11, and 
2-12 and Figure 4-14). 


Figure 6-53: A zinc finger bound to the major groove of the 
double-helical DNA that it recognizes. The crystallographic 
molecular model (Bragg spacing > 0.26 nm) is that of the complex 
between the five consecutive zinc fingers (Valine 232 to Alanine 
391) of the human zinc finger protein GLI1 expressed as a separate 
protein and a segment of double-helical DNA 21 base pairs in 
length containing sequences recognized by those five fingers. Only 
the fifth zinc finger (Proline 361 to Glycine 388) and the four pairs 
of bases [d(GACC) paired with d(GGTC)] that it recognizes in addition 
to one pair of bases on each side (dG-dC and dA-dT, respectively) 
are included in the figure. The view is down the axis of the 
B conformation of the DNA. The DNA is in the bottom of the figure 
and the zinc finger in the top. The polypeptide is numbered 
according to the complete amino acid sequence of the zinc finger 
protein GLI1. Side chains from the protein read the donors and 
acceptors projecting into the major groove of the DNA, but none of 
the responsible hydrogen bonds has been drawn. In the protein 
crystallized, the Zn²⁺ had been replaced by Co²⁺. The Co²⁺ is the gray 
sphere near the top of the finger forming four tetrahedrally 
arranged covalent bonds (dashed lines) with Cysteines 364 and 369 
and Histidines 382 and 387. Although its van der Waals radius is 
about 10% shorter than that of Co²⁺ and it is a somewhat softer acid, 
Zn²⁺ would have formed covalent bonds with the same four ligands 
to produce the identical tetrahedral geometry. This drawing was 
produced with MolScript. 


Metalloproteins 


As does a zinc finger, many proteins incorporate one or 
more metallic cations into their structure. Aside from 
cations such as lithium, sodium, potassium, rubidium, 
magnesium, and calcium that are dissolved in the cyto- 
plasm or the extracellular solution and bind adventi- 
tiously and randomly over the surface of a protein, there 
is a set of metallic cations that participate as specific and 
necessary structural and functional elements of metallo- 
proteins. These are the cations of sodium, potassium, 
magnesium, calcium, vanadium, manganese, iron, 
cobalt, nickel, copper, zinc, molybdenum, and tungsten. 
The nontransition metals, sodium, potassium, magne- 
sium, and calcium, and zinc, a transition metal inactive 
in oxidation-reduction, occur exclusively in their most 
common oxidation states: Na⁺, K⁺, Mg²⁺, Ca²⁺, and Zn²⁺, 
respectively. Because of the availability of two or more 
readily accessible oxidation levels, the other transition 
metals, for example, iron or copper, are often used as 
one-electron carriers, and in this role alternate between 
oxidation levels, for example, Fe²⁺ and Fe³⁺ or Cu⁺ and 
Cu²⁺. In other situations transition metals, such as the 
Ni²⁺ in urease or the Fe²⁺ in myoglobin (Figure 4-18), fill 
roles in the active sites of enzymes in which no changes 
in oxidation level are required and in fact are to be 
avoided. 

Eventually, the role of a metallic cation in main- 
taining the structure of a protein must be distinguished 
from its role as a catalytic functional group in its active 
site. Aspartate carbamoyltransferase from E. coli is a pro- 
tein constructed from two different folded polypeptides, 
the regulatory subunit (n_aa = 152) and the catalytic 
subunit (n_aa = 310). In the crystallographic molecular model 
of aspartate carbamoyltransferase, a Zn²⁺ is tetrahedrally 
coordinated to the four sulfurs of Cysteines 109, 114, 137, 
and 140 in the regulatory subunit. This Zn²⁺ forms a 
tetrahedral, covalent complex with the structure 

[Zn(SR)₄]²⁻                (6-20) 

that resembles closely structures observed in polynuclear 
clusters that form between Zn²⁺ and small mercaptans. 
When the zinc is displaced from the thiols by 
organic mercurials, the two subunits separate from each 
other but can be reassociated by reincorporating 
the Zn²⁺. In the crystallographic molecular model, the 
Zn²⁺ is located adjacent to the boundary between the 
regulatory subunit and its neighboring catalytic subunit but 
distant from both the active site of the enzyme located on 
the catalytic subunit and sites for binding ligands on the 
regulatory subunits. Therefore, the role of the Zn²⁺ is 
entirely structural. Its complexation with the thiols cre- 
ates and stabilizes the proper structure at the surface of 
a regulatory subunit. Only when this stable structure is 
formed can the properly folded regulatory subunit asso- 
ciate with a complementary structure on the surface of a 
catalytic subunit, just as only when the proper structure 
of a zinc finger is formed by the binding of the Zn²⁺ can it 
associate with the proper site on DNA (Figure 6-53). A 
metallic cation fulfills such a structural role because the 
bonds it forms either covalently or ionically with lone 
pairs of electrons on bases within the protein are strong 
ones, especially when the protein itself assists in orient- 
ing the bases advantageously. 

In the case of aspartate carbamoyltransferase, 
removal of the Zn²⁺ from the protein produces catalytic 
subunits with full enzymatic activity. In most instances, 
however, the removal of a metallic cation from a protein 
leads to loss of function, and separating the effect of the 
cation on the structure of a protein, which itself is 
responsible for that function, from a direct effect at an 
active site, in which a metallic cation is often a catalytic 
group, is difficult. For example, when mammalian liver 
arginase, which is formed from four identical folded 
polypeptides, is treated with the chelating agent 
N,N,N’,N’-tetracarboxymethyl-1,2-diaminoethane (5-1), 
it loses all of its enzymatic activity. At the same time, 
however, it dissociates into individual folded polypeptides. 
When Mn²⁺ is added to the inactive protein, enzymatic 
activity is regained, but the individual folded 
polypeptides reassociate. It was possible that the dissociation 
of the tetramer was responsible for the inactivation 
and that the Mn²⁺ required for activity was necessary to 
retain the proper structure of the protein rather than as a 
catalytic group in the active site. If, however, a crystallographic 
molecular model of the protein is available, it is 
possible to determine, as was the case with both a zinc 
finger and aspartate carbamoyltransferase, whether or 
not the metal is distant from sites involved in the function 
of the protein and consequently performs purely a 
structural role. In the case of arginase, for example, it has 
been shown crystallographically that Mn²⁺ cations form 
a binuclear cluster within the active site intimately 
involved in the catalysis performed by the enzyme. 

The metallic cations incorporated into proteins in 
aqueous solution, because they are themselves Lewis 
acids, are at all times surrounded by Lewis bases. The 
strongest Lewis bases present in biological fluids are the 
lone pairs of electrons on oxygens, nitrogens, and sulfurs 
and the chloride ion. The proton is also a Lewis acid, and 
in biological fluids every acidic proton is usually sur- 
rounded by lone pairs of electrons on oxygens, nitrogens, 
or sulfurs. The proton is so small, however, that it can 
accommodate directly only two Lewis bases at a time in 
one hydrogen bond. Because metallic cations have core 
electrons, they are larger than a proton and can accom- 


modate more Lewis bases simultaneously. The metallic 
cations incorporated into proteins are always sur- 
rounded by pairs of electrons from oxygens, nitrogens, 
sulfurs, or halides. The atoms providing the lone pairs of 
electrons surrounding a metallic cation are its ligands, 
and the number of ligands surrounding the cation is its 
coordination number. The complexes formed between 
metallic cations and proteins are tetracoordinate, penta- 
coordinate, hexacoordinate, heptacoordinate, octacoor- 
dinate, and nonacoordinate. 

The preferences of a metallic cation for a particular 
type of lone pair of electrons are usually discussed in 
terms of the hardness or softness of the Lewis acid and 
the Lewis base. The rule is that hard acids prefer hard 
bases and soft acids prefer soft bases. For the divalent 
metal ions of importance to the structure of proteins, the 
series of hardness is Mg²⁺ > Ca²⁺ > Mn²⁺ > Fe²⁺ > Co²⁺ > Ni²⁺ 
> Cu²⁺, Zn²⁺. For the commonly encountered bases, the 
series of hardness is lone pairs on oxygen > lone pairs on 
chloride > lone pairs on nitrogen > lone pairs on sulfur. 
These rankings, for example, are consistent with the fact 
that calcium ion has a strong preference for oxygen lig- 
ands while zinc ion has a preference for thiol ligands. It 
also explains why the ligands on metallothionein, a pro- 
tein responsible for chelating and thus removing from 
solution soft, toxic heavy metal cations, are entirely the 
thiols of cysteine side chains in the protein. 
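
The rule that hard acids prefer hard bases and soft acids prefer soft bases can be written as a toy lookup. The two series below are the ones given in the text; the cutoff separating "hard" from "soft" cations (`SOFT_CUTOFF`) is a hypothetical value chosen only so that the sketch runs, and the function names are illustrative. It is a qualitative bookkeeping device, not a quantitative model of affinity.

```python
# Series of hardness from the text, hardest first. Cu2+ and Zn2+ share a rank
# because the text lists them together at the soft end of the series.
ACID_RANK = {"Mg2+": 0, "Ca2+": 1, "Mn2+": 2, "Fe2+": 3,
             "Co2+": 4, "Ni2+": 5, "Cu2+": 6, "Zn2+": 6}
BASES_HARDEST_FIRST = ["oxygen", "chloride", "nitrogen", "sulfur"]

# Hypothetical cutoff (an assumption): cations ranked above it are treated as soft.
SOFT_CUTOFF = 3


def is_harder(acid_a, acid_b):
    """True if acid_a precedes acid_b in the series of hardness."""
    return ACID_RANK[acid_a] < ACID_RANK[acid_b]


def preferred_bases(acid):
    """Lone-pair donors in order of preference under the hard-hard, soft-soft rule."""
    if ACID_RANK[acid] > SOFT_CUTOFF:
        return BASES_HARDEST_FIRST[::-1]   # soft acid: softest base first
    return list(BASES_HARDEST_FIRST)       # hard acid: hardest base first


if __name__ == "__main__":
    for cation in ("Ca2+", "Zn2+"):
        print(cation, "prefers", preferred_bases(cation)[0])
```

The sketch reproduces the two examples in the text: the calcium ion is matched with oxygen and the zinc ion with sulfur.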

The Lewis bases surrounding a metallic cation in 
solution are attached to it by bonds the characteristics of 
which span the spectrum between ionic and covalent. 
The bonds between hard cationic Lewis acids and hard 
Lewis bases are usually ionic, and those between soft 
cationic Lewis acids and soft Lewis bases are usually 
covalent. The calcium dication is an example of a hard, 
purely ionic metallic cation. In biological solutions, its 
ligands are invariably oxygen atoms, the hardest of 
bases, and those oxygens are held by ionic bonds. The 
zinc dication in the crystallographic molecular model of 
aspartate carbamoyltransferase (6-20) is a good example 
of a metallic cation participating in covalent bonds. In 
this arrangement, a soft metallic cation, Zn²⁺, is bonding 
covalently to four soft bases, (RS⁻)₄. Soft bases such as 
sulfur or even nitrogen are rarely found as ligands to hard 
metallic cations such as Na⁺, K⁺, Mg²⁺, and Ca²⁺, but one 
way that a protein reconciles the steric difficulty of 
arranging ligands precisely enough to form unhindered 
covalent bonds with soft metallic cations such as Fe²⁺, 
Cu²⁺, and Zn²⁺ is to use hard bases such as oxygen or 
nitrogen as ligands. The bonds formed by these harder 
ligands, because they have more ionic character, are 
more flexible in their angles. Regardless of whether the 
bonds are ionic or covalent, their lengths are usually gov- 
erned by the size of the ion, which is reflected in its ionic 
radius (Table 6-7). 

The major structural difference between ionic 
bonds and covalent bonds is the directional properties of 
the arrangements of the ligands. Ionic bonds are created 
by the electrostatic forces between the metallic cation 
and an anion or a dipole on a ligand. If the forces holding 
the ligands are entirely ionic, the number and orienta- 
tion of the Lewis bases around the cation are determined 
solely by steric considerations. In the case of Ca²⁺, the 
number and orientation of the oxygens surrounding the 
dication depend entirely on the size and shape of the 
functional groups that provide them. When ligands are 
bonded ionically, the larger the cation or the smaller the 
bases, the more ligands will be gathered. Covalent bonds 
result from the overlap of atomic orbitals to form bond- 
ing molecular orbitals. Because the degree of overlap 
determines the strength of the bond and because the 
degree of overlap depends on the bond lengths and bond 
angles, covalent bonds are characterized by specific 
bond lengths and bond angles. Because zinc is a d¹⁰ 
transition metal, its 3d shell is filled. As a result, the covalent 
bonds between Zn²⁺ and sulfur in the regulatory subunit 
of aspartate carbamoyltransferase are formed from 
sp³ hybrid orbitals on the zinc, and this produces the 
usual tetrahedral arrangement. A similar tetrahedral 
disposition is assumed by the four Lewis bases around the 
Zn²⁺ in a zinc finger (Figure 6-53). Covalent bonds position 
the participating atoms in strict geometric orientations 
while ionic bonds are completely malleable, 
resembling pigs at a trough. 

The fact that covalent bonds involving metals are so 
rigid is reflected in the practice during crystallographic 
refinement of considering them as fixed geometrically as 
the bonds involving carbon, nitrogen, and oxygen. For 
example, in the initial crystallographic molecular model 
of aspartate carbamoyltransferase built directly from the 
unrefined map of electron density, the arrangement of 
the four sulfurs around the Zn²⁺ was restrained to a tetrahedral 
geometry, just as an sp³ carbon would have been, 
and this geometry was retained in all the subsequent 
refinement. This practice can be dangerous, however, 
particularly if the ligands to the Zn²⁺ are harder, less covalent 
bases. In such intermediate cases, various mixtures 
of ionic and covalent behavior are observed. The 
main indication of such deviations from covalent behavior 
is the loss of directional ligation. 


Table 6-7: Ionic Radii and Lengths of Bonds to Ligands of Metallic Cations Found as Structural Elements in Proteins 

                     Na⁺    K⁺     Mg²⁺   Ca²⁺   Mn²⁺   Fe²⁺   Ni²⁺   Cu²⁺   Zn²⁺ 
ionic radiusᵃ (nm)   0.102  0.138  0.072  0.100  0.083  0.061  0.069  0.073  0.074 

bond lengthᵇ (nm) 0.22-0.34 0.20-0.21 0.22-0.26 0.20-0.23 0.19-0.23 0.20-0.23 0.20-0.23 

ᵃIonic radii for the hexacoordinated metallic cation and for the dications of transition metals to permit direct comparison. ᵇValues for the metallic cations in crystallographic molecular models of proteins from the references cited in the text. 

The monovalent cations of sodium and potassium 
are hard Lewis acids and are almost always surrounded 
by lone pairs from oxygen in any situation, as they are 
when they are dissolved in water. There is, however, one 
crystallographic molecular model in which a Na⁺ has the 
π system of a tryptophan as one of its ligands. 
Because cytoplasm has a high concentration of K⁺ and a 
low concentration of Na⁺, it is K⁺ that is almost invariably 
incorporated as a structural metallic cation in cytoplasmic 
proteins. There are examples, however, of cytoplasmic 
proteins incorporating Na⁺ during their 
crystallization, in one case at a site formed by five acyl 
oxygens from the backbone and the oxygen of a carboxylate 
and in another at a site formed from two acyl oxygens 
from the backbone and a molecule of water forming 
three hydrogen bonds with groups on the protein. 
Whether or not these sites are occupied by Na⁺ when the 
proteins are in the cytoplasm is unknown. Extracytoplasmic 
proteins, such as α-thrombin, however, do 
seem to incorporate Na⁺. 

In structural sites for K⁺, the complex can be 
anywhere from tetracoordinate to heptacoordinate. The 
ion is large enough (Table 6-7) to support seven oxygens 
easily, but the final number of ligands is dictated by the 
structure of the protein. The ligands to a structural K⁺ in 
a crystallographic molecular model of a protein are contributed 
by as many as three acyl oxygens of the peptide 
backbone, as many as three waters, sometimes a hydroxyl 
from a serine or the acyl oxygen of a glutamine or 
asparagine, and often, but not always, one of the oxygens 
of a carboxylate from an aspartate or glutamate. For example, 
in the hexacoordinate structural site for K⁺ in 
phosphoribosylaminoimidazolecarboxamide formyltransferase 
from G. gallus, the cation is liganded by the hydroxyl 
groups of two serines, the carboxylate oxygen of one 
aspartate, and three acyl oxygens from the backbone 
(Figure 6-54). The eight ligands from the protein to the 
two potassium cations bound within the selectivity filter 
of the KcsA potassium channel, however, are all acyl oxygens 
from the polypeptide backbone, and at the entry 
to the selectivity filter a potassium can be observed liganded 
by four acyl oxygens from the backbone and four 
molecules of water, presumably in the act of being dehydrated. 
In keeping with the size of the potassium cation 
and the hardness of both the cation and the oxygens, the 
bonding is ionic, the geometry of the ligands around the 
cation is unpredictable, and unlike those of most other 
metallic cations, the bond lengths span a large range 
(Table 6-7). 

Calcium ion, a hard Lewis acid like K⁺, also participates 
in purely ionic bonds with no particular geometric 
requirements. Its smaller ionic radius (Table 6-7) is more 
than compensated by its increased charge, and it associates 
with as many as nine ligands. Calcium ion has a low 
affinity for nitrogen, and it can be distinguished from 
Mg²⁺ by this characteristic. It is probably the dication that 
is most often bound by proteins, and its role is usually 
entirely structural. 

When bound to proteins, Ca²⁺ serves in its structural 
role by gathering around itself oxygens from the 
backbone of the polypeptide, its side chains, and molecules 
of water. This exclusive preference for oxygen is 
entirely consistent with its hardness. The octacoordinate 
site on the surface of endopeptidase K (Figure 6-55) 


that binds Ca” with a dissociation constant of 
8x 10° M is typical of a complex between a Ca” and pro- 
tein. It is representative of such complexes because the 
Ca²⁺ is surrounded by molecules of water, one of which is 
positioned by the molecule of protein, by acyl oxygens 
from the backbone, and by the carboxylate of an aspartate 
that provides simultaneously two oxygens as a 
bidentate ligand. The charge number on the Ca²⁺ in this 
site is not matched by that of its ligands, as is also the 
case in the heptacoordinate site for Ca²⁺ in α-lactalbumin 
from Papio cynocephalus, in which the ligands are the 
carboxylates of three aspartates, each a monodentate 
ligand to the Ca²⁺, as well as two molecules of water and 
two acyl oxygens from the backbone. There are also 
structural Ca²⁺ that are more completely surrounded by 
the protein, such as the heptacoordinate site in thermitase, 
which is surrounded by three acyl oxygens from 
the backbone, an acyl oxygen of the side chain of an 
asparagine, a single oxygen of the side chain of an aspartate, 
and the two carboxylate oxygens of another aspartate 
acting as a bidentate ligand. 

In these complexes, the distance between the Ca²⁺
and the heteroatoms of the ligands is between 0.22 and
0.26 nm (Table 6-7). The position taken by the Ca²⁺ relative
to an acyl oxygen of the backbone is remarkably similar
to that of an amido nitrogen-hydrogen in a hydrogen
bond to such an acyl oxygen (Figure 5-11) with a broad
distribution of angle b from 140° to 180° and a strong
tendency to lie in the plane of the peptide.

These complexes are usually specific for Ca²⁺. The
specificity is provided by the distribution of the oxygens
within the protein and the donors and acceptors of hydrogen
bonds between the protein and molecules of water
(Figure 6-55) retained by the Ca²⁺ as it enters the complex.
The number of ligands provided by the protein and the
conformation to which they are confined by the rest of its
structure permit them to recognize both the two units
of charge number and the radius of the Ca²⁺. For example,
Aspartate 200 in the complex between Ca²⁺ and
endopeptidase K (Figure 6-55) is positioned properly by
resting in a groove formed by the backbone and Cystine
178/249 as well as by a hydrogen bond to an amido
nitrogen-hydrogen of the backbone.

A magnesium cation, although also a hard, alkaline
earth metallic cation, displays directional bonding reminiscent
of a covalent-coordinate metallic cation, not
because of covalence but because of the intensity of its
electrostatic field and the steric effects associated with its
significantly smaller ionic radius (Table 6-7). Its complexes
are almost always hexacoordinate with the ligands
in an octahedral arrangement enforced by the severe
steric effects, and the interatomic distances between the
metal and the ligands are shorter and much less variable
(0.20-0.21 nm) than those for calcium (0.22-0.26 nm). In
a structural role, Mg²⁺ is often associated with phosphoryl
oxygens, as for example those on nucleic acid. When
bound to protein or nucleic acid, Mg²⁺ usually retains
several of its waters, often all of them. For example, in
the complex between Mg²⁺ and inorganic diphosphatase
from E. coli, the Mg²⁺ retains all six of its octahedrally
arrayed molecules of water, each of which forms one to
three hydrogen bonds with donors and acceptors on the
protein.
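
The contrast in metal-ligand distances quoted above can be turned into a rough check. The sketch below is only an illustration: the two ranges are the ones stated in the text, but the function name consistent_cations and the tolerance of 0.005 nm are assumptions made here, not values from the text or from Table 6-7.

```python
# Ranges of metal-ligand distance (nm) quoted in the text.
DISTANCE_RANGES_NM = {
    "Mg2+": (0.20, 0.21),
    "Ca2+": (0.22, 0.26),
}


def consistent_cations(distances_nm, tolerance_nm=0.005):
    """Return the cations whose quoted range contains every observed distance.

    tolerance_nm is an assumed allowance for coordinate error; it is not
    taken from the text.
    """
    if not distances_nm:
        raise ValueError("at least one distance is required")
    matches = []
    for cation, (low, high) in DISTANCE_RANGES_NM.items():
        if all(low - tolerance_nm <= d <= high + tolerance_nm
               for d in distances_nm):
            matches.append(cation)
    return matches


# Hypothetical sets of distances, invented for illustration.
print(consistent_cations([0.205, 0.21, 0.20]))
print(consistent_cations([0.23, 0.24, 0.26, 0.22]))
```

Distances alone cannot identify a cation; coordination number and the nature of the ligands, as discussed above, would have to be considered as well.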

It appears from crystallographic and spectral
observations that Mg²⁺, when bound to a protein, prefers
oxygen ligands exclusively, in particular the oxygens of
phosphates and carboxylates, but the dication of manganese,
Mn²⁺, a softer metallic ion that is about the
same size as Mg²⁺ and that gathers its ligands just as
tightly (Table 6-7), forms complexes with both
nitrogen and oxygen bases such as ammonia, imidazole,
1,2-diaminoethane, water, alcohols, carboxylates, the
carbonyl oxygens of ketones and aldehydes, and the acyl
oxygens of amides. Consistent with its degree of hardness,
oxygen bases and nitrogen bases are roughly equivalent
in their affinities for Mn²⁺. In aqueous solution, the hexaammine
complex is observed only at concentrations of
ammonia greater than 2 M, but hexaimidazole salts can
be crystallized from anhydrous ethanol. Because of its
small size and degree of hardness, all of the complexes
between Mn²⁺ and such unhindered hard and intermediate
bases are hexacoordinate and octahedral, reminiscent
of directional covalent bonds; and in these complexes,
mixtures of various ligands around manganese can occur.
For example, each of the species [Mn(OH₂)ₙ(NH₃)₆₋ₙ]²⁺
with 0 < n < 6 is observed in mixtures of ammonia and
water. When Mn²⁺ is bound to a protein, it is complexed
octahedrally by Lewis bases both from amino acids on the
protein and from molecules of water (Figure 6-56).

Iron, when it is found in a protein, is almost always 
in a coenzyme such as a heme (Figure 4-18) or an 
iron-sulfur cluster: 


In most of these instances, the iron acts either as an electron
carrier because of its ability to convert between Fe(II)
and Fe(III) or as a catalytic group that can bind and activate
oxygen, but the iron-sulfur cluster in DNA-(apurinic
or apyrimidinic site) lyase of E. coli, as well as the one
in glycosylase MutY from E. coli, performs a structural
role. In a heme the Fe²⁺ is hexacoordinate, but in an
iron-sulfur cluster it is tetracoordinate, consistent with the
softness of the thiolates. There is also a pentacoordinate
site for a structural Fe²⁺ in UTP-hexose-1-phosphate
uridylyltransferase in which the ligands are the nitrogens from
three histidines and the two carboxyl oxygens of a glutamate
acting as a bidentate ligand. The irregular arrangement
of the ligands in this case is permitted by their
hardness, which causes the ligation to be more ionic. One
of the most peculiar sites for an iron cation is that on nitrile
hydratase of Rhodococcus, in which the Fe³⁺ is liganded by
two amido nitrogens from the backbone and the sulfurs
of three cysteines, one of which is oxidized to a sulfenic
acid and another to a sulfinic acid (Figure 2-8).

Cobalt is incorporated into proteins as the metal in
the center of coenzyme B₁₂. Nickel is used as a Lewis acid
in the active site of urease and as the electrochemically
active component in the active site of a few enzymes catalyzing
oxidation-reduction. In at least one of these
latter enzymes, it is found coordinated within a tetrapyrrole
that resembles a porphyrin. Molybdenum and
vanadium are found in nitrogenases and molybdenum
and tungsten in other enzymes catalyzing oxidation-reduction
such as nitrate reductase, aldehyde:ferredoxin
oxidoreductase, and xanthine oxidase. In each of these
proteins, the molybdenum, tungsten, or vanadium is
enclosed within a pterin coenzyme that provides the
thiols coordinating the metallic cation and holding it
within the protein.

Copper exists in biochemical situations as the
kinetically stable cations Cu⁺ and Cu²⁺. It is usually used
as a one-electron carrier, often in reactions involving
oxygen activation such as those catalyzed by monooxygenases.
Although it is a soft metallic cation, Cu²⁺ can
form a number of coordination complexes with lone
pairs from oxygen, nitrogen, and sulfur. The variety of
these complexes defies categorization. They range from
dicoordinate to octacoordinate. Even in the more
common tetracoordinate and hexacoordinate stereochemistries,
the terms used to describe the variations,
such as square planar, compressed tetrahedral, elongated
tetragonal octahedral, and trigonal octahedral,
indicate that the arrangement of many of the Lewis bases
around copper, as with calcium, is governed mainly by
steric effects among the ligands, rather than by covalent
bonding. Examples of complexes between Cu²⁺ and
simple biochemical ligands are [Cu(NH₃)₅]²⁺,
in which four of the nitrogens are equivalent and the
fifth forms a longer bond to Cu²⁺, and
[Cu(imidazole)₄(OH₂)₂]²⁺ and [Cu(formate)₄(OH₂)₂]²⁻,
which are both elongated tetragonal octahedral structures.
Simple thiols such as mercaptans reduce Cu²⁺ to
Cu⁺ and form complex polymeric structures with Cu⁺ in
which the coppers are multidentate and the thiols are
bidentate.

The azurins and the plastocyanins are related metalloproteins
involved in one-electron transfers in which
the single copper passes reversibly from Cu²⁺ to Cu⁺ to
carry the electron. In crystallographic molecular models
of these proteins, the copper is coordinated to two
histidines, a methionine, and a cysteine with no particular
geometric regularity. In the apoprotein,* the two
nitrogens and the two sulfurs that surround the copper in
the holoprotein assume the same orientations even
though the copper is not present. Consequently,
unlike the situation in a zinc finger in which the Zn²⁺ dictates
the structure assumed by the protein, azurins and
plastocyanins are large enough proteins that they dictate
the stereochemistry of the ligation.

* The metal-free form of a metalloprotein is the apoprotein, and
the form of the metalloprotein when it contains the metal is the
holoprotein.

The most versatile metallic dication performing
structural roles in proteins is that of zinc. Its versatility in
this role arises from its ability to form both tetracoordinate
and pentacoordinate complexes with Lewis bases
and its ability, even though it is one of the softest metallic
cations, to form complexes with lone pairs from
oxygen and nitrogen, as well as sulfur. Often a mixture
of two or three of these rather different bases forms the
site on a metalloprotein for the Zn²⁺. When it is bound by
four sulfurs, which are soft bases complementary to the
soft Zn²⁺, the bonds are covalent and tetrahedral. The
complex between Zn²⁺ and the harder base ammonia is
tetracoordinate, [Zn(NH₃)₄]²⁺, and tetrahedral, probably
because it is also covalent. This tetrahedral covalent
form of Zn²⁺ is the most common and is observed when
Zn²⁺ forms complexes with 1,2-diaminoethane, cyclic
lactams, and imidazole. As the ligands become harder,
however, geometries become more variable. For example,
the complex [Zn(OH₂)₆]²⁺ between Zn²⁺ and water, a
hard base, is hexacoordinate and octahedral; but as protons
are removed, it eventually decreases its coordination
to four, as [Zn(OH)₃(OH₂)]⁻, as a result of
electrostatic repulsion. An ionic, octahedral complex
forms with carboxylates.


Zinc dication forms a pentacoordinate complex with,
among other ligands, 8-aminoquinazoline, in which
the four nitrogens from two aminoquinazolines and a
molecule of water are the five Lewis bases that generate
the complex [Zn(N₂C₉H₈)₂(OH₂)]²⁺.

A pentacoordinate complex is formed by the structural
Zn²⁺ in the periplasmic zinc-binding protein TroA of
Treponema pallidum. In this complex with the protein,
the Zn²⁺ is surrounded by the two oxygens of the carboxylate
of Aspartate 257 as a bidentate ligand and three
imidazoyl nitrogens from Histidine 46, Histidine 111,
and Histidine 177 in an irregular arrangement (Figure
6-57). Most structural sites for zinc, however, are tetracoordinate,
resembling the one in a zinc finger (Figure
6-53). For example, in the structural sites for zinc in
UTP-hexose-1-phosphate uridylyltransferase from E. coli and
alanine-tRNA ligase from E. coli, the cations are also surrounded
by two histidines and two cysteines in a tetrahedral
array.

There are a number of proteins that contain modules
resembling zinc fingers in that they bind to specific
sequences in DNA and are also zinc metalloproteins.
Some of them have complexes that resemble the one in
the regulatory subunit of aspartate carbamoyltransferase
(4-48) because the zinc forms covalent bonds
with four cysteines. Others have a site formed from
three cysteines and only one histidine. Others contain
clusters formed from two Zn²⁺ and the sulfurs from six
cysteines that resemble iron-sulfur clusters (6-21) and in which
each Zn²⁺ forms covalent bonds to four sulfurs. In
all of these modules, as in the zinc fingers, the cross-linking
of the polypeptide performed by the respective complex
is essential for maintenance of the proper structure.

Because the distances (Table 6-7) at which the ligands
are held by Mn²⁺, Fe²⁺, Ni²⁺, Co²⁺, and Zn²⁺ in a
crystallographic molecular model are so similar, it is not
surprising that these metallic cations are often interchangeable
in their structural roles. For example, even
though Zn²⁺ is the only cation found in aspartate
carbamoyltransferase when it is purified from E. coli, Ni²⁺
and Co²⁺ are capable of promoting the proper reassembly
of the apoprotein. The metal site within the active
site of aryldialkylphosphatase is as happy with Ni²⁺ or
Mn²⁺ as it is with the Zn²⁺ it naturally contains. The
diphtheria toxin repressor from Corynebacterium diphtheriae
under normal conditions of growth incorporates an Fe²⁺
into its structure, which is required for it to fold properly.
The structural role of this Fe²⁺ can be played just as
well by Ni²⁺ or Mn²⁺. In the properly folded form of this
protein, the metallic cation is octahedrally coordinated
by the carboxylate oxygen of an aspartate, the carboxylate
oxygen of a glutamate, the imidazoyl nitrogen of a
histidine, the sulfur of a methionine, an acyl oxygen from
the backbone, and a molecule of water, making this site
one of the most eclectic (Figure 6-58).

Sometimes the site for a structural metallic cation 
associates with its metallic cation so weakly that the 
cation is lost during its purification. If several metallic 
cations are as effective at producing its proper confor- 
mation, it is difficult to say with any certainty which is 
used in vivo. 


Suggested Reading 


Lee, Y.H., Deka, R.K., Norgard, M.V., Radolf, J.D., & Hasemann, C.A. 
(1999) Treponema pallidum TroA is a periplasmic zinc-binding 
protein with a helical backbone, Nat. Struct. Biol. 6, 628-633. 


Problem 6-14: The drawing in the figure on the next
page is of a site for the binding of Mg²⁺ within the
crystallographic molecular model of inorganic pyrophosphatase
from E. coli. This drawing was produced with
MolScript.

Identify the donor and acceptor for each of the hydrogen
bonds.


Figure 6-58: Site for the structural Fe²⁺ in the crystallographic
molecular model (Bragg spacing = 0.24 nm) of diphtheria toxin
repressor from Corynebacterium diphtheriae. The protein purified
from the bacterium contains an Fe²⁺ at this site, but that Fe²⁺
was replaced with Ni²⁺ for the crystallographic study. The Ni²⁺ is
the gray sphere in the center of the cluster of ligands. Presumably,
an Fe²⁺ would gather the same ligands (Table 6-7). The ligands are
provided by side chains and backbone from two α-helices. The
amino-terminal portion of the upper α-helix and a segment
amino-terminal to it (Leucine 4 to Threonine 14) and the
carboxy-terminal portion of the lower α-helix and a segment
carboxy-terminal to it (Histidine 98 to Valine 107) are drawn.
Together they provide all of the ligands to the metal and donors
and acceptors for hydrogen bonds buttressing the ligands. Note the
pair of hydrogen bonds between Arginine 13 and Glutamate 105 and
the hydrogen bond between Aspartate 6 and the open amido
nitrogen-hydrogen at position 9 at the amino-terminal end of the
upper α-helix. This drawing was produced with MolScript.


Chapter 7


Evolution 


Although it is mutations in the DNA that produce the 
diversity upon which natural selection operates, it is 
within the proteins encoded by that DNA that most of the 
diversity is expressed. Consequently, natural selection 
accepts or rejects mutated proteins, not mutated genes. 
The two genes encoding the two calmodulins in Arbacia 
punctulata, which arose from the duplication of a single 
gene, differ in nucleotide sequence from each other at 45 
out of 393 positions, but in the two calmodulins them- 
selves, which unlike the genes have been continuously 
scrutinized by natural selection, only two of those 45 dif- 
ferences have been permitted to change the amino acid 
sequence.

It is within the proteins existing today that the his- 
tory of evolution by natural selection can be read. The 
later episodes of this history are read by comparing the 
amino acid sequences of the same protein from different 
species. Two new species arise from one ancestral 
species as soon as subpopulations of that ancestral 
species become so different from each other that two 
individuals of different sex, one from each of the sub- 
populations, are no longer able to breed successfully. 
Even when two closely related species that have only 
recently diverged from their common ancestor are com- 
pared, the amino acid sequences of the same respective 
proteins from each species will often differ at one or 
more positions. For example, myoglobin from domestic 
sheep differs in amino acid sequence at three of its 143 
positions from myoglobin of domestic goats. Even the 
amino acid sequences of the myoglobins from human 
and chimpanzee differ at one of their 153 positions. 
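
The counts quoted in the preceding paragraphs are easier to compare as fractions. The fragment below is a minimal sketch of that arithmetic; the function name fraction_different is hypothetical, and the only numbers used are the ones given in the text.

```python
def fraction_different(differences, positions):
    """Return the fraction of compared positions at which two sequences differ."""
    if positions <= 0:
        raise ValueError("positions must be positive")
    if not 0 <= differences <= positions:
        raise ValueError("differences must lie between 0 and positions")
    return differences / positions


# (differences, positions compared), as quoted in the text.
comparisons = {
    "calmodulin genes of A. punctulata (nucleotides)": (45, 393),
    "myoglobin, sheep and goat (amino acids)": (3, 143),
    "myoglobin, human and chimpanzee (amino acids)": (1, 153),
}

for label, (differences, positions) in comparisons.items():
    percent = 100.0 * fraction_different(differences, positions)
    print(f"{label}: {percent:.1f}% of positions differ")
```

Expressed this way, the two duplicated genes differ at roughly one nucleotide in nine, whereas the pairs of myoglobins differ at only a few amino acids in a hundred or fewer.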

The reason for this divergence of amino acid 
sequence is that once speciation has occurred and inter- 
breeding becomes impossible, two versions of the same 
protein are established. These two versions begin to 
evolve in isolation from each other, and mutations occur 
at random in the respective genes encoding each version.
Once in a while one of these mutations in one version 
that produces an acceptable change in amino acid 
sequence of the protein is fixed by genetic drift or natu- 
ral selection independent of any fixation occurring in the 
other version, and slowly the amino acid sequences of 
the encoded proteins become different, one position at a 
time. Because the geologic instant at which the two 
species were established from one common ancestral 
species coincides with the instant at which the two versions
of the same protein began to evolve separately,
amino acid sequences retain the history of the speciation
of organisms. This history can be reconstructed by com- 
paring the amino acid sequences of the same protein 
from an array of different species. 

Even in the most advantageous instances in which 
amino acid sequences are compared to each other, con- 
nections can usually be made only as far back as the 
common ancestors of prokaryotes and eukaryotes. What 
has been found, however, is that the tertiary structure of 
a particular protein, when viewed in crystallographic 
molecular models from distantly related species, 
changes less rapidly than its amino acid sequence during 
evolution by natural selection. Because of this, compar- 
isons of crystallographic molecular models permit one to 
look back in evolutionary history to the time at which the 
individual proteins themselves were diverging from 
common ancestors: to the time, for example, when L-lac- 
tate dehydrogenase and glyceraldehyde-3-phosphate 
dehydrogenase or triose-phosphate isomerase and 
indole-3-glycerol-phosphate synthase diverged from 
their common ancestor. Through such comparisons, the 
speciation of proteins can be traced. Because amino acid 
sequences change more rapidly than tertiary structures, 
only a few of the pedigrees of proteins, those that 
diverged recently in geologic time or those in which 
mutations are fixed slowly, can be traced by comparing 
amino acid sequences. Most of our insight into the spe- 
ciation of organisms has come from comparisons of 
amino acid sequences, but most of our insight into the 
speciation of proteins has come from comparisons of ter- 
tiary structures. 

From the comparisons that can be made among the 
tertiary structures that are now available, it has become 
clear that the larger proteins often, if not always, have 
arisen during evolution by the chance fusion of two 
genes encoding smaller proteins, each of which could 
fold independently and each of which usually had an 
independent function prior to the fusion. As a conse- 
quence of such fusions, larger and larger proteins 
appeared. If a particular fusion produced a protein that 
was not impaired functionally, the new gene for the 
larger protein may have been fixed in the population by 
genetic drift; or, if the fusion produced a protein with 
advantageous features, the new gene for the larger pro- 
tein may have been fixed in the population by natural 
selection. The history of these fusions can be observed in 
the existing domains from which these larger proteins
are constructed. The domains of a protein are discrete
regions in the tertiary structure of that protein which 
arose from separate, previously independent proteins 
that were fused together, one after the other, to produce 
the present protein. Because a polypeptide shorter than 
about 70 amino acids usually cannot fold spontaneously 
to form a tertiary structure, domains, when defined in 
this way, are usually larger than this. They appear in the 
crystallographic molecular model as independently 
folded regions. Because they are the fundamental units 
in the evolution of proteins, domains must be identified 
by a set of conservative, objective criteria, if our descrip- 
tion of the evolutionary history of a set of proteins is to be 
accurate. 

It may be possible, by examining enough crystallo- 
graphic molecular models, to trace the ancestry of the 
proteins that presently exist, in a sense to derive a molec- 
ular phylogeny of the proteins. Because most of the exist- 
ing proteins were produced by fusion of smaller units, 
this molecular phylogeny of the proteins must be based 
on a reconstruction of two processes. First, the family 
trees of the individual, ancestrally related domains from 
different proteins must be reconstructed. In almost every 
instance these family trees must be based on patterns in 
which the secondary structures are arranged to form the 
tertiary structures of the domains being compared 
because similarity in amino acid sequence has been 
completely lost. Second, the separate events that pro- 
duced each of the fusions of the independent domains to 
produce the larger chimeric proteins must also be recon- 
structed. 

Although the most interesting question may be how 
the large array of existing proteins arose from a much 
smaller array of smaller proteins present in the distant 
past, it should be stressed that new proteins are continu- 
ously being made by this process of fusion of different 
pieces. We know this because, in some instances, 
domains that have homologous amino acid sequences 
can be found in otherwise completely different proteins. 
Because similarity in amino acid sequence usually disap- 
pears quite rapidly over geologic time, these domains 
must have been separately incorporated into their 
respective proteins fairly recently; the greater the simi- 
larity of amino acid sequence among them, the more 
recently the separate fusions must have occurred. 


Molecular Phylogeny from Amino Acid 
Sequence 


The amino acid sequences of a set of related polypep- 
tides retain a record of the history of their evolution by 
natural selection. That record provides information 
about the speciation of organisms, the specialization of 
tissues, and the conversion of older proteins into newer 
ones. This evolutionary history is read from aligned 
amino acid sequences. 


As the amino acid sequences of the same protein 
from different species have become available, it has usu- 
ally been found that they are similar enough to be read- 
ily aligned with each other. An alignment of two or more 
amino acid sequences is a display in which positions that 
are thought to be directly related to each other from the 
respective sequences are aligned directly above and 
below each other. The decision that the aligned positions 
are related is based on the fact that they are occupied by 
the same amino acid or the fact that they are each sur- 
rounded by similar sequences of amino acids. The 
cytochromes c from human, corn, and yeast can be readily
aligned (Figure 7-1A). There is no uncertainty about
the alignment even though the three proteins are from 
distantly related species. 
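
Once two sequences have been aligned, the simplest summary of the alignment is the percentage of aligned positions occupied by the same amino acid. The following is a minimal sketch under stated assumptions: the sequences are supplied already aligned, as strings of equal length in one-letter code with dashes marking gaps; positions facing a gap are not counted; and the function name percent_identity and the example sequences are invented for illustration and are not real proteins.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of ungapped aligned positions occupied by the same amino acid."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # a position facing a gap is not compared
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * identical / compared


# Invented sequences, not taken from any real protein.
print(percent_identity("ACDEFGHIKL", "ACDQFGHIRL"))  # two of ten positions differ
print(percent_identity("ACD-EFG", "ACDWQFG"))        # one position faces a gap
```

Other conventions exist, for example counting each gap as a difference, and they give different percentages for the same alignment.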

The fact that, in most instances, the amino acid 
sequences of the three respective proteins responsible 
for a particular function in humans, yeast, and corn can 
be readily aligned, as can the three cytochromes c, is the 
strongest evidence for the fact that these three species 
share a common ancestor. Consequently, each of the 
proteins that are responsible for a particular function 
and the amino acid sequences of which can be aligned 
also share a common ancestor. Any two proteins that 
have descended from a common ancestor are homo- 
logues of each other. 

In the distant past, when only the common ances- 
tral species was present, all of the individuals in the pop- 
ulation of that ancestral species contained, for all 
practical purposes, a cytochrome c the amino acid
sequence of which was the same, just as all individuals of
an extant species contain a cytochrome c with the same
sequence. As natural selection operated upon the genetic 


Figure 7-1: Alignment of amino acid sequences of cytochromes c 
and replacements observed at each of the positions in the common 
sequence. (A) Alignment of ungapped amino acid sequences. The 
three amino acid sequences below the numerical scale are the 
aligned amino acid sequences of the cytochromes c from Homo 
sapiens, Zea mays, and Saccharomyces cerevisiae. The amino acid 
sequence of cytochrome c from Thunnus alalunga is immediately 
above the numerical scale, which is based on this latter sequence. 
Above each position in this top sequence is a list of the other amino 
acids found in this position in a collection of 40 cytochromes c from 
various eukaryotes. Letters below the horizontal lines in each of
the columns are variations found among cytochromes c of animals,
and letters above the horizontal lines are the additional variations
found in fungi and plants, more distantly related eukaryotes.
(B) Insertion of gaps for the purpose of alignment. The two aligned
amino acid sequences are those for cytochromes c from T. alalunga
and Paracoccus denitrificans. Each set of dashes represents a gap
that must be made in one of the sequences to align it reasonably 
with the other sequence. You should convince yourself that the 
gaps are inescapable. When the two sequences are aligned in this 
way, the size of each gap is determined by the number of extra 
amino acids in the sequence that does not have a gap. (C) Gaps 
visualized as insertions. Instead of introducing gaps to permit the 
alignment shown in panel B, the extra amino acids in each inter- 
vening segment are shown as loops. This presents a more realistic 
picture of the situation but is a significantly more awkward method 
for displaying an alignment.

variation present within the population of that ancestral
species, varieties arose that occupied different ecological 
niches. These varieties eventually diverged sufficiently to 
become separate species. At that point, the genes for 
cytochrome c in these two new species became discon- 
nected, and the amino acid sequences encoded by those 
genes from that time forth were altered independently 
and continuously by mutation, genetic drift, and natural 
selection. As a result of a long series of such disconnec- 
tions, the distantly related species Homo sapiens, Zea 
mays, and Saccharomyces cerevisiae eventually appeared. 
The differences and similarities between the three extant 
amino acid sequences for the three respective 
cytochromes c are the accumulated result of the individ- 
ual steps in this process of speciation. An underlying 
assumption of this description is that a function per- 
formed only by the protein encoded by a certain gene 
remains the exclusive property of the product of that 
gene as it passes from species to species. Although this is 
usually the case, there are isolated examples in which the 
amino acid sequence of a protein from one species 
seems to be unrelated to that of the protein performing 
the same role in another species and must be the result 
of convergent rather than divergent evolution.

Evolution by natural selection is usually viewed 
from its optimistic side. Natural selection operates on the 
variation inherent in any large population of a given 
species of organisms to shift the distribution of its assem- 
bled abilities gradually in a direction that makes that 
species or its descendant species more successful. 
Beneficial traits are patiently nurtured and multiplied. 
The major portion of the variation upon which natural 
selection operates to achieve this progress is variation in 
the sequences of the proteins within the population of a 
given species. 

It is unlikely, however, that more than a small 
number of the differences seen when two aligned amino 
acid sequences are compared (Figure 7-1A) reflect 
improvements in the ability of the individuals of that 
species to survive relative to that of individuals of other 
species or their common ancestors. There is little evidence
that the cytochrome c from either H. sapiens or
Z. mays is an improved version of the cytochrome c that
was used by their common ancestor or that any of the
proteins the amino acid sequences of which are being
presently compared are improved versions. The majority
of the differences that accumulate in the sequences of
the same protein in two lineages, following their divergence
from their common ancestor, are neutral replacements.
A neutral replacement is a change of one
amino acid for another that is harmless enough that the 
biological function of the protein does not deteriorate 
sufficiently to cause the elimination of the replacement 
by natural selection. These neutral replacements arise 
from mutations in the DNA encoding the protein. Each 
that is now in existence began as a mutation in the 
genome of one individual and then spread through the
population of its species, or became fixed, by genetic
drift. When one views aligned sequences of the same 
protein from different species, one is examining the 
record of this gradual increase in entropy. 
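
How a neutral replacement that begins in one individual can spread through a population by chance alone can be illustrated with a small simulation. The sketch below assumes the Wright-Fisher model of random sampling between generations, a standard idealization that is not named in the text; the function name, the number of gene copies, and the seeds are all hypothetical choices made for illustration.

```python
import random


def drift_until_fixed_or_lost(copies, initial=1, seed=None,
                              max_generations=100000):
    """Follow a neutral variant until it is lost (0 copies) or fixed (all copies).

    Each generation, every gene copy is drawn at random from the previous
    generation, so the variant is inherited with probability equal to its
    current frequency. Returns (final_count, generations_elapsed).
    """
    rng = random.Random(seed)
    count = initial
    generation = 0
    while 0 < count < copies and generation < max_generations:
        frequency = count / copies
        count = sum(1 for _ in range(copies) if rng.random() < frequency)
        generation += 1
    return count, generation


# A new variant present in 1 of 40 gene copies, followed in 1000 trials.
trials = 1000
fixed = sum(1 for s in range(trials)
            if drift_until_fixed_or_lost(40, seed=s)[0] == 40)
print(f"fixed in {fixed} of {trials} trials")
```

In this model most new neutral variants are lost within a few generations and only an occasional one drifts to fixation, which is why replacements accumulate slowly, one position at a time.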

This increase in entropy, however, is biased. From 
examining aligned amino acid sequences of the same pro- 
tein from many species, it is clear that each position in the 
underlying sequence that gives the protein its unique 
character is under a different degree of negative selective 
pressure. Mutations can occur with equal frequency at any 
position in the sequence of the DNA encoding for the 
sequence of a protein. Each of these individual mutations 
is assessed by natural selection, and the majority disappear
almost immediately because they adversely affect the
function of the protein or are otherwise deleterious. For
example, in the human population, there are many
mutant forms of hemoglobin that bind oxygen improperly
or are unstable proteins. These represent deleterious
mutations that survive for a limited time before disap- 
pearing from the population. These mutant forms can be 
contrasted with fetal hemoglobin that has been fixed in 
the human population because it is a stable protein and 
has beneficial properties. The most deleterious mutation 
is one that kills the individual in which it arises before that 
individual has had an opportunity to mate or otherwise 
reproduce after the mutation occurs. The more critical a 
particular amino acid in the sequence of the protein is to 
its function, the less prone will that position be to substi- 
tution over time. For this reason, the aligned amino acid 
sequences of the same protein from an array of species 
evaluate the scope of the intolerance to variation 
expressed at each position in the sequence of the protein. 

This record of intolerance can be read from exam- 
ining consecutively each position in the aligned 
sequences of a large collection of the same proteins from 
different species. Above the numerical scale in Figure
7-1A, the sequence of cytochrome c from Thunnus
alalunga is presented, and above each of its positions, in
a column of letters, are tallied the amino acids found there
in the cytochromes c from 40 other eukaryotes. The horizontal
lines in each of these columns of letters separate
amino acids found in the sequences of cytochromes c
from animals, of which far more are available, from amino
acids found in the sequences of cytochromes c from fungi
and plants, which represent more distant relationships.
A similar record of intolerance is observed when the
amino acid sequence of a particular protein is mutated at
random and the resulting mutants are selected for their
ability to function properly.

This intolerance to substitution is most strongly 
manifested at an invariant position. An invariant posi- 
tion in a protein is a position at which no replacement 
has been made over the history encompassed by the 
aligned amino acid sequences. A few of the positions in
the aligned sequences of the cytochromes c have
remained absolutely invariant, for example, Cysteine 14,
Cysteine 17, Histidine 18, and Methionine 80, because
these are functionally irreplaceable and consequently
define a cytochrome c. Some positions, such as those
occupied by Glycine 6, Glycine 34, Glycine 41, Glycine 77,
and Glycine 84, are invariant among the eukaryotes but
are replaced in bacterial cytochromes c. Several of these
glycines are examples of the fact that glycines with
angles φ and ψ outside the boundaries on a Ramachandran
plot (Figure 6-4B) are difficult to replace.
Nevertheless, their eventual replacement demonstrates
that a designation of invariant is always provisional. As
more and more amino acid sequences of the same protein
from different species become available, the number
of invariant positions usually decreases.
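
The tally just described, in which the amino acids found at each aligned position are collected and a position with a single occupant is called invariant, can be sketched in a few lines of Python. The sequences used here are short invented fragments rather than the cytochromes c of Figure 7-1, the function names are hypothetical, and gaps are not handled.

```python
from collections import Counter

def tally_columns(aligned):
    """Count the amino acids found at each position of equal-length aligned sequences."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have the same length")
    return [Counter(seq[i] for seq in aligned) for i in range(length)]

def invariant_positions(aligned):
    """Return the 1-based positions occupied by the same amino acid in every sequence."""
    return [i + 1 for i, counts in enumerate(tally_columns(aligned)) if len(counts) == 1]

# Invented five-residue fragments, used only to exercise the functions.
toy = ["CAQCH", "CSQCH", "CAECH", "CKACH"]
print(invariant_positions(toy))

# Adding one more sequence can only shorten the list of invariant positions.
print(invariant_positions(toy + ["CAQSH"]))
```

The second call illustrates why a designation of invariant is provisional: each additional sequence can remove positions from the list but can never add one.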

The fact that any designation of invariant is neces- 
sarily based on a limited set of amino acid sequences 
may explain why site-directed mutation of amino acids 
at apparently invariant positions often has little effect on 
the function of a protein. For example, site-directed
mutation of five of the 15 highly conserved amino acids
in lathosterol oxidase had little effect on its function.
Consequently, the intuition that an invariant position
must be structurally or functionally important is unreliable.
The situation is even more confusing when a position
known to be functionally critical nevertheless
displays several replacements even among closely
related species.

Many of the changes accumulating over time seem 
to be conservative replacements. A conservative 
replacement is a replacement at a position in which only 
similar amino acids, either in size or in chemical proper- 
ties, can be tolerated. For example, only valine, 
isoleucine, phenylalanine, and leucine, each of the side 
chains of which is a hydrocarbon, seem to occur in 
position 35 of eukaryotic cytochrome c (Figure 7-1A). 
Either glycine or alanine, the side chains of which are 
small, seems to be necessary in position 29. Either serine, 
threonine, or glutamine, the side chains of which 
are polar but uncharged, seems to be necessary in 
position 42. 
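
Whether the amino acids observed at a position all fall within one class of similar side chains can be tested mechanically. The following Python sketch uses only the three classes named in the examples above; the class names, the function name, and the decision to stop at the first matching class are assumptions made for illustration, and a fuller classification would be more debatable.

```python
# Classes taken from the three examples in the text (one-letter codes).
CLASSES = {
    "hydrocarbon": set("VIFL"),      # valine, isoleucine, phenylalanine, leucine
    "small": set("GA"),              # glycine, alanine
    "polar uncharged": set("STQ"),   # serine, threonine, glutamine
}

def conservative_class(residues):
    """Return the first class that contains every observed residue, or None."""
    observed = set(residues)
    if not observed:
        return None
    for name, members in CLASSES.items():
        if observed <= members:
            return name
    return None

print(conservative_class("VILF"))  # residues reported at position 35
print(conservative_class("VK"))    # a mixture that fits none of the classes
```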

It was once thought that each position in the amino 
acid sequence of a protein could be assigned unambigu- 
ously to one of a few categories, for example, invariant, 
conservative, physicochemically constant, and variable.
When it became possible, however, to compare
the amino acid sequences of the same protein from a 
large number of distantly related species, the majority of 
the replacements observed could not be easily explained. 
This fact suggests that even the specific designations just 
presented may themselves be rationalizations of more 
subtle processes that are not understood. A close exami- 
nation of the actual results, however, sequence position 
by sequence position (Figure 7-1), does produce an intu- 
itive feeling for the play of evolution. 

In addition to the capacity of a particular amino acid 
to be tolerated at a particular position in the sequence of 
a protein, the nature of the genetic code itself also affects 
the patterns in which replacements in the sequence
occur during evolution. Because there are three bases
coding for each amino acid and mutation occurs one
base at a time, replacements requiring one base change
should be more common than those requiring two or
more. There are, however, some interesting apparent
exceptions to this generalization that occur even in the 
comparisons of the various eukaryotic cytochromes c 
(Figure 7-1). For example, at position 31, only asparagine 
and alanine are found; at position 72, only lysine and 
serine; at position 45, only lysine and glycine; and at posi- 
tion 19, only glycine and threonine. Each of these four 
replacements would require that two bases of the respec- 
tive codon be mutated consecutively. Although these are 
unlikely events, the constraints on the occupation of 
these positions seem to have been severe enough to con- 
fine the replacements among the eukaryotes to these 
choices. In the short term, however, the difficulty of 
changing more than one base to effect a replacement is 
more acute. When the detailed history of the mutational 
events that have occurred during the recent evolution of 
artiodactyl fibrinopeptides was examined, it was
observed that replacements requiring the mutation of 
only one base were far more frequent than those requir- 
ing two consecutive mutations. It is altogether likely that, 
in circumstances where two consecutive mutations seem 
to have occurred, the amino acid sequence of the protein 
displaying the intermediate single mutation, although it 
exists, has not yet been determined. 
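
The minimum number of base changes separating two amino acids can be checked directly against the standard genetic code. The following Python sketch is an illustration only: the function name min_base_changes is hypothetical, the codons are written in DNA letters (T in place of U), and only the standard nuclear code is considered.

```python
from itertools import product

# Standard genetic code, codons ordered with T, C, A, G at each position
# and the first position varying most slowly; "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODONS = {}
for codon, amino_acid in zip(("".join(c) for c in product(BASES, repeat=3)), AMINO):
    CODONS.setdefault(amino_acid, []).append(codon)

def min_base_changes(aa1, aa2):
    """Smallest number of base differences between any codon of aa1 and any codon of aa2."""
    return min(
        sum(x != y for x, y in zip(c1, c2))
        for c1 in CODONS[aa1]
        for c2 in CODONS[aa2]
    )

# The four pairs cited in the text, in one-letter code.
for pair in ("NA", "KS", "KG", "GT"):
    print(pair[0], pair[1], min_base_changes(*pair))
```

Each of the four pairs cited above evaluates to 2, whereas a pair such as valine and isoleucine evaluates to 1.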

One approach to examining quantitatively the 
progress of natural selection has been to calculate a 
mutation probability for every pair of possible replacements.
This was accomplished by reconstructing a
probable sequence of events in the evolution of 10 differ- 
ent groups of closely related sequences. Sequences for 
common ancestors were predicted from alignments, and 
all of the replacements that should have occurred follow- 
ing the divergence of the progeny from that ancestor 
were tabulated to provide the basis for the calculation of 
probabilities. The results of this study were presented as 
mutation probabilities. A mutation probability is the 
probability that a certain replacement will occur during 
a time long enough for a particular number of replace- 
ments to accumulate for every 100 amino acids of the 
sequence. 

Values for mutation probabilities over a period of 
time long enough for two replacements for every 100 
amino acids (Table 7-1) register changes that occur over 
the short term. Almost all of the replacements with the 
highest mutation probabilities over this period require 
only one base change to occur and are also remarkably 
conservative. For example, the 12 most frequent replace- 
ments do not involve any change in charge number or 
even polarity. Replacements involving alanine are the 
most frequent by a considerable margin, an observation 
suggesting that a truncation to the β carbon is the most
readily tolerated change. A large number of replacements
are not tolerated well at all (mutation probability
<0.005%) over the short term (Table 7-1). Many of
these require two or three consecutive base changes. The
frequency at which other amino acids are turned into
tryptophan or methionine or into cysteine, tyrosine, or
phenylalanine will also be decreased by the fact that
these amino acids have only one or two codons, respectively.
The amino acids most intolerant to promiscuous
replacement are tryptophan, tyrosine, arginine, and cysteine
(mutation probabilities are less than 0.005% for 11
or more of the 19 possible replacements), in part because
they, along with glycine and proline, are so peculiar. The
amino acids that are the most promiscuously replaced
are alanine, serine, glutamine, threonine, valine, methionine,
lysine, and asparagine (mutation probabilities are
greater than 0.12% for 11 or more replacements).

Table 7-1: Mutation Probabilities for Various Pairs of
Amino Acid Replacements

pair(a)   base         mutation             pairs with mutation probability
          changes(b)   probability(c) (%)   less than 0.005%

V/I       1            2.3                  W/A   Y/A   R/A
E/D       1            1.9                  R/W   R/Y   R/D
S/A       1            1.6                  W/N   Y/N   R/C
S/T       1            1.5                  W/D   Y/D   R/E
Y/F       1            1.3                  W/V   Y/V   R/G
S/N       1            1.2                  W/Q   Y/Q   R/I
L/M       1            1.1                  W/E   Y/E   R/L
K/R       1            0.9                  W/G   Y/G   R/P
V/M       1            0.9                  W/I   Y/I   R/T
G/A       1            0.8                  W/L   Y/L   R/V
P/A       1            0.8                  W/K   Y/M   C/N
T/A       1            0.8                  W/M   vip   C/D
N/D       1            0.8                  C/W   C/Y   C/Q
S/G       1            0.7                  W/S   Y/T   C/E
I/L       1            0.7                  W/T   P/Y   C/H
N/A       2            0.6                  D/W   E/N   C/L
E/A       1            0.6                  P/H   F/D   C/K
V/A       1            0.6                  P/L   F/Q   F/C
N/K       1            0.6                  P/M   F/E   P/C
Q/E       1            0.6                  H/M   F/G   P/H
Q/A       2            0.5                  G/M   P/K
V/L       1            0.5                  G/I   F/P
H/N       1            0.5                  D/I
D/S       2            0.5

(a) Pairs of amino acids occupying the same position in pairs of aligned sequences.
(b) Minimum number of base changes required to change a codon for one member of
the pair into a codon for the other.
(c) Probability that the given pair will occur at the same position in two aligned
sequences that have only two replacements for every 100 positions. Only pairs with
probabilities greater than or equal to 0.5% are tabulated (24 of the 190 possible pairs).

There are several ways in which the DNA encoding 
a protein can be altered over time. The most common is 
by point mutation, which is the ultimate source of the 
exchanges of one amino acid for another that are 
observed in the alignments of the four eukaryotic 
cytochromes c (Figure 7-1A). It is also possible for a start 
site or a stop site for translation to be mutated and 
another start site or stop site, either already present or
arising by mutation, to take over, causing the protein to
become longer or shorter at one or the other of its ends 
(Figure 7-1A). Because the amino-terminal and carboxy- 
terminal segments of a protein are seldom involved in its 
function, this is usually an inconsequential change. 

Eukaryotic genes contain introns. These are seg- 
ments of DNA, often quite long, inserted at several loca- 
tions within the coding sequence of the genomic DNA. 
Introns are removed by splicing at the level of the messenger
RNA. Often a cell will contain two or more versions
of the same protein, one in which all the splices
were successful and one or more in which one or more of
the splices has failed. For example, there are two crystallins
found in the lenses of the eyes of Zapus hudsonius,
the longer containing an unremoved insert between
positions 63 and 64 in the amino acid sequence of the
shorter. Errors in splicing can also lead to two forms of
a protein that differ in their amino-terminal sequences
because amino-terminal segments under the control of
two different promoters, respectively, have been alternatively
spliced to the same coding sequence for the
remainder of the protein. Likewise, there are two different
versions of subunit β of isocitrate dehydrogenase
(NAD+) in Bos taurus that differ only in their carboxy-terminal
sequences. Both are encoded by the same genomic
DNA. One ends with a sequence of 28 aa; the other, with
a completely different sequence of 26 aa even though
their sequences are exactly the same for the first 357 aa.
The former results from a messenger RNA in which the
final exon in the genomic DNA is properly spliced to
those that go before; the latter, from a messenger RNA in
which this exon is skipped and the following exon is
spliced to those that go before. Each of these types of
alternative splicing has the potential to produce differ- 
ent versions of the same proteins. 

One of the common changes that occurs over evo- 
lutionary time is the insertion into or the deletion from 
a protein of a short segment of amino acids. For example, 
most forms of adenylate kinase have an additional 25 aa 
in their amino acid sequence between the valine and the 
aspartate in the sequence -GRVDDN- found near posi- 
tion 140 in the amino acid sequences of isoform 1 of 
adenylate kinase from mammalian cytoplasm. This
additional segment of 25 aa is present in the adenylate 
kinases from bacteria, fungi, and mitochondria, so it can 
be concluded that it must have been deleted during one 
of the genetic events leading to the appearance of mam- 
malian isoform 1. Usually, however, it is difficult to tell 
whether a deletion or an insertion has occurred. Because 
such insertions or deletions appear as frequently in pro- 
teins from prokaryotes as they do in proteins from 
eukaryotes, they must arise from processes independent 
of alternative splicing. When the sequences of two pro- 
teins that differ by a deletion or an insertion are aligned, 
a gap is included in the shorter to permit the alignment 
of the sequences on the two sides of the aberration. 

A gap is a series of blank spaces inserted into one
amino acid sequence that is missing a segment of amino
acids present in the other amino acid sequence with
which the first is being aligned. For example, it is necessary
to insert three gaps of 6, 5, and 8 spaces in length into
the amino acid sequence of cytochrome c from
T. alalunga and two gaps of 1 and 3 spaces in length into 
the amino acid sequence of cytochrome c-550 from 
Paracoccus denitrificans in order to achieve the most rea- 
sonable alignment of these two proteins (Figure 7-1B). 
On either side of each gap, the alignments are convinc- 
ing enough to justify the insertions of the gaps required 
to bring those alignments into register. It must be kept in 
mind that in the actual polypeptide there is no gap; rather 
it is the other polypeptide, the one with the ungapped 
sequence, that has additional amino acids at that point 
(Figure 7-1C). The use of a gap is simply a convenient 
method for displaying the alignment of the sequences. 

When two sequences are aligned, their similarity is 
usually quantified by stating their percentage of identity 
and the gap percentage. The percentage of identity is the 
percentage of the average number of positions in the two 
aligned sequences that are occupied by the same amino 
acid. The gap percentage is the number of gaps that had 
to be inserted for every 100 amino acids in the alignment. 
For example, in the alignment of the cytochromes c from 
T. alalunga and P. denitrificans (Figure 7-1B), an average 
of 110.5 positions from the two sequences are aligned, 
there are 38 identities for a percentage of identity of 34%,
and there are 5 gaps, for
4.5 gap percent.*
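
The two measures can be computed from any pair of gapped sequences. The following Python sketch assumes that gaps are written as "-" and that, as in the present calculations, the mean length of the two ungapped sequences is the denominator; the function name alignment_stats and the toy sequences are hypothetical.

```python
import re

def alignment_stats(aligned_a, aligned_b, gap="-"):
    """Return (percentage of identity, gap percentage) based on the mean sequence length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    mean_length = (len(aligned_a.replace(gap, "")) + len(aligned_b.replace(gap, ""))) / 2
    identities = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap)
    # Each run of consecutive gap characters counts as one gap, whatever its length.
    gap_run = re.escape(gap) + "+"
    gaps = len(re.findall(gap_run, aligned_a)) + len(re.findall(gap_run, aligned_b))
    return 100 * identities / mean_length, 100 * gaps / mean_length

# Invented eight-column alignment, used only to exercise the function.
print(alignment_stats("ACDE--GH", "AC-EFFGH"))

# The figures quoted in the text: 38 identities and 5 gaps over a mean length of 110.5.
print(round(100 * 38 / 110.5), round(100 * 5 / 110.5, 1))
```

The last line reproduces the 34% identity and 4.5 gap percent quoted for the cytochromes c from T. alalunga and P. denitrificans.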

The alignments of the cytochromes c in Figure 7-1 
are so obvious that they can be performed unassisted. 
Even for the cytochromes c from T. alalunga and P. den- 
itrificans, with only 34% identity and 4.5 gap percent, the 
sequences are easily aligned by eye. The amino acid 
sequence of each protein changes, however, at a differ- 
ent rate during evolution. Although there are proteins 
that change more slowly than cytochrome c, such as his- 
tone H4 (20 times more slowly), calmodulin (4 times 
more slowly), α tubulin (2 times more slowly), ubiquitin
(2 times more slowly), and protein phosphatase 2A 
(2 times more slowly), most proteins change more rapidly
than cytochrome c. As more and more time has
passed following the divergence of the amino acid
sequences of two proteins from that of their common
ancestor, the percentage of identity decreases and the
gap percentage increases until it becomes difficult to
align them. Appropriately programmed digital computers
are used to align such distantly related sequences.

* There is no agreement as to the length of amino acid sequence to
be used in calculating the percentage of identity. The most
common choice is the length of the shorter sequence. The justification
for this choice is that the inserts in the longer protein cannot
be compared to anything and should therefore be discounted. This
choice, however, in a self-contradiction, ignores the inserts in the
smaller protein. Probably, the best choice would be to use the
length of the common amino acid sequence in which a gap appears
in neither protein. The problem with this choice is that it would
inflate both percentages and would be misleading in the absence
of universal agreement. The choice made in the present calculations
was to use the mean length of the two sequences being compared,
which produces somewhat smaller percentages than any of
the other choices.

The computational alignment of two distantly
related amino acid sequences is accomplished by constructing
a matrix. If one sequence A has p amino acids,
arranged in the order $a_1 a_2 a_3 \ldots a_p$, and the other
sequence B has q amino acids, arranged in the order
$b_1 b_2 b_3 \ldots b_q$, the product of these two vectors is a matrix
C, the coefficients of which, $c_{ij}$, are equal to $a_i \times b_j$, where
$a_i$ and $b_j$ are particular amino acids. For example, in the
alignment of the cytochromes c from T. alalunga and
P. denitrificans (Figure 7-1B), $a_9 \times b_{11}$ would be Thr × Glu.
The numerical value assigned to a particular position $c_{ij}$
in the matrix representing $a_i \times b_j$ depends on the scheme
chosen to weight the comparisons.

The simplest scheme is to decide that when $a_i = b_j$,
$c_{ij} = a_i \times b_j = 1$, and when $a_i \neq b_j$, $c_{ij} = a_i \times b_j = 0$. This
produces a matrix the coefficients of which, $c_{ij}$, are either 1
or 0. When the amino acid in position $a_i$ in the first
sequence is the same as the amino acid in position $b_j$ in
the second sequence, $c_{ij} = 1$; when they are different,
regardless of the difference, $c_{ij} = 0$. Such a matrix, spread
upon a two-dimensional field, can be represented diagrammatically
by placing a dot on every position with a
score of 1 (Figure 7-2). In such a dot matrix, the alignment
is represented by diagonal strings of dots. In the dot
matrices comparing the amino acid sequence of the 
cytochrome c of human with those of monkey and fish in 
Figure 7-2, the diagonals are obvious and unbroken. In 
the matrix comparing those of human and bacterium, 
the alignment is a set of at least three diagonal segments 
that can be picked out by eye if the figure is tilted and 
viewed along the diagonal direction. The offsets between 
the diagonal segments are the gaps in the alignments. 
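
A dot matrix of this simplest kind is easily generated. In the following Python sketch, the function name dot_matrix is hypothetical and the two sequences are invented fragments, the second lacking two residues of the first, so that the printed diagonal breaks and resumes with an offset.

```python
def dot_matrix(vertical, horizontal):
    """Return rows of text with a dot wherever the two sequences share an amino acid."""
    return ["".join("." if v == h else " " for h in horizontal) for v in vertical]

# Invented fragments; the horizontal sequence lacks the residues "EF" of the vertical one.
for row in dot_matrix("ACDEFGHIK", "ACDGHIK"):
    print(row)
```

The offset of two rows between the two diagonal segments in the printed matrix corresponds to a gap of two spaces in the shorter sequence.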

There are, however, 231 different outcomes* for
$a_i \times b_j$ if one assumes symmetry, namely, that Glu × Thr =
Thr × Glu, if one treats cysteine and cystine as separate
amino acids, and if one treats each of the 21 types of
identity as a unique result. It has always seemed that
some of these 231 outcomes are more probable and that
recognition of this probability with the proper weighting
scheme might enhance the ability to align distantly
related sequences. Each of the more than 18 available
weighting schemes is a table of numbers assigned to
the 231 possible identities and replacements. Each of 
these entries reflects the author’s view of the probability 
that such an outcome is the result of evolution by natu- 
ral selection. 

* [(21 × 20)/2] + 21.

Figure 7-2: Dot matrices for the amino acid sequences of the cytochromes c from Macaca mulatta, T. alalunga, and Rhodospirillum
rubrum, each compared to the amino acid sequence of human cytochrome c. The sequence of human cytochrome c is the vertical vector (top
to bottom, amino to carboxy terminus), and the respective sequences with which it is compared are the horizontal vectors (left to right, amino
to carboxy terminus). A dot is placed in the matrix when the amino acids at those two positions, horizontal and vertical, are the same.
Reprinted with permission from ref 27. Copyright 1970 Springer-Verlag.

The ultimate goal in aligning two amino acid
sequences is to decide whether position $a_i$ in sequence A
and position $b_j$ in sequence B arose from the same position
in the sequence of a common ancestor or whether position $a_i$
in sequence A and position $b_j$ in sequence B are unrelated
to each other either because protein A and protein B
do not share a sufficiently recent common ancestor
that they can be aligned or because the two sequences
are misaligned. If the two amino acid sequences are
unrelated, $a_i \times b_j$ is governed solely by chance. If the
respective positions are descended from the same position
in an ancestral sequence, then $a_i \times b_j$ should retain
some of the biases enforced by natural selection, and
these biases, if they can be quantified, should be considered
while a decision is reached. If a particular replacement
has a higher probability of occurring as a result of
evolutionary change than it does of occurring as a result
of random change, then whenever that particular
replacement is encountered, those two positions have a
higher probability of being evolutionarily related than of
being unrelated. For example, every $a_i \times b_j$ where $a_i$ and
$b_j$ are interconvertible by only one base change should
have a higher probability of being evolutionarily related
than those $a_i \times b_j$ where $a_i$ and $b_j$ are interconvertible only
by two or three base changes. Every $a_i \times b_j$ where $a_i$ and
$b_j$ are similar in size or chemical properties should have
a higher probability of being evolutionarily related than
those in which they are dissimilar. The mutation probability
(Table 7-1) can also be used to weight $a_i \times b_j$.
The net effect of any one of the different weighting
schemes, or some combination of them, is to assign a
number to every coefficient $c_{ij}$ of the matrix. The magnitude
of this number is thought to quantify either the
effect of natural selection relative to chance on the particular
replacement $a_i \times b_j$ or the probability that the
particular replacement $a_i \times b_j$ would arise as the result of
evolution rather than chance. Consequently, logarithms
of these probabilities are used as entries in the matrix so
that the summation to be performed will represent products
of probabilities. It is also possible to incorporate
weights into a dot matrix by assigning a dot to any $c_{ij}$ the
weight of which exceeds a certain threshold.
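
The reason for entering logarithms can be seen in a short numerical check. The following Python sketch borrows three values from Table 7-1 merely as sample probabilities; treating them as the entries of a weighting scheme, and the choice of base 10, are assumptions made for illustration.

```python
import math

# Sample probabilities (as fractions) for three aligned pairs, borrowed from Table 7-1.
probabilities = {"V/I": 0.023, "E/D": 0.019, "S/A": 0.016}

# If the matrix entries are logarithms, their sum along an alignment
# is the logarithm of the product of the underlying probabilities.
weights = {pair: math.log10(p) for pair, p in probabilities.items()}
summed = sum(weights.values())
product = math.prod(probabilities.values())

print(math.isclose(summed, math.log10(product)))
```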

When the matrix has been constructed to the taste 
of the practitioner, the alignment can be performed.
Any alignment of the two sequences A and B can be rep- 
resented as a set of consecutive diagonal segments run- 
ning through the matrix, for example, the three diagonal 
segments in the dot matrix comparing the cytochromes c 
of humans and R. rubrum (Figure 7-2). To be included in 
the alignment, the end of one of these diagonal segments 
must be in line with, above, or to the left of the beginning 
of the next diagonal segment when the diagonals run 
from top left to bottom right as in Figure 7-2. Each dis- 
continuity requiring a negative vertical shift or a positive 
horizontal shift to connect the previous diagonal to the 
next diagonal represents a gap in one of the sequences 
being aligned. Associated with each individual alignment 
is an alignment score 


$$AS = \sum_{i,j} c_{ij} - \sum_{k} P_k \tag{7-1}$$

where the respective sums are over all $c_{ij}$ intersected by
the diagonal segments and over all gaps k that must be
inserted and $P_k$ is a penalty assessed for creating the
gap k. A computer can be programmed to find the path
of diagonal segments through the matrix that has the
largest alignment score, and this path produces the
most appropriate alignment of the two sequences dic- 
tated by the choice of weighting scheme and gap penalty. 

The penalty assessed for each gap is an estimate of 
the logarithm of the probability of a gap the length of 
gap k appearing during evolution by natural selection on 
the same numerical scale used to assign the logarithms 
of the probabilities to each $c_{ij}$. It is possible to optimize
such a gap penalty for the particular weighting scheme
being used. For example, it has been shown that when
values of 1 are assigned to identities and values of 0 are
assigned to nonidentities, the appropriate gap penalty is

$$P = 1.2 + 0.23\,l \tag{7-2}$$

where $l$ is the length of the gap.
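
One standard way to find the path with the largest alignment score is dynamic programming with three running scores for each cell of the matrix, a procedure often called the Gotoh algorithm. The text does not specify the algorithm, so the following Python sketch is an assumption rather than the method actually used. It scores identities as 1 and nonidentities as 0, reads Equation 7-2 as a penalty of 1.2 + 0.23 l for a gap of length l, and, as a further assumption, also penalizes gaps at the ends of the sequences. The function and variable names are hypothetical.

```python
def max_alignment_score(a, b, gap_constant=1.2, gap_per_residue=0.23):
    """Largest value of sum(c_ij) - sum(P_k) over global alignments of a and b.

    c_ij is 1 for an identity and 0 otherwise; a gap of length l costs
    gap_constant + gap_per_residue * l, including gaps at the ends.
    """
    NEG = float("-inf")
    n, m = len(a), len(b)
    opening = gap_constant + gap_per_residue  # cost of the first space of a gap
    # match[i][j]: best score with a[i-1] aligned to b[j-1]
    # gap_b[i][j]: best score with a[i-1] opposite a space inserted into b
    # gap_a[i][j]: best score with b[j-1] opposite a space inserted into a
    match = [[NEG] * (m + 1) for _ in range(n + 1)]
    gap_b = [[NEG] * (m + 1) for _ in range(n + 1)]
    gap_a = [[NEG] * (m + 1) for _ in range(n + 1)]
    match[0][0] = 0.0
    for i in range(1, n + 1):
        gap_b[i][0] = -(gap_constant + gap_per_residue * i)
    for j in range(1, m + 1):
        gap_a[0][j] = -(gap_constant + gap_per_residue * j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = 1.0 if a[i - 1] == b[j - 1] else 0.0
            match[i][j] = c + max(match[i - 1][j - 1],
                                  gap_b[i - 1][j - 1],
                                  gap_a[i - 1][j - 1])
            gap_b[i][j] = max(match[i - 1][j] - opening,
                              gap_a[i - 1][j] - opening,
                              gap_b[i - 1][j] - gap_per_residue)
            gap_a[i][j] = max(match[i][j - 1] - opening,
                              gap_b[i][j - 1] - opening,
                              gap_a[i][j - 1] - gap_per_residue)
    return max(match[n][m], gap_b[n][m], gap_a[n][m])

# Invented fragments: the second lacks the residues "DE" of the first.
print(max_alignment_score("ACDEFGH", "ACFGH"))
```

For the two invented fragments, the best path has five identities and one gap of length 2, so the score is 5 minus the penalty of 1.2 + 0.46. Because extending a gap costs less than opening a new one, a single gap of length 2 is preferred to two separate gaps of length 1.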

The most important responsibility of an investiga- 
tor who performs such a computation and produces an 
alignment with the maximum alignment score is to pro- 
vide an assessment of its statistical significance. The 
accepted criterion for this assessment is a statistical eval- 
uation of a set of alignments produced from randomly 
jumbled sequences of the same length and amino acid 
composition as the actual amino acid sequences. First
the two actual sequences are aligned, and a maximum 
alignment score for the optimum alignment is calcu- 
lated. Then each of the two actual sequences is randomly 
jumbled a number of times to produce for each a set of 
nonsense sequences that have the same amino acid 
composition and length as the actual amino acid 
sequence from which they were generated. This pro- 
duces two sets of randomly jumbled amino acid 
sequences, one derived from each of the two actual 
sequences. All of the different combinations of one jum- 
bled sequence from one of these two sets and one jum- 
bled sequence from the other set are aligned by the same 
algorithm that was used to align the two actual, unjum- 
bled sequences, and a large number of maximum align- 
ment scores for the nonsense sequences is gathered in 
this way. The mean and standard deviation of the align- 
ment scores of this collection of randomly jumbled non- 
sense sequences are calculated by the usual statistical 
formulas. 

The number of standard deviations that the align- 
ment score for the two actual amino acid sequences lies 
above the mean for the maximum alignment scores for 
the jumbled sequences is a measure of the confidence 
that can be assigned to the decision that the two actual 
sequences share a common ancestor and to the decision 
that the alignment has juxtaposed positions in the 
sequence that have evolved independently from the 
same position in the ancestral sequence. For example, 
when human β2 microglobulin was aligned with the
κ-constant region of human immunoglobulin, the maximum
alignment score was 3.5 standard deviations larger
than the mean of the alignments of the jumbled 
sequences (Table 7-2). Unfortunately, there is no 
accepted level of statistical significance above which an 
alignment is judged to be real. Consequently, each 
person is left to make her own decision. Two sequences 
of amino acids are considered to be homologous to each 
other once the decision has been made that their align- 
ment is statistically significant. 
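
The jumbling test can be sketched as follows. In this Python illustration, the scoring routine is a deliberately simple stand-in (1 for an identity, 0 for a nonidentity, and an assumed penalty of 1 for each space in a gap) rather than any particular published scheme, and the function names, the number of jumbles, and the fixed seed are hypothetical choices.

```python
import random
import statistics

def align_score(a, b, gap=1.0):
    """Best global alignment score: 1 per identity, 0 per mismatch, minus gap per space."""
    previous = [-gap * j for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        current = [-gap * i]
        for j, y in enumerate(b, start=1):
            current.append(max(previous[j - 1] + (1.0 if x == y else 0.0),
                               previous[j] - gap,
                               current[j - 1] - gap))
        previous = current
    return previous[-1]

def jumble(sequence, rng):
    """Shuffle a sequence, preserving its length and amino acid composition."""
    residues = list(sequence)
    rng.shuffle(residues)
    return "".join(residues)

def standard_deviations_above_mean(a, b, n_jumbles=10, seed=0):
    """Compare the actual score with the scores for all pairs of jumbled sequences."""
    rng = random.Random(seed)
    jumbled_a = [jumble(a, rng) for _ in range(n_jumbles)]
    jumbled_b = [jumble(b, rng) for _ in range(n_jumbles)]
    # Every jumble of a is aligned with every jumble of b by the same routine.
    scores = [align_score(x, y) for x in jumbled_a for y in jumbled_b]
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    return (align_score(a, b) - mean) / sd

# An invented sequence aligned with itself, which should lie far above the jumbles.
sequence = "ACDEFGHIKLMNPQRSTVWY"
print(standard_deviations_above_mean(sequence, sequence))
```

The value returned is the number of standard deviations by which the actual alignment score exceeds the mean for the jumbled sequences; the decision about what value is convincing remains with the investigator.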

As the data bases from which candidates for align- 
ment are drawn become larger and larger, the risk that 
the alignment of the amino acid sequences of two unre- 
lated proteins will nevertheless be judged to be statisti- 
cally significant becomes greater. For example, if the 
lengths of the sequences are disregarded, in a data base 
containing 100,000 amino acid sequences, each of the 
amino acid sequences should be able to be aligned with 
two or three other unrelated sequences in the data base 
with alignment scores that are at least 4 standard devia- 
tions greater than the means of the jumbles. 
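
The size of this risk can be estimated with a short calculation. The following Python sketch assumes, purely for illustration, that the alignment scores of unrelated sequences follow a normal distribution, which the text does not state; the function name is hypothetical.

```python
import math

def expected_chance_matches(n_sequences, threshold_sd):
    """Expected number of unrelated sequences scoring at least threshold_sd above the mean."""
    # Upper tail of a standard normal distribution, P(Z >= threshold_sd).
    upper_tail = 0.5 * math.erfc(threshold_sd / math.sqrt(2))
    return n_sequences * upper_tail

print(round(expected_chance_matches(100_000, 4), 1))
```

Under this assumption the expectation is roughly three chance matches for each sequence in a data base of 100,000, in line with the two or three quoted above.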

There is a frequently encountered sleight of hand 
that is practiced in the alignment of amino acid 
sequences and that violates the rules of statistics. This 
trick is to align two sequences and then select only the 
regions in which there is a higher frequency of coinci- 
dence for the statistical test. Because the sample has 
been preselected, it usually shows a higher frequency of 
coincidences than occurs when jumbled sequences of 
the same small regions are compared. Ordinarily, statis- 
tical evaluation of an alignment of two amino acid 
sequences shorter than those of complete, naturally 
occurring, and logically defensible domains within the 
native protein should not be accepted without the clos- 
est scrutiny. 

At the present time, statistically significant align- 
ments can be made only between two amino acid 
sequences that have a percentage of identity of 15% or 
greater upon alignment. If a set of three or four
amino acid sequences can be assembled, however, that
are from a set of proteins that share some structural or
functional feature, it is often possible to demonstrate
with high statistical confidence that the members of this
set all share the same common ancestor even when pairwise
comparisons between the members of the set fail to
demonstrate convincing homology. In these
instances, the statistical significance only becomes convincing
when the whole set is aligned together. Such
methods for multiple alignment can detect with statistical
significance many more correct relationships
between distantly related proteins than can pairwise
alignments.

Although they also identify statistically significant
relationships among proteins, computational
procedures that rapidly search large banks of amino
acid sequences should be distinguished from the computational
procedure for aligning two sequences by
using a complete matrix. Banks of the currently available
amino acid sequences, such as the Swiss-Prot Sequence
Database (www.expasy.ch) and the Protein Sequence 
Database of the Protein Information Resource 
(http://pir.georgetown.edu), contain the sequences of 
hundreds of thousands of proteins. When the sequence 
of a new protein becomes available, it is not possible to 
attempt a complete computational alignment between it 
and each of the proteins in such large collections. 
Consequently, other strategies have been developed to 
search such a bank and rapidly find as many candidates 
for a relationship as possible. Each of these candidates 
can then be aligned by the standard matrix method with 
the new amino acid sequence to validate statistically sig- 
nificant relationships. 

The methods that are used to search the banks take 
advantage of the fact that evolution operates unevenly 
over a sequence of amino acids. Because segments of the 
sequence in the core of the structure of a protein or in 
functionally important locations change far less rapidly 
than regions on the surface, in distantly related amino
acid sequences, identities and conservative replace- 
ments tend to be clustered. For example, in the amino 
acid sequences of the group of proteins containing the 
ATP-binding cassette, the sequence -SGCGKST-, or lim- 
ited variations of it in which the first serine is replaced by 
a proline or a threonine, the cysteine is replaced by a 
serine, the second serine is replaced by a threonine or a 
glycine, or the threonine is replaced by a serine or a glu- 
tamine, appears in all of the members even though the 
rest of the amino acid sequences show low percentages 
of identity. Consequently, searching a bank for short
segments of amino acid sequence that have a high 
degree of similarity with short segments in the new 
amino acid sequence has a significant probability of 
locating relatives and has the advantage that it can be 
done rapidly. 

The bank of amino acid sequences is searched for 
short segments of sequence that are similar to any of the 
segments of a certain length in the newly sequenced pro- 
tein. The similarity is quantified by summing the weights 
given to each identity and each replacement in the 
aligned segments by use of the weighting scheme pre- 
ferred by the investigator. If the score for the match is 
above a certain predetermined threshold, the amino acid 
sequence in the bank containing this segment is judged 
to be a candidate for a relationship. 
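
The scoring of a pair of aligned segments described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the weights (2 for an identity, 1 for a conservative replacement, -1 otherwise), the set of conservative pairs (taken from the variations of -SGCGKST- mentioned earlier), and the function names are hypothetical choices, not a published weighting scheme.

```python
# Hypothetical toy weights; an investigator would substitute a preferred scheme.
IDENTITY = 2
CONSERVATIVE = 1
OTHER = -1

# Replacements treated as conservative in this sketch (an assumption).
CONSERVATIVE_PAIRS = {
    frozenset("ST"), frozenset("SP"), frozenset("CS"),
    frozenset("SG"), frozenset("TQ"),
}


def pair_weight(a, b):
    """Weight given to one aligned pair of amino acids."""
    if a == b:
        return IDENTITY
    if frozenset((a, b)) in CONSERVATIVE_PAIRS:
        return CONSERVATIVE
    return OTHER


def segment_score(seg_a, seg_b):
    """Sum the weights over two aligned segments of equal length."""
    if len(seg_a) != len(seg_b):
        raise ValueError("segments must have the same length")
    return sum(pair_weight(a, b) for a, b in zip(seg_a, seg_b))


def is_candidate(seg_a, seg_b, threshold):
    """A segment in the bank is a candidate if its score is above the threshold."""
    return segment_score(seg_a, seg_b) > threshold
```

With these toy weights, `segment_score("SGCGKST", "TGSGKSS")` sums four identities and three conservative replacements, so the segment would pass a threshold of 10.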

The three most widely used algorithms for search- 
ing banks of amino acid sequences differ only in how 
they find the segments. In the BLAST algorithm, every 
sequence four amino acids in length that appears in the 
complete sequence of the protein is tabulated. The 
amino acid sequences in the bank are then searched for 
segments identical or highly similar to one of the tabu- 
lated segments. When such a match is found, the align- 
ment is extended in both directions to find a longer 
segment that has a high degree of similarity to the corre- 
sponding segment of the sequence being matched. It is 
the score for this longer segment, based on the identities 
and replacements it contains, that must exceed the final 
threshold. In the FASTA algorithm, regions within the 
new amino acid sequence and regions within an amino 
acid sequence in the bank with the highest density of 
identities are located. Ten of these regions are then 
trimmed until the portion of each giving the highest 
score is identified. All of these regions with scores above 
a threshold are joined, and the gaps resulting from the 
joining are penalized. It is the score for this rough align- 
ment of a portion of the protein that must exceed the 
final threshold. The SSEARCH algorithm searches 
directly each sequence in the bank for the segment that 
has the highest score when aligned with a segment of the 
new sequence. Because each of these procedures focuses 
only on short segments of the sequences being searched, 
each misses some statistically significant matches, but 
the advantage gained is that they are rapid enough that 
such searches of the large extant banks of sequences can 
be performed in a reasonable amount of time. 
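
The seed-and-extend idea described for BLAST can be sketched as follows. This is a deliberately simplified illustration, not the BLAST program itself: it seeds only on identical four-residue words (the text notes that highly similar words are also accepted), it scores with an assumed +1 for an identity and -1 for a mismatch, it extends without gaps, and it stops extending when the running score falls an assumed 2 units below the best seen. All function names and parameter values are hypothetical.

```python
def tabulate_words(query, k=4):
    """Tabulate every word of length k in the query with its start positions."""
    words = {}
    for i in range(len(query) - k + 1):
        words.setdefault(query[i:i + k], []).append(i)
    return words


def extend(query, subject, qi, si, k, match=1, mismatch=-1, drop=2):
    """Extend a word match in both directions without gaps.

    Returns (best score, start in query, end in query) of the best extent.
    """
    best = run = k * match
    best_right = j = 0
    while qi + k + j < len(query) and si + k + j < len(subject):
        run += match if query[qi + k + j] == subject[si + k + j] else mismatch
        j += 1
        if run > best:
            best, best_right = run, j
        elif best - run >= drop:
            break
    run = best
    best_left = j = 0
    while qi - 1 - j >= 0 and si - 1 - j >= 0:
        run += match if query[qi - 1 - j] == subject[si - 1 - j] else mismatch
        j += 1
        if run > best:
            best, best_left = run, j
        elif best - run >= drop:
            break
    return best, qi - best_left, qi + k + best_right


def search(query, subject, k=4, threshold=6):
    """Return the best extended match if its score reaches the threshold."""
    words = tabulate_words(query, k)
    best = None
    for si in range(len(subject) - k + 1):
        for qi in words.get(subject[si:si + k], ()):
            hit = extend(query, subject, qi, si, k)
            if best is None or hit[0] > best[0]:
                best = hit
    return best if best is not None and best[0] >= threshold else None
```

For the made-up sequences `"AAASGCGKSTLLL"` and `"PPPSGCGKSTQQQ"`, the word SGCG seeds a match that extends to the full shared segment SGCGKST, positions 3 to 10 of the query.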

Computer-assisted searches of banks and complete 
computational alignments have permitted the relatives of 
a newly sequenced protein to be located so that it can be 
joined with a known group. The most obvious successes 
occur when the new amino acid sequence is identical to 
one that already is known, because this identification 
often demonstrates that the same protein has two differ- 
ent, unsuspected, and unconnected functions. For 
example, it has been discovered that the protein respon- 
sible for the function of neuroleukin, autocrine motility 
factor, maturation factor, and myofibril-bound serine 
endopeptidase inhibitor is glucose-6-phosphate isom- 
erase. It also happens that the amino acid sequence of 
a protein of known function can be matched with known 
amino acid sequences of proteins with unknown function 
and such an alignment gives a strong indication of the 
identity of that hitherto unknown function. When a 
new genome is sequenced, one of the banks is searched 
for matches to the new amino acid sequences of each of 
the previously unidentified proteins it contains. For 
example, when the genome of Archaeoglobus fulgidus had 
been sequenced, it was found to encode 1797 proteins 
that could be matched with amino acid sequences already 
known while it encoded only 639 proteins that could not 
be matched. 

The aligned amino acid sequences of polypeptides 
have been used to provide information about the speci- 
ation of organisms. The sequences of the same protein 
from a set of different species, for example, the 
sequences of the cytochromes c from different eukary- 
otic species, serve as the data on which such studies are 
based. The goal of the exercise is to construct a phyloge- 
netic tree (Figure 7-3) that displays the evolutionary 
history of the species bearing that protein. The lengths of 
the limbs in the tree are estimates of the evolutionary dis- 
tances between any two present-day species and their 
common ancestor, represented by the node at which the 

Figure 7-3: Phylogenetic tree for the cytochromes c from 53 species of eukaryotes. Each of the possible 1378 pairs of sequences was aligned 
and the minimal mutational distance between each pair was tabulated. These numerical values were then adjusted statistically for mutations 
that would not have left a trace to obtain estimates of evolutionary distances. The magnitudes of these corrections are indicated in the upper 
right corner. These estimates of evolutionary distance were used to construct the phylogenetic tree. If one passes along the branches of the 
tree between any two species, the numbers on the branches that are passed through sum to give the estimated evolutionary distance. For 
example, the evolutionary distance between carp and dogfish is 67. The noted length of the branches, in evolutionary distance, and the posi- 
tions of the nodes are determined by the most parsimonious sequence of events that satisfy the requirement that the evolutionary distances 
from the alignments of the sequences of amino acids equal the distances along the branches. In this figure, the nodes connecting marsupi- 
als (kangaroo) with the eutheria (the rest of the mammals), reptiles and birds with mammals, amphibians (frog) with amniotes (reptiles, birds, 
and mammals), fish (tuna, bonito, and carp) with tetrapods (amphibians, reptiles, birds, and mammals), and cartilaginous fish (lamprey and 
dogfish) with bony fish were placed on the scale of geologic time (millions of years) by using the dates from the fossil record at which the 
respective divergence from a common ancestor took place. Adapted with permission from ref 48. Copyright 1976 Academic Press. 


branches to those two species join. The evolutionary dis- 
tance between the amino acid sequences of two proteins 
is the value of any quantity that is thought to be directly 
proportional to the time that has elapsed since those pro- 
teins shared a common ancestor. The first step in con- 
structing a phylogenetic tree is to estimate the 
evolutionary distances between each of the (n² - n)/2 
pairs of sequences in the set of n sequences. 
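
As a quick check of the count of pairs, the short sketch below enumerates the pairs in a small set and confirms that the formula reproduces the 1378 pairs quoted for the 53 cytochromes c in Figure 7-3. The function name and the four-species list are illustrative choices only.

```python
from itertools import combinations


def number_of_pairs(n):
    """Number of distinct pairs among n sequences: (n^2 - n)/2."""
    return (n * n - n) // 2


# Enumerating the pairs directly gives the same count as the formula.
species = ["carp", "dogfish", "tuna", "frog"]
pairs = list(combinations(species, 2))
print(len(pairs), number_of_pairs(len(species)))  # 6 6

# The 53 species of Figure 7-3 give the 1378 pairs quoted in its caption.
print(number_of_pairs(53))  # 1378
```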

There are a number of problems involved in trans- 
forming the differences in the aligned sequences of two 
proteins into an evolutionary distance. First, the 
number of replacements observed in the two aligned 
sequences is always an underestimate of the number of 
replacements that have actually occurred since the two 
proteins diverged from a common ancestor because sev- 
eral successive unregistered replacements at the same 
site have often taken place. Second, each position in the 
two aligned sequences varies at a different rate (Figure 
7-1A), each type of amino acid has a characteristic sus- 
ceptibility to replacement (Table 7-1), and each protein 
has its own characteristic rate of change. Third, there 
are examples of accelerated changes occurring along 
only one branch of a tree containing species that seem 
indistinguishable from each other. For example, rat 
ribonuclease seems to have accumulated replacements 
at 4 times the rate of its close relatives the ribonucleases 
from mice, muskrats, and hamsters. Such accelerations 
in the replacement of amino acids are also observed in 
those members of a related set of proteins that happen to 
be involved in the battle between a species and one of its 
pathogens because the weapons of such a battle are 
often rapid changes in amino acid sequence either on the 
part of the pathogen to avoid the defenses or on the part 
of the attacked to reinstate defenses. Fourth, the size of 
the population of a given species or its generation time 
may affect the rate at which mutations become fixed. 
This may explain why the sequences of the 
cytochromes c from three closely related strains of the 
bacterial genus Pseudomonas show as much variation in 
their amino acid sequences (78-61% identity) as is 
shown between mammals and amphibians (82% iden- 
tity) or between mammals and insects (67% identity). 
These problems have been addressed with varying suc- 
cess by the two methods used to estimate evolutionary 
distance. One method relies on the percentage of identity 
between the two aligned sequences; the other, on the 
minimal mutational distance between them. 

It has been shown, both theoretically and by sim- 
ulation, that 

$$ q = \frac{\ln(1 + D)}{D} \qquad (7\text{-}3) $$


where q is the fraction of the positions in the alignment 
occupied by identical, unreplaced amino acids and D is 
the evolutionary distance. This equation corrects for the 
variations in rate of replacement both among the differ- 
ent positions in the two aligned sequences and among 
the different types of amino acids and provides an esti- 
mate of evolutionary distance from the percentage of 
identity. 
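
Because the printed form of Equation 7-3 is damaged in this copy, the sketch below rests on an assumption: that the relation is q = ln(1 + D)/D, which is the reading most consistent with the surviving fragments. Under that assumption, D cannot be written in closed form from q, so the estimate is obtained numerically; here a simple bisection is used. The function names and the tolerance are illustrative choices.

```python
import math


def identity_from_distance(d):
    """Fraction of identity q for evolutionary distance d, assuming q = ln(1 + d)/d."""
    if d == 0:
        return 1.0
    return math.log1p(d) / d


def distance_from_identity(q, tol=1e-12):
    """Invert the assumed relation by bisection; q must lie in (0, 1]."""
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be greater than 0 and at most 1")
    if q == 1.0:
        return 0.0
    lo, hi = 0.0, 1.0
    # q falls steadily as d grows, so double hi until it brackets the answer.
    while identity_from_distance(hi) > q:
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = (lo + hi) / 2.0
        if identity_from_distance(mid) > q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

For example, `distance_from_identity(0.67)` would give the distance corresponding to the 67% identity quoted above for mammals and insects, if the assumed form of the equation is the correct one.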

The minimal mutational distance, however, 
focuses on the changes that have occurred rather than on 
the positions that have remained unchanged. In theory, 
there should be more information in these replacements 
of one amino acid for another because they are progres- 
sive rather than static, but the corrections required to 
account for the unrecorded changes that have occurred 
over time are inaccurate enough that the advantages of 
the greater information are significantly diminished. 

To calculate its minimal mutational distance, a pair 
of aligned amino acid sequences in the set is compared 
position by position, and the minimum number of muta- 
tions that had to be fixed to accomplish each replace- 
ment is scored. Because of the redundancy of the genetic 
code, these individual minimum numbers of mutations 
are most accurately assessed if the actual codons used for 
each amino acid are known from the nucleic acid 
sequence. These individual minimum numbers of muta- 
tions are added together to obtain the minimum total 
number of mutations that had to be fixed to convert 
either of the two sequences into the other. This sum is 
the minimal mutational distance between the two pro- 
teins. 
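
The position-by-position scoring just described can be sketched in Python for the case in which only the amino acid sequences are known. The sketch assumes the standard genetic code, takes for each replacement the smallest number of nucleotide differences between any codon for one amino acid and any codon for the other, and skips positions that contain a gap. The names and the gap symbol are illustrative choices.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODONS = {}
for codon, aa in zip(("".join(c) for c in product(BASES, repeat=3)), AMINO):
    CODONS.setdefault(aa, []).append(codon)


def min_mutations(aa1, aa2):
    """Fewest nucleotide changes that convert a codon for aa1 into one for aa2."""
    return min(
        sum(x != y for x, y in zip(c1, c2))
        for c1 in CODONS[aa1]
        for c2 in CODONS[aa2]
    )


def minimal_mutational_distance(seq1, seq2, gap="-"):
    """Sum the minimum numbers of mutations over two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have the same length")
    return sum(
        min_mutations(a, b)
        for a, b in zip(seq1, seq2)
        if a != gap and b != gap
    )
```

Replacing phenylalanine with leucine needs one mutation, methionine with tryptophan needs two, and aspartate with tryptophan needs three, so the made-up aligned pair `"MFD-"` and `"WLDA"` has a minimal mutational distance of 3.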

The actual number of mutations that were fixed in 
the two lineages diverging from the common ancestor 
represented by each of the comparisons between two 
species is almost always greater than the number calcu- 
lated, even when the nucleic acid sequences are known, 
because mutations fixed in the past but then replaced by 
mutations fixed at the same position at a later date 
cannot be scored. If the nucleic acid sequence encoding 
either protein is unknown, mutations to an alternative 
codon for the same amino acid are also missed. The min- 
imal mutational distances must be corrected statisti- 
cally for all of these missing mutations to obtain esti- 
mates of evolutionary distances (Figure 7-3). 

The major contributors to the minimal mutational 
distances calculated for each pair of aligned amino acid 
sequences are the regions of the protein that have expe- 
rienced the greatest change over time. Unfortunately, 
these are also the most difficult segments of the amino 
acid sequences to align convincingly. As a result, the 
choice of the method used to align the sequences can 
have a significant effect on the structure of the final tree. 
With this in mind, a method of progressive alignment of 
amino acid sequences has been developed to provide 
the most suitable and internally consistent alignments of 
a large collection of sequences of the same protein from 
different species. The basis of this method is the 
assumption from the beginning that all of the amino acid 
sequences to be aligned share a common ancestor and 
have diverged from that common ancestor along their 
own unique lineages. The most closely related sequences 
are aligned first, and the gaps in these more certain align- 
ments are retained as the more distant alignments are 
made. This is advantageous because it is the uncertainty 
in the precise location of the gaps that must be inserted 
to align distantly related sequences that creates the 
greatest uncertainty in the final value for the minimal 
mutational distance. An example of the product of this 
method is the progressive alignment of the amino acid 
sequences of 11 globins (Figure 7-4). The important 
feature of these alignments is that the gaps are confined 
to specific locations rather than being more randomly 
distributed as would result from simple pairwise align- 
ments. 

The tabulated values of evolutionary distances are 
used to construct a tree the branches of which connect 
the species being compared (Figure 7-3). The tree is 
arranged so that the connections made produce the 
most parsimonious sequence of events that can repro- 
duce the observed evolutionary distances. The overall 
length of the line segments connecting any two present- 
day species is equal to the evolutionary distance between 
the two aligned sequences of the proteins from each of 
them. The branching order in such a tree conveys a his- 
torical sequence of the relationships among the species 
represented, and these historical sequences seem to be 
reasonable, based on the fossil record and anatomical 
resemblances. 
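
The property that the distance between two species equals the summed lengths of the branches connecting them can be illustrated with a small sketch. The tree below is a made-up example with hypothetical node names and branch lengths, not data from Figure 7-3; each entry maps a node to its parent and to the length of the branch joining them.

```python
# Hypothetical toy tree: node -> (parent, branch length). Values are invented.
TOY_TREE = {
    "species_a": ("node_1", 3.0),
    "species_b": ("node_1", 5.0),
    "node_1": ("root", 2.0),
    "species_c": ("root", 10.0),
}


def distances_to_ancestors(tree, node):
    """Distance from a node to itself and to each of its ancestors."""
    distances = {node: 0.0}
    total = 0.0
    while node in tree:
        parent, length = tree[node]
        total += length
        distances[parent] = total
        node = parent
    return distances


def tree_distance(tree, a, b):
    """Sum of branch lengths on the path from a to b through their common node."""
    up_from_a = distances_to_ancestors(tree, a)
    node, total = b, 0.0
    # Climb from b until a node on the path above a is reached.
    while node not in up_from_a:
        parent, length = tree[node]
        total += length
        node = parent
    return up_from_a[node] + total
```

In this toy tree the path from `species_a` to `species_b` passes through `node_1` and has length 8.0, whereas the path from `species_a` to `species_c` passes through the root and has length 15.0.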

Usually the phylogenetic trees that are built from 
the amino acid sequences of only one protein, for exam- 
ple, those of the cytochromes c (Figure 7-3), are unsatis- 
factory. Often there are sequences of a particular protein 
available for only a limited number of species. Often the 
phylogenetic tree based on the amino acid sequences of 
one protein disagrees with the phylogenetic tree based 
on the sequences of another. There are a number of 
solutions to this problem. For example, a more compre- 
hensive and detailed phylogenetic tree of the eukaryotes 
than the one displayed in Figure 7-3 has been built by 

hghu GHFTEEDKATI SLW GKV NVEDAGGETLGRLLVVYPWTORFFDSFGNLSSASAIMGNPK VKAHGKKVLTSLG 
hbhu VHLTPEEKSAV TALW GKV NVDEVGGEALGRLLVVYPWTORFFESFGDLSTPDAVMGNPK VKAHGKKVLGAFS 
hahu VLSPADKTNV KAAW GKVGAHAGEYGAEALERMFLSFPTTKTYFPHF DLSH GSAQ VKGHGKKVADALT 
heha PITDHGOPPTLSEGDKKAI RESW PQIYKNFEQNSLAVLLEFLKKFPKAODSFPKFSAKKS HLEODPA VKLOAEVIINAVN 
hbrl PIVDSGSVAPLSAAEKTKI RSAW APVYSNYETSGVDILVKFFTSTPAAQEFFPKFKGMTSADOLKKSAD VRWHAERIINAVN 
myhu GLSDGEWOLV LNVW GKVEADI PGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDEMKASED LKKHGATVLTALG 
mycr SLOPASKSAL ASSWKTLAKDAATIONNGATLFSLLFKOFPDTRNYFTHFGNM SDAEMKTTGV GKAHSMAVFAGIG 
haew KKQCGVLEGLKVKSEWGRAYGSGHDREAFSQAIWRATFAQVRESRSLFKR VHGDHTSDPA FIAHAERVLGGLD 
hety TDCGILORIKVKOQWAQVYSVGESRTDFAIDVFNNFFRTNPD RSLFNR VNGDNVYSPE FKAHMVRVFAGFD 
gpfb GAFTEKQEALVNSSW EAFK GNIPOYSVVFYTSILEKAPAAKNLFSF LANGVDPTNPK LTAHAESLFGLVR 
hbvs LDQOTINIIKATV PVLK EHGVTITTTFYKNLFAKHPEVRPLFD MGROESLEQPKALAMTVLAAAONIE 
hghu DATKHLD DLKGTFAQLSELHCDKLHVDPENFKLLGNVLVTVLAIHFGKEFTPEVQASWOKMV TGVASALSSRYH 
hbhu DGLAHLD NLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYOKVV AGVANALAHKYH 
hahu NAVAHVD DMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFL ASVSTVLTSKYR 
heha HTIGLMDKEAAMKKYLKDLSTKHSTEFQVNPDMFKELSAVFVSTM GGKAAYEKLF SIIATLLRSTYDA 
hbrl DAVASMDDTEKMSMKLRDLSGKHAKSFOVDPOYFKVLAAVIADTV AAGDAGFEKLM SMICILLRSAY 
myhu GILKKKGHHE AEIKPLAQSHATKHKIPVKYLEFISECIIQVLOSKHPGDFGADAQGAMNKAL ELFRKDMASNYKE LGFQG 
mycr SMIDSMDDADCMNGLALKLSRNHIQRKIGASRFGE MROVFPNFLDEALGGGASGDVKGAWDALL AYLODNKOA QA L 
haew IAISTLDQPATLKEELDHLOVQHEGRKIPDNYFDA FKTAILHVVAAQLGERCYSNNEEIHDAIACDGFARVLPQVLERG IKGHH 
hety ILISVLDDKPVLDQALAHYAAFH LOFGTIPFKA FGOTMFOTIAEHI HGADIGAWRAC YA EQIVT G ITA 
gpfb DSAAQLRANGAVVAD AALGSIHSOKGVSNDOFLV VKEALLKTLKOAV GDKWTDOLSTALELA YDELAAAI KKAYA 
hbvs NLPAILPAVKKIAVKHCQACVAAAHYPIVGOELLGAIKEVLGDAATDDI LDAWGKAYGVIADV FIQVEADLYAOAVE 


Figure 7-4: Multiple alignment of 11 globins by the progressive method. The amino acid sequences aligned were the γ polypeptide of 
human hemoglobin (hghu), the β polypeptide of human hemoglobin (hbhu), the α polypeptide of human hemoglobin (hahu), globin II from 
Myxine glutinosa (heha), globin I from Petromyzon marinus (hbrl), human myoglobin (myhu), myoglobin from Cerithidea rhizophorarum 
(mycr), globin I from Lumbricus terrestris (haew), globin I from Tylorrhynchus heterochaetus (hety), leghemoglobin from Phaseolus vulgaris 
(gpfb), and bacterial hemoglobin from Vitreoscilla stercoraria (hbvs). Reprinted with permission from ref 57. Copyright 1987 Springer-Verlag. 


combining alignments of the amino acid sequences of 
α-tubulins, β-tubulins, actins, and elongation fac- 
tors 1α. The conflicts between three phylogenetic trees 
for gnathostomes were resolved by considering the posi- 
tions in the sequences of gaps, the patterns of alternative 
splicing, and the distributions of introns in the genomic 
DNA. Nevertheless, disagreements on the branching 
order of phylogenetic trees, especially the most ancient, 
persist. 

Because the rate of replacement varies dramatically 
among the positions in the sequence of a protein (Figure 
7-1A), minimal mutational distance changes more rap- 
idly than percentage of identity over the short term, and 
corrections of minimal mutational distance are also less 
significant over the short term (Figure 7-3). Conse- 
quently, historical sequences based on minimal muta- 
tional distance are preferred for examinations of recent 
speciation. For example, a detailed phylogenetic tree for 
the order of artiodactyls covering the last 50 million years 
has been constructed from considerations of minimal 
mutational distances for aligned fibrinopeptides and 
pancreatic ribonucleases. In fact, ribonucleases with 
the amino acid sequences predicted for common ances- 
tors at the nodes on that tree were produced by site- 
directed mutation and shown to display the functional 
traits characteristic of artiodactyl ribonucleases. 

The phylogenetic tree, however, in addition to the 
historical sequence of events, conveys estimates of the 
evolutionary distances from existing species to common 
ancestors. These evolutionary distances can be cali- 
brated (Figures 7-3 and 7-5) with estimates from the 
fossil record of the time at which divergence occurred. 
Eutheria and marsupials diverged from a common 
ancestor 130 million years ago (mya); mammals and 
either reptiles or birds, 300 mya; amniotes and amphib- 
ians, 365 mya; and tetrapods and fish, 405 mya. The 
respective nodes on the tree should fall at these dates 
(Figure 7-3). Once the distances are calibrated, the times 
at which divergences unavailable in the fossil record 
have occurred can be estimated by extrapolation. To 
overcome the problems of the different rates of change 
from one protein to the other and rapid rate of change of 
a particular protein within a particular branch of the tree, 
these calibrations are usually performed with sets of 
amino acid sequences for as many proteins as possible 
(Figure 7-5). In this way, the final factor converting evo- 
lutionary distance to time should be as reliable as possi- 
ble to permit a realistic extrapolation beyond the fossil 
record to be performed. 
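
The calibration and extrapolation just described can be sketched as a least-squares fit of a single proportionality factor between evolutionary distance and time. The four dates are those quoted above from the fossil record; the distances paired with them are invented solely to make the example run, and the assumption that distance is strictly proportional to time (a line through the origin) is a simplification made for this sketch.

```python
def fit_rate(dates_mya, distances):
    """Least-squares slope of distance against time for a line through the origin."""
    numerator = sum(t * d for t, d in zip(dates_mya, distances))
    denominator = sum(t * t for t in dates_mya)
    return numerator / denominator


def estimate_date(distance, rate):
    """Extrapolate a date of divergence (mya) from an evolutionary distance."""
    return distance / rate


# Dates from the fossil record quoted in the text (million years ago).
dates = [130.0, 300.0, 365.0, 405.0]
# Hypothetical evolutionary distances, invented for illustration only.
toy_distances = [0.130, 0.300, 0.365, 0.405]

rate = fit_rate(dates, toy_distances)   # distance per million years
print(estimate_date(0.5, rate))         # date for a distance beyond the calibration
```

Averaging the distances over many proteins before fitting, as in Figure 7-5, would reduce the influence of any one protein that has changed unusually rapidly.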

Because the corrections required to convert mini- 
mal mutational distances to estimates of evolutionary 
distance become more significant and less reliable over 
the long term (Figure 7-3), estimates of evolutionary dis- 
tance by percentage of identity (Equation 7-3) are pre- 
ferred for assigning a date to distant common ancestors. 
For example, it has been estimated from percentage of 
identity that eukaryotes and archaebacteria diverged 
from a common ancestor 2.3 billion years ago. 
Estimates, however, from both percentage of identity 
and minimal mutational distance agree that deuter- 
ostomes and protostomes diverged from a common 
ancestor 0.7 billion years ago. 

At the point at which the lineages of two presently 
existing species diverged from their common ancestor, 
the gene for a particular protein carried by the common 
ancestor became two separate and disconnected genes, 

Figure 7-5: Calibrating evolutionary distance with dates from the 
fossil record. Sets containing amino acid sequences of the same 
protein from different species were assembled. Within each set, the 
amino acid sequences were aligned and the percentage of identity 
of all pairs of aligned sequences were converted into evolutionary 
distances with Equation 7-3. Within each set, the evolutionary dis- 
tances between pairs of sequences in which the two members were 
from two different groups were sorted into five categories based on 
which two groups they respectively represented (eutheria and mar- 
supial, mammal and either bird or reptile, amniote and amphibian, 
tetrapod and fish, and gnathostome and lamprey). The evolution- 
ary distances for pairs of sequences that diverged at these nodes 
were averaged over each set and then over all of the sets, and the 
means and standard deviations of these averages are plotted 
against the date (million years ago, mya) at which the groups in 
each set diverged from a common ancestor, as determined by the 
fossil record: 130, 300, 365, 405, and 450 mya, respectively. 
Averages are also plotted for comparisons of major mammalian 
groups that diverged from a common ancestor 100 mya. The sets of 
amino acid sequences did not each have representatives of all 
groups being compared. The number of sets of amino acid 
sequences of the same protein from different species that could be 
used were 48 for mammal/mammal, 3 for eutheria/marsupial, 16 
for mammal/bird or reptile, 11 for amniote/amphibian, 15 for 
tetrapod/fish, and only 1 for gnathostome/lamprey. 


one carried by each of the new, independent ancestral 
species. At that time, natural selection began to operate 
on these two genes independently, and the differences 
now observed in the sequences of the same protein 
from the two existing species began to accumulate. A 
similar disconnection of two genes for the same protein 
can occur within a single genome by gene duplication. 
Gene duplication is the result of a mistake in recombi- 
nation causing the DNA in an individual suddenly to 
contain two copies of the same gene where before there 
was only one. If this duplication spreads through the 
population by genetic drift and becomes fixed,* the 


* Gene duplication is a fairly common event; its spread, however, 
over the entire population of a species is a rare event. 


genome of the affected species will now contain two 
copies of the same gene. Both copies will usually con- 
tinue to produce their respective proteins; but, because 
of the disconnection, the amino acid sequences of these 
two proteins have become capable of independent vari- 
ation to produce isoforms of a given protein or isoen- 
zymes of a given enzyme. Isoforms of the same 
polypeptide are polypeptides found in the same organ- 
isms that are encoded by different genes and have dif- 
ferent amino acid sequences but nevertheless share a 
common ancestor and, when properly folded and 
assembled, perform the same function. 

Two proteins are homologues of each other if they 
both descended from a common ancestor, usually as a 
result of the speciation of organisms but often as a result 
of gene duplication. Two proteins are orthologues of 
each other if they both descended in direct lineage from 
a common ancestor and neither lineage contains a point 
of gene duplication. Two proteins are paralogues of each 
other if the gene encoding one of them descended in 
direct lineage from one member of a pair of duplicated 
genes in the genome of a common ancestor and the gene 
encoding the other descended in direct lineage from the 
other member of the same pair of duplicated genes. 
Isoforms are always paralogues of each other. The 
A isoform of L-lactate dehydrogenase from dogfish 
muscle and the A isoform of L-lactate dehydrogenase 
from pig muscle are orthologous to each other, but the 
B isoform of L-lactate dehydrogenase from pig heart and 
the A isoform of L-lactate dehydrogenase from pig 
muscle are paralogous to each other. 

The advantage of having separate isoforms to both 
the ancestral species and those that diverged from it was 
that these isoforms could gradually specialize to meet 
separate demands. These demands are often expressed 
at the level of individual tissues within the organism, and 
the sequences of the two isoenzymes or isoforms diverge, 
in part, in response to the particular demands of sets of 
tissues. For example, one set of tissues may require that 
an enzyme respond to changes in levels of its substrates 
in a different range from the range in which it should 
respond to them in another set of tissues, and the respec- 
tive isoenzymes can be tailored to the separate sets of 
requirements. Positive selection causes advantageous 
mutations that produce changes in biological properties 
causing the isoform of a protein to be more suitable to a 
particular set of tissues to be preferentially fixed in the 
populations at the expense of the parental types, and it is 
in the adaptation of isoforms of the same protein that 
positive selection of amino acid sequences is perhaps 
most readily detected. 

It is alignments of the amino acid sequences of the 
isoforms of a given protein in one species with those of 
the isoforms of the same protein in another species that 
permit them to be identified and classified and the 
history of their divergence to be described. For exam- 
ple, among the tissues of mammals and birds, at least 
three isoenzymes of L-lactate dehydrogenase have been 
identified. From this observation, it may be inferred that 
an individual mammal or bird contains within its 
genome three discrete genes encoding three discrete 
L-lactate dehydrogenases. Complete sequences are avail- 
able for many of these proteins. A representative set of 
these amino acid sequences has been aligned, and a 
phylogenetic tree of minimal mutational distances has 
been constructed from these alignments (Figure 7-6). 
The tree suggests that the three isoforms of L-lactate 
dehydrogenase diverged from their common ancestor 
before the appearance of the vertebrates. This conclu- 
sion has been supported in a more extensive phyloge- 
netic tree constructed from an even larger collection of 
amino acid sequences of this protein. Because each of 
these isoenzymes, or appropriate mixtures of them, are 
found in different tissues, it can be concluded that the 
natural selection which has produced them in their pres- 
ent guise has operated at the level of the tissue rather 
than that of the whole organism. To the extent that 
different tissues are constructed from different isoforms 
of the same proteins, these tissues can be considered to 

Figure 7-6: Phylogenetic tree of seven isoenzymes of vertebrate 
L-lactate dehydrogenase. The phylogenetic relationship among 
the seven proteins, namely, isoform A from the muscle of dogfish, 
isoform A from the muscle of chicken, isoform A from the muscle 
of pig, isoform B from the heart of chicken, isoform B from the 
heart of pig, isoform C from the testis of mouse, and isoform C from 
the testis of rat, is represented by the most parsimonious tree. The 
number on each leg is the minimal mutational distance required to 
account for the descent from the common ancestor, and the 
number in italic type at each node is the average of the minimal 
mutational distances to its descendants. The minimal mutational 
distance in any one interval is not an integer because of averaging 
over all equally most parsimonious solutions for the topological 
arrangement in which it is a participant. The total number of 
nucleotide substitutions in the entire set is 366. The count does not 
include insertions or deletions. The root is arbitrarily placed 
halfway between the two most distantly related groups. Reprinted 
with permission from ref 67. Copyright 1983 Journal of Biological 
Chemistry. 

have evolutionary histories that are independent of the 
histories of the organisms carrying them. 

Just as was the case in the phylogenetic tree of the 
L-lactate dehydrogenases (Figure 7-6), when the three 
paralogous folded polypeptides that together form each 
molecule of mammalian fibrinogen were aligned and a 
phylogenetic tree was derived, it was found that they also 
diverged from their common ancestor well before the 
appearance of vertebrates, but Vertebrata is the only sub- 
phylum in which fibrinogen is found. These observations 
prompted a search for proteins in invertebrates that 
share a common ancestor with fibrinogen, and one such 
protein of unknown function was discovered in an echi- 
noderm. This result suggests that a protein responsible 
for one function can evolve from a protein responsible 
for another function. Another variation on this theme of 
the transformation of the function of one of the par- 
alogues of the same protein has occurred during the evo- 
lution of the isoenzymes of malate dehydrogenase in 
Trichomonas vaginalis. Alignments of amino acid 
sequences demonstrate that two of the paralogues of 
malate dehydrogenase in this species have recently 
become L-lactate dehydrogenases. This shift in func- 
tional properties also illustrates the fact that proteins 
with different functions often share a common ancestor. 

The genome of a particular species can encode two 
or more paralogues of a given protein if the appropriate 
gene duplications have occurred. It is the potential to 
evolve into a new protein that distinguishes such a set of 
paralogues from a single, unduplicated orthologue. As 
long as there is only one gene for a given protein in the 
genome, the protein that it encodes is required to per- 
form its designated function. That protein evolves along 
its lineage as its gene diverges into separate species, but 
it must remain the same protein. If there are two or more 
paralogues in each individual of a species, one of those 
paralogues has the opportunity to become a new protein 
with a new function. Even though paralogues usually 
retain the same function and specialize to handle differ- 
ent situations, as in the case of the isoenzymes of L-lac- 
tate dehydrogenase, often one of them changes 
sufficiently to perform another function as in the case of 
chymotrypsinogen and haptoglobin (Table 7-2). 

One way to demonstrate that a paralogue, freed 
from the necessity of performing its function by the exis- 
tence of a sibling, is able to change sufficiently to perform 
an entirely new function is by alignment of amino acid 
sequences. In addition to alignments of the same protein 
from different species and alignments of different iso- 
forms of the same protein, it has been possible to per- 
form statistically significant alignments of the amino 
acid sequences of different proteins (Table 7-2). Many 
of these connections make sense from a functional 
standpoint. For example, parvalbumin can be success- 
fully aligned with troponin c (Table 7-2); hemoglobin, 
with myoglobin (Table 7-2); the coiled coil of human 
vimentin, with the coiled coil of human lamin A (36% 

Table 7-2: Examples of Pairs of Proteins Thought to Share a Common Ancestor on the Basis of an Alignment of Their Amino Acid Sequencesᵃ 

protein Iᵇ                                 protein IIᵇ                                      % identityᶜ  gapsᵈ  gap percent  standard deviationsᵉ 
chymotrypsinogen a, bovine (245)           chymotrypsinogen b, bovine (245)                 79.2         0      0            65.8 
hemoglobin β, human (146)                  hemoglobin γ, human (146)                        73.3         0      0            40.7 
carbonic anhydrase b, human (260)          carbonic anhydrase c, human (259)                60.6         1      0.4          56.8 
chymotrypsinogen a, bovine (245)           trypsinogen, bovine (229)                        46.1         6      2.6          22.6 
lysozyme (egg white), chicken (129)        lactalbumin (milk), human (123)                  38.2         3      2.4          10.7 
viral coat protein, Pf1 (46)               viral coat protein, Xf (44)                      40.5         1      2.3          5.0 
hemoglobin α, human (141)                  myoglobin, human (153)                           27.0         1      0.7          9.3 
ovalbumin, chicken (386)                   antithrombin III, human (423)                    28.1         6      1.6          14.3 
parvalbumin, carp (108)                    troponin c, bovine (161)                         27.0         2      2.0          6.1 
cytochrome c, pig (104)                    cytochrome f, Spirulina maxima (89)              27.0         3      2.9          7.2 
β2-microglobulin, human (100)              immunoglobulin κ-constant region, human (102)    18.8         1      1.0          3.5 
plastocyanin, spinach (99)                 azurin, Pseudomonas (128)                        27.3         4      4.0          2.9 
histocompatibility antigen, mouse (173)    immunoglobulin SH A chainᶠ (183)                 16.7         3      1.7          4.1 
leghemoglobin, yellow lupin (153)          invertebrate hemoglobin, midge (151)             22.0         3      2.0          2.6 
chymotrypsinogen b, bovine (245)           haptoglobin b, human (245)                       19.4         4      1.6          5.4 

ᵃ Alignment was performed on a matrix where a_i × b_j = 1 when a_i = b_j and a_i × b_j = 0 when a_i ≠ b_j, and the gap penalty was 2.5. When a_i = b_j = cysteine, a_i × b_j = 2.0. Reproduced from Doolittle. 
ᵇ Two proteins the amino acid sequences of which were aligned. Number of amino acids in each protein is shown in parentheses. 
ᶜ Percentage of the positions in the aligned sequences at which the same amino acid was found in both sequences, based on the length of the shortest. 
ᵈ Total number of gaps that had to be introduced to get the best alignment. 
ᵉ Distance in standard deviations that the alignment score for the actual sequences was above the mean of the alignment scores of 36 comparisons of jumbled sequences. 
ᶠ Represents only a portion of the entire sequence. 


identity, 1.2 gap percent); vacuolar H*-transporting 
two-sector ATPase from Daucus carota, with the B sub- 
unit of H*-transporting two-sector ATPase from Spinacia 
oleracea (15 standard deviations above the mean of the 
jumbles);” dihydrolipoyllysine acetyltransferase from 
Escherichia coli, with dihydrolipoyllysine succinyltrans- 
ferase of E. coli (30% identity, 1.7 gap percent);” tripep- 
tidyl-peptidase II from H. sapiens, with subtilisin from 
Bacillus subtilis (34% identity, 5.3 gap percent);” aceto- 
lactate synthase III from E. coli, with tartronate-semi- 
aldehyde synthase from E. coli (34% identity, 0.7 gap 
percent); and 4-hydroxy-2-oxoglutarate aldolase from 
E.coli, with 2-dehydro-3-deoxy-phosphogluconate 
aldolase from E. coli (45% identity with no gaps).” 


The identification of a set of paralogues in the same 
species by alignment of their sequences often provides 
clues as to the functions of those members of the set that 
have not yet been studied. For example, the fact that
three proteins of unknown function in E. coli were par- 
alogues of methylmalonyl-CoA mutase and acetate CoA- 
transferase allowed them to be identified as a 
methylmalonyl-CoA decarboxylase, a succinate-propi- 
onate CoA-transferase, and another methylmalonyl-CoA 
mutase.

The more interesting though rarer connections, 
however, are those between functionally unrelated pro- 
teins. For example, ovalbumin can be successfully 
aligned with antithrombin III (Table 7-2); bovine
angiogenin, with bovine ribonuclease (33% identity, 3.2 gap
percent); chicken δ2 crystallin, with human
argininosuccinate lyase (69% identity, 0.2 gap percent); and
glucarate dehydratase from Pseudomonas putida, with
mandelate racemase from P. putida (23% identity, 5.6
gap percent).

All of these alignments demonstrate that the evolu- 
tion of duplicated proteins is completely analogous to 
the evolution of species. The fixation of the two forms of 
a duplicated gene within a population produces two par- 
alogues of the ancestral protein. As the paralogues of the 
protein evolve independently, they drift slowly from iso- 
forms with the same function, to proteins with similar 
but different functions, just as daughter species drift 
apart from each other, creating separate genera.
Occasionally, a dramatic leap occurs, for example, the
one turning argininosuccinate lyase into δ2 crystallin, a
change resembling on a small scale the appearance of 
chordates. Usually, however, the process is one of slow, 
continuous divergence. The alignment of amino acid 
sequences gives only hints of the evolving pedigrees. A 
more complete picture is seen only when the
crystallographic molecular models of proteins are superposed.

Problem 7-1: Calculate alignment scores (Equation 7-1)
for the five cytochromes c aligned in Figure 7-9 based on
the rules that when $a_i = b_j$, $a_i \times b_j = 1$; when $a_i \neq b_j$,
$a_i \times b_j = 0$; and $P = 1.2 + 0.23I$.


Problem 7-2: This exercise will illustrate the method 
for assessing the validity of a particular alignment of 
two sequences. Pick a number between 1 and 80 at 
random and write it on a piece of paper. Turn to Figure 
7-1 and the alignment of the two amino acid sequences 
of the cytochromes c from T. alalunga and P. denitrifi- 
cans, respectively. Start at the amino acid in the 
sequence of the cytochrome c from T. alalunga corre- 
sponding to the number you picked at random, and 
write the next 20 amino acids in that sequence across 
the page. Below this sequence write the corresponding 
amino acids of the aligned sequence from the 
cytochrome c of P. denitrificans, as in the figure.
Calculate an alignment score by the rules in Problem 
7-1 for these two segments of aligned sequences. Take
20 playing cards and assign to each of them one of the
20 amino acids in whichever of the two segments has the
fewer gaps. Shuffle the cards well and deal them
into a row. Copy out the jumbled sequence dictated by
this shuffle on a piece of paper. Align this jumbled
sequence with the segment of real amino acid 
sequence from the other cytochrome c by shifting and 
gapping until you think the alignment will give the 
highest alignment score. Record that score. Repeat the 
process five times. How do the alignment scores of the 
jumbled sequences compare to the alignment score of 
the one unjumbled sequence? 
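
The card-shuffling exercise of Problem 7-2 can also be carried out by a short program. The following is a minimal sketch rather than the procedure used in the text. It assumes identity scoring (1 for a match, 0 for a mismatch), and it charges a fixed penalty of 2.5 for every gapped position instead of the length-dependent penalty of Problem 7-1. It uses 36 jumbles only because that number appears in the footnotes to Table 7-2. The names alignment_score and jumble_test are hypothetical.

```python
import random
import statistics


def alignment_score(a, b, gap_penalty=2.5):
    """Best global alignment score under the assumed rules.

    An identity scores 1, a mismatch scores 0, and every gapped
    position costs gap_penalty (a simplification of a per-gap penalty).
    """
    prev = [-gap_penalty * j for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [-gap_penalty * i]
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (1.0 if a[i - 1] == b[j - 1] else 0.0)
            curr.append(max(diagonal,
                            prev[j] - gap_penalty,
                            curr[j - 1] - gap_penalty))
        prev = curr
    return prev[-1]


def jumble_test(a, b, n_jumbles=36, seed=0):
    """Score the real pair, then score a against n_jumbles shuffles of b."""
    rng = random.Random(seed)
    real = alignment_score(a, b)
    jumbled_scores = []
    for _ in range(n_jumbles):
        letters = list(b)
        rng.shuffle(letters)
        jumbled_scores.append(alignment_score(a, "".join(letters)))
    mean = statistics.mean(jumbled_scores)
    spread = statistics.stdev(jumbled_scores)
    return real, mean, spread


if __name__ == "__main__":
    real, mean, spread = jumble_test("TFVQKCAQCHTVENGG",
                                     "EFNKCKACHMIQAPDGTDIIK")
    print(f"real {real:.2f}, jumbled mean {mean:.2f}, "
          f"standard deviation {spread:.2f}")
```

If the standard deviation of the jumbled scores is not zero, the difference between the real score and the jumbled mean, divided by that standard deviation, gives a figure comparable in kind to the last column of Table 7-2.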


Problem 7-3: On the basis of their locations in the crys- 
tallographic molecular models of proteins, their struc- 
tural roles, and their chemical properties, the amino 
acids can be divided into three categories: hydrophobic, 
neutral, and hydrophilic. The hydrophobic amino acids 
are isoleucine (I), valine (V), leucine (L), phenylalanine 
(F), cystine (C-C), methionine (M), and alanine (A). The 
neutral amino acids are glycine (G), cysteine (C), threo- 
nine (T), tryptophan (W), serine (S), tyrosine (Y), and 
proline (P). The hydrophilic amino acids are histidine 
(H), glutamate (E), glutamine (Q), aspartate (D), 
asparagine (N), lysine (K), and arginine (R). 


The following alignment is from Figure 7-1B: 


TFVQKCAQCHTV------ENGG
EF-NKCKACHMIQAPDGTDIIK


Because cytochrome c is a cytoplasmic protein,
none of the cysteines participates in a cystine. 


(A) Construct a 16 x 21 matrix on a sheet of graph 
paper for the two segments of sequence involved 
in this alignment using the following rules: 


(1) $a_i \times b_j = 1$ for an identity
(2) $a_i \times b_j = 0.6$ for hydrophobic × hydrophobic
(3) $a_i \times b_j = 0.6$ for neutral × neutral
(4) $a_i \times b_j = 0.6$ for hydrophilic × hydrophilic
(5) $a_i \times b_j = 0.2$ for hydrophobic × neutral
(6) $a_i \times b_j = 0.2$ for neutral × hydrophilic
(7) $a_i \times b_j = 0.0$ for hydrophobic × hydrophilic


(B) Trace the alignment presented above through the
matrix. 


(C) Calculate an alignment score for that trajectory 
with the gap penalty of Equation 7-2. 


(D) What is the most serious difficulty with the rules? 
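
The bookkeeping in parts A through C can be sketched in a few lines of code. This is an illustration under stated assumptions, not a worked solution from the text. The three categories are those listed at the start of the problem, with C treated as cysteine. The gap penalty is assumed to have the form given in Problem 7-1, with the constant 1.2 charged once per gap and 0.23 charged for each position in the gap. The function names are hypothetical.

```python
HYDROPHOBIC = set("IVLFMA")
NEUTRAL = set("GCTWSYP")  # C is cysteine here; no cystines are present
HYDROPHILIC = set("HEQDNKR")


def category(residue):
    """Return the category of a one-letter amino acid code."""
    if residue in HYDROPHOBIC:
        return "hydrophobic"
    if residue in NEUTRAL:
        return "neutral"
    if residue in HYDROPHILIC:
        return "hydrophilic"
    raise ValueError(f"unknown residue {residue!r}")


def pair_score(a, b):
    """Score one pairing by rules (1) through (7) of the problem."""
    if a == b:
        return 1.0
    kinds = {category(a), category(b)}
    if len(kinds) == 1:
        return 0.6  # same category
    if "neutral" in kinds:
        return 0.2  # neutral paired with either extreme
    return 0.0      # hydrophobic paired with hydrophilic


def build_matrix(seq_a, seq_b):
    """Rows follow seq_a and columns follow seq_b."""
    return [[pair_score(a, b) for b in seq_b] for a in seq_a]


def trajectory_score(top, bottom, gap_constant=1.2, gap_per_position=0.23):
    """Sum the pair scores along an alignment and subtract gap penalties."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    total = 0.0
    gap_lengths = []
    previous_row = None  # row that held '-' in the previous column
    for x, y in zip(top, bottom):
        row = "top" if x == "-" else "bottom" if y == "-" else None
        if row is None:
            total += pair_score(x, y)
        elif row == previous_row:
            gap_lengths[-1] += 1  # the current gap continues
        else:
            gap_lengths.append(1)  # a new gap opens
        previous_row = row
    penalty = sum(gap_constant + gap_per_position * n for n in gap_lengths)
    return total - penalty


if __name__ == "__main__":
    matrix = build_matrix("TFVQKCAQCHTVENGG", "EFNKCKACHMIQAPDGTDIIK")
    print(len(matrix), "x", len(matrix[0]))
    print(trajectory_score("TFVQKCAQCHTV------ENGG",
                           "EF-NKCKACHMIQAPDGTDIIK"))
```

Changing the two gap parameters shows how strongly the score of this particular trajectory depends on the penalty chosen.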


Problem 7-4: From the genetic code, calculate the mini- 
mum number of base changes between the amino acid 
sequences of the γ and β polypeptides of human
hemoglobin as they are aligned in Figure 7-4. To do this, make
a list containing all of the replacements between the two 
sequences, find the minimum number of base changes 
required for each, and add up the individual minimum 
base changes to obtain the total. 
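
The procedure described in Problem 7-4 is mechanical enough to be sketched as a program. The version below assumes the standard genetic code, written in terms of DNA bases, and assumes that positions containing a gap are simply skipped. The function names are hypothetical, and the short sequences in the example are invented for illustration only.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a termination codon.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codons_for(amino_acid):
    """All codons that encode the given one-letter amino acid."""
    return [c for c, aa in CODON_TABLE.items() if aa == amino_acid]


def min_base_changes(aa1, aa2):
    """Fewest single-base changes that convert a codon for aa1 into one for aa2."""
    return min(sum(p != q for p, q in zip(c1, c2))
               for c1 in codons_for(aa1)
               for c2 in codons_for(aa2))


def total_min_base_changes(top, bottom):
    """Sum the minimum base changes over every replacement in an alignment."""
    total = 0
    for x, y in zip(top, bottom):
        if x != "-" and y != "-" and x != y:
            total += min_base_changes(x, y)
    return total


if __name__ == "__main__":
    # Invented three-residue example: K to R and D to E each need one change.
    print(total_min_base_changes("MKD", "MRE"))
```

The total obtained in this way is a lower limit, because each replacement is assigned the most economical pair of codons regardless of which codons the genes actually use.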


Problem 7-5: The sequences of the fibrinopeptides A
and B from a series of primates are given in the table
below. Construct a tree of minimal mutational
distances. Treat a gap as if it were two base changes.


primate        fibrinopeptide A    fibrinopeptide Bᵃ

green monkey   ADTGEGDFLAEGGGVR    PCA-GVNGNEEGLFGGR
human          ADSGEGDFLAEGGGVR    PCA-GVNDNEEGFFSAR
drill          ADTGDGDFITEGGGVR    PCA-GVNGNEEGLFGGR
macaque        ADTGEGDFLAEGGGVR            NEESLFSGR
chimpanzee     ADSGEGDFLAEGGGVR    PCA-GVNDNEEGFFSAR


ᵃPCA, pyrrolidone-5-carboxylic acid, a cyclized form of glutamine.


Molecular Phylogeny from Tertiary Structure 


Just as the amino acid sequences either of two different 
proteins or of the same protein from two different 
species can be aligned, so can their tertiary structures be 
superposed. The crystallographic molecular models of
two proteins that have tertiary structures so similar to
each other that they are thought to share a common
ancestor are chosen for comparison. Those pairs of
respective α carbon atoms that unambiguously occupy
equivalent positions in equivalent strands of secondary
structure in the two crystallographic models are
identified statistically and designated as forming the cores of
the structures to be superposed. To superpose these two
crystallographic molecular models is to translate and
rotate one of them relative to the other until the sum of
the squares of the distances between these pairs of
equivalenced α carbon atoms in the cores of the two
structures is minimized. Because two different crystallo- 
graphic molecular models are being superposed, the 
structures never coincide exactly, even if they are of the 
same protein. The two proteins the crystallographic 
molecular models of which are being compared are con- 
sidered to be homologous once a decision has been 
made by the investigator that the superposition of the 
two crystallographic molecular models is significant 
enough to demonstrate that they share a common 
ancestor. 
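
The criterion for superposition stated above can be written compactly. The notation here is an assumption introduced only for illustration and does not appear in the text: x_k and y_k are the coordinates of the k-th pair of equivalenced α carbons in the two cores, N is the number of such pairs, R is a rotation, and t is a translation.

```latex
(\mathbf{R}^{*}, \mathbf{t}^{*})
  = \arg\min_{\mathbf{R},\,\mathbf{t}}
    \sum_{k=1}^{N}
    \left\lVert \mathbf{R}\,\mathbf{x}_{k} + \mathbf{t} - \mathbf{y}_{k} \right\rVert^{2}
```

Only the pairs assigned to the cores enter the sum, so the result of a superposition depends on which α carbons the investigator has designated as equivalent.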

An example of such a superposition is that between 
porcine pancreatic elastase and tryptase from rat mast 
cells (Figure 7-7A). These proteins are both serine
endopeptidases sharing a common enzymatic mecha- 
nism, their amino acid sequences are readily aligned 
(33% identity, 2.5 gap percent), and the superposition 
confirms the fact that they share a common ancestor. A 
more distant relationship was validated by the superpo- 
sition of a portion (amino acids 157-403) of creatinase 
from P. putida and methionyl aminopeptidase from 
E. coli (Figure 7-7B). Although both of these enzymes
catalyze the hydrolysis of an amide and their amino acid 
sequences can be aligned computationally, their respec- 
tive substrates are quite different from each other and 
the alignment is marginal (17% identity, 1.9 gap percent). 

The degree to which two superposed structures
coincide is usually quantified by the root mean square
deviation. The root mean square deviation is the square
root of the mean of the squares of the distances between
only those pairs of α carbon atoms designated as
belonging to the cores. Consequently, both the root
mean square deviation and the percentage of the total
number of α carbons that were included in the cores
from the two crystallographic molecular models must
be noted (Table 7-3). For example, in the superposition
of porcine pancreatic elastase and tryptase from rat
mast cells (Figure 7-7A), 65% of the α carbons were
included in the two cores, and they aligned with a root
mean square deviation of 0.07 nm. When the
crystallographic molecular model of 4-α-glucanotransferase
from Thermus aquaticus was superposed in turn upon the
crystallographic molecular models of nine amylases
from bacteria, fungi, and mammals, 270-320 amino acids
(41-65% of the entire amino acid sequences of these
proteins) were designated as belonging to the cores, and
those amino acids in the cores aligned with root mean
square deviations between 0.29 and 0.35 nm.

An alternative method of quantifying the superpo- 
sition of two crystallographic molecular models is to note 
the percentage of the total number of α carbons in the
superposition that lie less than a certain distance from
their equivalenced partners. For example, in the
superposition of creatinase from P. putida and methionyl
aminopeptidase from E. coli (Figure 7-7B), 86% of the
α carbons lie less than 0.25 nm from their partners.
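
Both measures of agreement are simple to compute once the two models have been superposed. The sketch below assumes that the superposition has already been performed and that the coordinates of the paired α carbons are supplied in nanometers in matching order. The function names are hypothetical. The last function reflects one reading of the percentage reported in Table 7-3, namely the paired α carbons as a share of all α carbons in both models.

```python
import math


def pair_distances(core_a, core_b):
    """Distances between equivalenced alpha carbons, pair by pair."""
    if len(core_a) != len(core_b):
        raise ValueError("cores must contain the same number of alpha carbons")
    return [math.dist(p, q) for p, q in zip(core_a, core_b)]


def root_mean_square_deviation(core_a, core_b):
    """Square root of the mean of the squared distances between pairs."""
    d = pair_distances(core_a, core_b)
    return math.sqrt(sum(x * x for x in d) / len(d))


def percent_within(core_a, core_b, cutoff_nm=0.25):
    """Percentage of pairs lying closer together than cutoff_nm."""
    d = pair_distances(core_a, core_b)
    return 100.0 * sum(1 for x in d if x < cutoff_nm) / len(d)


def percent_in_cores(n_pairs, n_total_a, n_total_b):
    """Assumed definition: paired alpha carbons over all alpha carbons."""
    return 100.0 * 2 * n_pairs / (n_total_a + n_total_b)


if __name__ == "__main__":
    # Two invented pairs of coordinates, in nanometers.
    a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    b = [(0.0, 0.0, 0.3), (1.0, 0.4, 0.0)]
    print(root_mean_square_deviation(a, b))
    print(percent_within(a, b, cutoff_nm=0.35))
```

Because the root mean square deviation is computed over the cores alone, a small value accompanied by a small percentage of α carbons in the cores says less than the same value obtained over most of the protein.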

As two proteins diverge steadily from a common 
ancestor, first as orthologues in different species, then, 
following gene duplication, as paralogues of the same 
protein, then as paralogues with related functions such 
as adenosine kinase and ribokinase, and finally as par- 
alogues with unrelated functions such as phosphoribo- 
sylamine-glycine ligase and glutathione synthase, their 
structures drift apart from each other by greater and 
greater deviations (Table 7-3). 

Whenever the crystallographic molecular models of 
two proteins, the amino acid sequences of which can be 
aligned computationally with statistical significance, 
have been compared, they could always be
unambiguously superposed. The root mean square deviation
of a pair of superposed crystallographic molecular
models can be plotted as a function of the percentage of
identity between the respective aligned amino acid
sequences (Figure 7-8). When the percentage of
identity in just the segments chosen as the core falls to
around 15%, the range in which statistically meaningful
alignments of complete amino acid sequences can no
longer be made, the root mean square deviation of the
α carbons of the amino acids in the core (50% or more
of the total α carbons) is only 0.2 nm, and the
topological similarity between the structures being compared
is still unmistakable.

[Table 7-3, legible entries only: thiamin pyridinylase, Bacillus thiaminolyticus; p-maltodextrin binding protein, E. coli; phosphoribosylamine-glycine ligase, E. coli; glutathione synthase, E. coli; biotin carboxylase, E. coli; values 0.38 and 57.]

ᵃThe two crystallographic molecular models were superposed by use of the respective coordinates of equivalent α carbons in the cores of the two structures. ᵇRoot mean square deviation between atoms in the respective cores that were considered to be equivalent during the superposition. ᶜPercent of the total number of α carbons in the two crystallographic molecular models that were designated as being in the core and that were aligned in pairs during the superposition.

Consequently, structural superpositions are able to
establish more distant evolutionary relationships than
those established by alignment of amino acid sequences.
For example, by superposing crystallographic molecular 
models it could be shown that adenylyl-sulfate kinase, 
adenylate kinase, guanylate kinase, and 6-phospho- 
fructo-2-kinase are all members of a large group of 
kinases that share a common ancestor. This ability to
recognize distant relationships has permitted the history 
of the evolution of proteins to be traced just as the align- 
ment of amino acid sequences has permitted the history 
of the evolution of species to be traced. 

From Figure 7-8, it can be concluded that whenever 
two proteins have amino acid sequences that can be 
aligned with statistical significance, they will also have 
superposable tertiary structures. For example, the fact
that the amino acid sequences of the three enzymes
Ca²⁺-transporting ATPase, Na⁺/K⁺-exchanging ATPase, and
H⁺/K⁺-exchanging ATPase can be aligned demonstrates
that their tertiary structures are superposable.
This rule is important because far more sequences are
available than tertiary structures. If the amino acid
sequence of a protein the crystallographic molecular
model of which is unavailable can be related to a protein
for which a crystallographic molecular model is available
through a valid alignment of the two amino acid
sequences, reliable conclusions can be drawn about the
unknown tertiary structure by a comparison with the
known tertiary structure.


[Plot: root mean square deviation (vertical axis, 0.06 to 0.24) against percentage of identity (horizontal axis, 100 to 0).]

Figure 7-8: Relationship between superposition of crystallographic
molecular models and the alignment of amino acid
sequence. Thirty-two pairs of homologous proteins were chosen
for which a crystallographic molecular model of each member of
the pair was available. Structural cores were defined for each
superposed pair of molecular models by including all α carbon
atoms in the polypeptide backbone the distance between which
was less than 0.3 nm. The cores for the pairs that were most
distantly related included only 50% of the amino acids in the
sequence. The root mean square distances between pairs of aligned
α carbon atoms of the cores were calculated and plotted against the
frequency at which the same amino acid was found at the
equivalent positions in the two cores, expressed as a percentage. By this
procedure, the percentage of identity is considerably higher than it
would be if the entire sequences had been aligned. Reprinted with
permission from ref 101. Copyright 1986 IRL Press.

Cytochromes c are present in all organisms, they 
are small proteins, and their structures have changed at 
a rate such that comparisons between them illustrate 
many of the facts that can be learned from superposition 
of tertiary structures. The eukaryotic cytochromes c are 
indistinguishable from each other in tertiary structure,
if those from rice and tuna are assumed to represent the
evolutionary extremes. Consequently, the eukaryotes
can be represented by the crystallographic molecular
model of the protein from the tuna (structure C in Figure
7-9).¹⁰⁷,¹⁰⁸ The other four of the five α carbon drawings of
the crystallographic molecular models in Figure 7-9 are
those of four bacterial cytochromes c: the one from 
Chlorobium thiosulfatophilum (Figure 7-9A), the one 
from Pseudomonas aeruginosa (Figure 7-9B), the one 
from Rhodospirillum rubrum (Figure 7-9D), and the 
one from P. denitrificans (Figure 7-9E). A similar com- 
parison of the crystallographic molecular models of four 
other bacterial cytochromes c with that of a eukaryotic 
cytochrome c is also available.

When the drawings of the crystallographic molecu- 
lar models of the cytochromes c (Figure 7-9) are com- 
pared, the reason for the gaps in their aligned sequences 
(lower part of Figure 7-9) is immediately apparent. They 
represent loops in the longer protein that are smaller or 
are missing entirely from the shorter protein. These 
loops (darkened in the figure) can appear or disappear 
because positions in the sequence at their bases are near 
enough to each other to be connected without disrupting 
the structure in a major way. It has usually been observed 
that the insertions and deletions that are found in super- 
posed tertiary structures of proteins and that are respon- 
sible for the gaps in the aligned amino acid sequences 
occur in regions such as these that are peripheral to the 
central elements of the structure. The resulting loops 
usually extend out into the solvent, for example, the loop 
between positions 20 and 30 in the cytochromes c (Figure 
7-9), or extend across the outer surface of the protein, for 
example, the loop between positions 210 and 230 in 
methionyl aminopeptidase (Figure 7-7B). An interesting 
spreading apart of two flexible flaps, however, to accom- 
modate the large loop present in the cytochrome c from 
tuna but missing in the two smaller bacterial 
cytochromes illustrates a more disruptive outcome. 

These loops that come and go can be of various
lengths. An insertion of one amino acid often causes
almost no change in the structure of the protein except
for a small bulge sufficient to accommodate two amino
acids where there is only one in the shorter cousin; the
two amino acids to the sides of the bulge remain in the
same positions. The extra two amino acids in elastase
between positions 60 and 64 (Figure 7-7A) cause a bulge
that is unmistakable but completely confined to this
short segment, but the one extra amino acid in tryptase
between positions 180 and 188 seems to be more
disruptive. If the loops become long enough, they can assume
structure of their own. The extra amino acids between
positions 203 and 266 in methionyl aminopeptidase from
Pyrococcus furiosus, relative to methionyl aminopeptidase
from E. coli, form an almost spherical knob containing
three α helices and some random meander
sitting on the surface of the common structure. The
extra amino acids between positions 180 and 317 in
serine-type carboxypeptidase from S. cerevisiae, relative
to the structures of other members of the group of related
hydrolases, form a compact globular structure, but a long
loop of polypeptide protrudes from it and wanders over
the surface of the common structure. The events that
produced such insertions can be mimicked by inserting
segments of synthetic DNA into the gene encoding a
protein and expressing the elongated version. When these
inserts are placed at a location where they can loop out
of the original structure, they cause little alteration in
that structure.


             10        20        30        40        50        60
A    YDAAAGKATYDAS-CAMCH-----------KTGMMGAPKVGDKAAWAPHI------------------
B       EDPEVLFKNKGCVACHAI---------DTKMVGPAYKDVAAKFAGQA------------------
C    GDVAKGKKTFVQK-CAQCHTV------ENGGKHKVGPNLWGLFGRKTGQAEGYSYTDANKS-----KG
D   EGDAAAGEK--VSKKCLACHTF------DQGGANKVGPNLFGVFENTAAHKDDYAYSESYTEM--KAKG
E  QDGDAAKGEKEF--NKCKACHMIQAPDGTDIIKGGKTGPNLYGVVGRKIASEEGFKYGEGILEVAEKNPD

   70        80        90       100
A  ---AKGMNVMVANSIKGYK-------GTKGMMPAKGGNPKLTDAGVGNAVAYMVGQSK
B  --GAEAELAQRIKNGSQGV-------WGPIPMPPNAVS----DDEAQTLAKWVLSQK
C  IVWNNDTLMEYLENPKKYI--------PGTKMIFAGIKKK---GERQDLVAYLKSATS
D  LTWTEANLAAYVKDPKAFVLEKSGDPKAKSKMTFKLTK----DDEIENVIAYLKTLK
E  LTWTEADLIEYVTDPKPWLVKMTDDKGAKTKMTFKMGK------NQADVVAFLAQNSPDAGGDGEAA


Figure 7-9: Tertiary structures of five cytochromes c. Ribbon diagrams with creases at each α carbon were made from the
crystallographic molecular models of (A) cytochrome c-555 from Chlorobium thiosulfatophilum, (B) cytochrome c-551 of Pseudomonas aeruginosa,
(C) cytochrome c of tuna mitochondria, (D) cytochrome c2 of R. rubrum, and (E) cytochrome c-550 of P. denitrificans. The hemes, the iron
cations of which are each liganded by a methionine and a histidine, and a conserved phenylalanine are also drawn. The sequences of these
five cytochromes c are structurally aligned at the bottom of the figure (identified by the respective letters). The alignment of R. rubrum is
represented by the dot matrix in Figure 7-2. Filled portions of the ribbon diagrams highlight loops representing insertions into the basic
structure. The central structure for tuna cytochrome c is numbered with the same numbers as those used in the alignments. Reprinted with
permission from ref 107. Copyright 1982 Academic Press.

The structural alignment of the five amino acid 
sequences presented in Figure 7-9 is based on
superpositions of the five crystallographic molecular
models. A structural alignment of two amino acid 
sequences is an alignment in which two respective 
amino acids that occupy the same positions upon super- 
position of the two respective crystallographic molecular 
models also occupy the same position in the alignment. 

A structural alignment of amino acid sequences
often differs significantly from a computational
alignment of the same sequences. For example, a previously
performed computational alignment of human basic
fibroblast growth factor and human interleukin 1β
agreed with the subsequent structural alignment from
positions 90 to 144 but was out of register with the
structural alignment by at least seven amino acids between
positions 20 and 89. It has been argued that, when they
are available, structural alignments of amino acid
sequences are more reliable than computational
alignments. This is an interesting argument because it
assumes that no movement of β strands relative
to each other or advancement of the screw of an α helix
has occurred during the divergence from a common
ancestor. There is some support for this assumption.

Because the tertiary structures of proteins change 
more slowly than their amino acid sequences, a struc- 
tural alignment based on a superposition can be per- 
formed between the amino acid sequences of two 
proteins that diverged from their common ancestor so 
long ago that a computational alignment would be 
insignificant. For example, the amino acid sequence of
phosphopyruvate hydratase can be structurally aligned
with that of mandelate racemase; that of P1 nuclease
from Penicillium citrinum, with that of phospholipase C
from Bacillus cereus; and that of aspartate-semialdehyde
dehydrogenase, with that of glyceraldehyde-3-phosphate
dehydrogenase. In such structural alignments
between distantly related sequences, there are
significantly more gaps than would have been inserted
during a computational alignment because gaps are
more readily recognized. For example, the structural
alignment of coat protein VP1 from Mengo virus and coat
protein from human rhinovirus 14 (13% identity) has a
4.9 gap percent, and eight of the 14 gaps are only one
amino acid in length. When crystallographic molecular
models are available for many of the members of a
large group of related proteins, structural alignments of
the amino acid sequences of those members for which
molecular models are available can be combined with
multiple computational alignments of the sequences of
the rest of the members of the group to produce a
statistical template that can be used to search databases for
additional, as yet unrecognized members of the group.
It is the ability to perform structural alignments of 
amino acid sequences based on superpositions that 
allows the frequencies with which one amino acid is sub- 
stituted for another in more distantly related proteins to 
be ascertained. In one study, crystallographic molecular
models of 235 proteins from 65 different groups were
chosen for superposition. All of the proteins within
each group were superposed, and their amino acid
sequences were structurally aligned. For each type of 
amino acid in the alignments, the number of times that 
it shared that position with itself or with each of the other 
amino acids was tabulated. From this tabulation, the fre- 
quency of substitution for each amino acid could be cal- 
culated (Table 7-4). 
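
The tabulation described in this paragraph can be sketched as follows. This is only an illustration of the counting, not the procedure of the study cited: it assumes that each structural alignment is supplied as a pair of equal-length strings in one-letter code with '-' marking a gap, and it counts every residue in every aligned column once. The function name is hypothetical, and the aligned segments in the example are invented.

```python
from collections import Counter, defaultdict


def substitution_frequencies(alignments):
    """Percentage with which each amino acid shares a position with each partner.

    alignments is an iterable of (top, bottom) pairs of aligned strings.
    A partner of '-' means that the position was occupied by a gap.
    """
    counts = defaultdict(Counter)
    for top, bottom in alignments:
        for x, y in zip(top, bottom):
            if x == "-" and y == "-":
                continue
            # Each residue in the column is one observation of its partner.
            counts[x][y] += 1
            counts[y][x] += 1
    frequencies = {}
    for amino_acid, partners in counts.items():
        if amino_acid == "-":
            continue  # gaps are reported only as partners
        total = sum(partners.values())
        frequencies[amino_acid] = {
            partner: 100.0 * n / total for partner, n in partners.items()
        }
    return frequencies


if __name__ == "__main__":
    # Invented three-column alignment: L/L, L/I, and V/V.
    table = substitution_frequencies([("LLV", "LIV")])
    for amino_acid, partners in sorted(table.items()):
        print(amino_acid, partners)
```

In a table built in this way, the entry for an amino acid paired with itself plays the role of the first row of Table 7-4, and the remaining entries in each column sum with it to 100%.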

The table of the frequency of substitution in the 
more distantly related proteins (Table 7-4) complements 
the tabulation of mutation probabilities for closely 
related proteins (Table 7-1). Over the longer term, there 
seems to be a stronger preference for maintaining the 
hydropathy of a position. This enhanced preference 
probably arises from the fact that enough time has 
elapsed that significant substitutions have accumulated 
in the more intolerant sites. In addition to its hydropathy, 
the size of the amino acids has a significant effect on the 
frequencies at which specific replacements occur. Small 
amino acids are usually replaced by small amino acids; 
the large aromatic amino acids are most often replaced 
by other aromatic amino acids. 

Each amino acid displays a characteristic tolerance 
to replacement by any amino acid other than itself. The 
most peculiar amino acids, cystine, tryptophan, and 
glycine, are the most intolerant of replacement. Cystine 
is so intolerant that it is deleted more than twice as often 
as it is replaced by alanine, the most common substitu- 
tion. Proline is also deleted more often than it is replaced 
by any other amino acid. 

The observed frequencies of substitution listed in 
Table 7-4 do not reflect the full effects of the steric and 
chemical properties of each side chain on its capacity to 
replace another or its tolerance to replacement because 


Table 7-4: Frequency with Which Substitutions Occur in Distantly Related Proteinsᵃ


ccyᵇ (0.9)ᶜ trp 1 (1.6) gly 4 (8.8) cysᵍ (1.2) pro 4 (4.5) tyr 2 (3.8) leu 6 (7.6) phe 2 (3.9) val 4 (7.3) his 2 (2.2) asp 2 (6.0)


ccy 88 trp 55 gly 53 cys 46 pro 44 tyr 43 leu 41 phe 40 val 37 his 36 asp 35 
gap 3.0 phe 8.2 ala 6.8 ala 9.6 gap 7.3 phe 8.1 val 10.9 leu 11.5 ile 11.8 asn 5.7 ser 8.3 
ala 1.4 tyr 6.6 gap 6.7 val 7.9 ser 6.8 gap 4.9 ile 8.8 tyr 8.6 leu 11.6 lys 5.5 gap 7.2 


phe 1.0 leu 6.2 ser 6.5 thr 4.7 ala 6.0 val 4.7 phe 5.6 ile 5.4 ala 7.1 gln 5.2 asn 6.9 
glu 0.9 gap? 2.9 asp 3.1 ser 4.0 gly 4.7 leu 44 met 4.4 val 5.4 thr 4.4 gap 48 glu 6.9 
— =a val 2.3 asn 2.8 gly 3.8 lys 4.5 ser 40 — — ala 3.7 ser 3.3 ser 4.8 gly 5.2 
val 0.3 ile 22 — — gap 28 — — thr 3.5 gln LA — — gap 3.2 asp 4.3 ala 5.2 
asp 0.2 ser 2.2 tyr 1.1 leu 2.7 tyr 0.8 = = arg 1.3 asp 1.0 

his 0.1 ala 2.1 his 09 — — his 0.7 met 1.2 asn 1.3 glu 0.9 arg 1.0 pro 1.6 leu 1.2 
asn 0.1 gly 2:1 phe 0.5 ccy 1.0 trp 0.3 gln 1.2 glu 1.2 gln 0.9 his 0.8 ile 1.5 phe 0.7 
trp 0.1 — — met 0.5 his 0.7 met 0.2 pro 0.9 asp 0.8 arg 0.8 cys 0.6 met 0.9 met 0.4 


tyr 0.1 gln 0.6 trp 0.4 glu 0.6 cys 0.2 cys 0.3 his 0.7 CC 0.4 trp 0.5 trp 0.6 trp 0.4 
pro 0.1 cyx! 0.2 cyx 0.3 trp 0.4 ccy 0.0 ccy 0.0 cyx 0.3 cys 0.3 ccy 0.1 cyx 0.3 cyx 0.2 


ser 6 (7.4) thr 4 (6.3) ile 3 (5.2) arg 6 (3.7) lys 2 (5.9) gln 2 (3.6) ala 4 (8.4) glu 2 (5.1) asn 2 (4.7) met 1 (1.9) 


ser 33 thr 32 ile 31 arg 31 lys 30 gln 30 ala 30 glu 29 asn 24 leu 20.9 
thr 10.2 ser 14.1 val 18.1 lys 11.0 gap 6.9 glu 7.3 ser 8.7 asp 8.6 ser 10.9 met 20.6 
ala 7.6 ala 6.5 leu 14.2 ser 7.7 arg 6.3 lys 6.9 gly 7.6 gap 6.9 asp 9.1 ile 8.7 
gap 6.8 lys 5.1 ala 48 gln 6.1 ala 6.1 ser 6.8 gap 6.4 lys 6.7 gap 7.3 val 8.4 
gly 6.3 gap 4.7 phe 4.2 ala 5.4 ser 6.1 ala 6.7 val 6.3 ala 6.5 thr 6.6 ala 5.5 
asn 4.8 val Ap thr 3.2 gap 5.2 thr 6.1 thr 6.0 thr 5.3 gln 6.5 gly 6.1 gap 4.6 


asp 4.8 asn 4.0 glu 5.2 arg 51 — — thr 5.7 ala 5.5 lys 3.5 
pro 1.2 pro 1.8 gln 48 — — tyr 1.5 

phe 1.2 his 1.0 arg 0.9 ile 13 — — tyr 1.2 met 1.1 his 1.1 phe 1.5 arg 1.2 

his 1.0 met 0.8 gln 0.8 phe 0.9 met 1.0 ile 0.9 his 0.9 phe 0.8 met 0.7 his 1.1 


met 0.5 cys 0.4 trp 0.8 trp 0.7 phe 1.0 phe 0.8 cys 0.6 met 0.8 trp 0.4 pro 0.4 
trp 0.4 trp 0.2 his 0.6 met 0.6 trp 0.5 cyx 0.3 trp 0.4 ccx 0.4 cys 0.3 ccy 0.4 
cyx 0.3 ccy 0.1 cyx 0.3 cyx 0.5 cyx 0.2 trp 0.2 ccy 0.2 trp 0.3 ccy 0.0 cys 0.3 


ᵃStructural alignments were performed of the amino acid sequences within each of 65 groups of proteins. Each of the 65 groups contained a different set of superposable proteins. For each of the 21 amino acids, the frequencies with which it was paired with itself or with each of the other 20 amino acids over all of the structural alignments were calculated.'”* The number to the right of each amino acid is the frequency (in percent) with which it was paired with the amino acid at the top of the column. ᵇCystine. ᶜThe number in boldface type at the head of each column is the number of codons that encode that amino acid, and the number in parentheses is the percentage in which it occurs in the amino acid composition of the complete data set (208,000 amino acids). ᵈThe horizontal lines divide the amino acids with the highest frequencies of replacement from those with the lowest. All of the amino acids that are not listed in a given column have frequencies between the highest of the lowest group and the lowest of the highest group. ᵉFrequency with which a gap occupied the aligned position. ᶠFrequency of cysteine plus that of cystine. ᵍCysteine; there are two codons for cysteine and cystine combined. The decision between cysteine and cystine is made posttranslationally.
they are biased. The most important of these biases is
that of the number of codons for each of the amino acids 
(boldface numbers next to their abbreviations at the 
head of each column in Table 7-4). It has been shown? 
that the frequency with which an amino acid occurs in 
the overall population of proteins (number in parenthe- 
ses next to the number of codons in Table 7-4) is roughly 
proportional to the number of its codons (compare the 
numbers in boldface type with those in parentheses). 
Consequently, the number of codons for an amino acid 
must significantly affect its frequency of substitution 
over the long term. For example, if the frequencies of 
substitution for leucine in Table 7-4 were corrected only 
for number of codons of each of the amino acids, con- 
servation would still have the highest frequency but by 
much less of a margin, and the preferred substitution, by 
almost a factor of 2, would become methionine, with 
valine, isoleucine, and phenylalanine tied for second 
place. This result would make sense because methionine 
has the hydrophobic side chain that is the most similar to 
that of leucine. In addition to the number of codons, 
however, other factors such as the number of base 
changes for a given substitution and the frequency with 
which the codons are used by a given species are also fac- 
tors affecting the frequency of substitution. Because 
these effects have yet to be quantitatively assessed, no 
corrections were applied to the directly observed fre- 
quencies of substitution in Table 7-4. 
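
The correction for the number of codons described above can be sketched in a few lines. This is only an illustration of the arithmetic: the codon counts are those of the standard genetic code, but the observed frequencies are hypothetical values chosen for the example and are not the entries of Table 7-4, and the function name `correct_for_codons` is an assumption of this sketch.

```python
# Number of codons for each amino acid in the standard genetic code.
CODONS = {"leu": 6, "met": 1, "val": 4, "ile": 3, "phe": 2}

# HYPOTHETICAL pairing frequencies (percent) for a leucine column.
# These values are illustrative only and are not taken from Table 7-4.
observed = {"leu": 45.0, "met": 4.0, "val": 8.4, "ile": 6.3, "phe": 4.2}


def correct_for_codons(frequencies, codons):
    """Divide each observed frequency by the number of codons of that amino acid."""
    return {aa: freq / codons[aa] for aa, freq in frequencies.items()}


corrected = correct_for_codons(observed, CODONS)
ranking = sorted(corrected, key=corrected.get, reverse=True)

for aa in ranking:
    print(f"{aa}: observed {observed[aa]:5.1f}  per codon {corrected[aa]:4.2f}")
```

With these made-up numbers, methionine is the least frequent replacement before the correction and the most frequent replacement after it, which is the kind of reordering the text describes for leucine.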

Structural alignments also allow the success of 
computational alignments and the procedures for 
searching banks of amino acid sequences to be evalu- 
ated. To perform such an evaluation, a bank of amino 
acid sequences of only those proteins for which crystal- 
lographic molecular models are available is assembled. 
The amino acid sequences in the bank can then be dis- 
tributed into groups within which each member can be 
superposed on every other member. Consequently, all of 
the members of one of these groups share a common 
ancestor. How many of these established relationships 
can be detected by a particular algorithm operating only 
on the amino acid sequences? 

When computational alignments were evalu- 
ated,” the success with which they aligned two 
sequences known to share a common ancestor was 
quantified as the percentage of the positions aligned 
structurally that were also aligned statistically. It was 
found that a greater percentage of the positions was cor- 
rectly matched when weighting schemes were used that 
assigned values to the aᵢ × bⱼ other than just 1 and 0, but
the improvement at best was only 1.25 times (from a 51%
rate of success to a 64% rate of success). It made no dif-
ference whether the weighting scheme was based on fre-
quencies of identity and replacement from entirely
computational alignments (Table 7-1) or from entirely
structural alignments, but weighting schemes based on
frequencies drawn from larger numbers of alignments
performed somewhat better than those drawn from
smaller numbers of alignments. Weighting schemes that
were not based on frequencies of replacement and iden- 
tity, however, did almost as well as those that were. Even 
the best weighting scheme was unable to align sequences 
(<30% correctly matched positions) when the percentage 
of identity fell below 15%. 
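
The measure of success used in this evaluation, the percentage of the structurally aligned positions that are also aligned computationally, can be sketched as follows. The two short sequences and both alignments are hypothetical, and the helper names `aligned_pairs` and `percent_recovered` are assumptions of this sketch rather than the procedure of the cited study.

```python
def aligned_pairs(row_a, row_b, gap="-"):
    """Return the set of (i, j) residue-index pairs matched by a gapped alignment."""
    pairs = set()
    i = j = 0
    for x, y in zip(row_a, row_b):
        if x != gap and y != gap:
            pairs.add((i, j))
        if x != gap:
            i += 1
        if y != gap:
            j += 1
    return pairs


def percent_recovered(structural, computational):
    """Percentage of structurally aligned pairs that the computed alignment also contains."""
    return 100.0 * len(structural & computational) / len(structural)


# HYPOTHETICAL alignments of the same two sequences (ACDEFGHK and ACEFGHIK).
structural = aligned_pairs("ACDEFGH-K", "AC-EFGHIK")   # from superposition
computational = aligned_pairs("ACDEFGHK", "ACEFGHIK")  # ungapped, from sequence alone
score = percent_recovered(structural, computational)
print(f"{score:.1f}% of the structurally aligned positions were recovered")
```

Here a single misplaced gap shifts the register of the middle of the alignment, so only three of the seven structurally paired positions are recovered.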

When the procedures used to search databanks 
were evaluated,” the success with which they found 
matches was assessed by comparing coverage to error 
rate. Coverage is the fraction of the known relatives that 
had scores above the chosen threshold. Error rate is the 
percentage of the unrelated amino acid sequences in 
the bank that had scores above the chosen threshold. 
As the threshold was raised, the error rate decreased, but 
so did the coverage. In plots of error rate against cover- 
age, the three currently used algorithms, WU-BLAST2, 
FASTA, and SSEARCH, all performed equivalently. When 
a bank containing sets of sequences that were known to 
be related by superposition but in which none of the 
members had a percentage of identity greater than 40% 
was chosen for the searches, at a threshold producing an 
error rate of 1%, the coverage was only 0.18. In other 
words, from a bank containing 100,000 amino acid 
sequences, the search would give 1000 false matches but 
miss 82% of the real relationships with percentage of 
identities less than 40%. 
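
The two quantities defined above, and the arithmetic in the last sentence, can be written out explicitly. The scores and the labels in the small example are hypothetical, and the function name `coverage_and_error_rate` is an assumption of this sketch; only the 1% error rate, the coverage of 0.18, and the bank of 100,000 sequences come from the text.

```python
def coverage_and_error_rate(scores, is_relative, threshold):
    """Return (coverage, error rate) for matches scoring above the threshold.

    coverage   = fraction of known relatives scoring above the threshold
    error rate = fraction of unrelated sequences scoring above the threshold
    """
    relatives = [s for s, rel in zip(scores, is_relative) if rel]
    unrelated = [s for s, rel in zip(scores, is_relative) if not rel]
    coverage = sum(s > threshold for s in relatives) / len(relatives)
    error_rate = sum(s > threshold for s in unrelated) / len(unrelated)
    return coverage, error_rate


# HYPOTHETICAL scores for six sequences in a bank, three of them known relatives.
scores = [90, 75, 60, 40, 30, 20]
is_relative = [True, False, True, True, False, False]
coverage, error_rate = coverage_and_error_rate(scores, is_relative, threshold=50)

# Arithmetic quoted in the text: a 1% error rate and a coverage of 0.18.
bank_size = 100_000
false_matches = round(bank_size * 0.01)    # about 1000 unrelated sequences pass
percent_missed = round(100 * (1 - 0.18))   # 82% of the real relatives are missed
print(coverage, error_rate, false_matches, percent_missed)
```

Raising the threshold in the small example lowers both numbers together, which is the trade-off that a plot of error rate against coverage displays.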

There are several different levels of agreement 
within the superposition of two crystallographic molecu- 
lar models. In the core of the structure, where changes 
occur less rapidly, the superposition is usually accept- 
able (Figure 7-7). When the two proteins are functionally 
related, the segments of the polypeptide in the core that 
participate in this common function usually superpose 
the most precisely.’ For example, the segments 
between positions 51 and 56, 102 and 108, and 195 and 
200 in the superposition of tryptase and elastase (Figure 
7-7A) contain the histidine, the aspartate, and the serine, 
respectively, that participate in their common mecha- 
nism. As more and more replacements of amino acids 
accumulate, the steric effects of these changes cause the 
backbone to shift to accommodate them. For example, 
the respective replacement of a serine and a valine in rice 
cytochrome c with a threonine and an isoleucine, both 
larger side chains, in tuna cytochrome c causes a dis- 
placement further into the solvent of the polypeptide to 
the exterior of this substitution.'”° The significant shifts 
of secondary structure in the polypeptide between cre- 
atinase and methionyl aminopeptidase (Figure 7-7B) 
within the core of the common structure result from the 
accumulation of such steric effects. Flexible loops such 
as the one between positions 32 and 42 in the superposi- 
tion of tryptase and elastase (Figure 7-7A) often differ 
dramatically in their disposition, but such differences 
may reflect only the effects of crystal packing that pins 
down an otherwise fluctuating structure.

As one traces the polypeptides through the super-
posed α-carbon diagrams (Figure 7-7), the distance
between the backbones fluctuates as one moves through
the core, out through the loops, and back into the core.
These fluctuations can be represented graphically in a
plot of the distance between the paired α carbons as a
function of their position in the amino acid sequence.¹⁷
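
The data for such a plot are simply the distances between paired α carbons after superposition. The sketch below assumes that the two models have already been superposed and that the coordinates of paired α carbons are listed in the same order; the coordinates (in nanometers), the cutoff of 0.2 nm, and the function name `ca_distance_profile` are all hypothetical.

```python
import math


def ca_distance_profile(coords_a, coords_b):
    """Distance between each pair of superposed alpha carbons, in the input units."""
    return [math.dist(p, q) for p, q in zip(coords_a, coords_b)]


# HYPOTHETICAL alpha-carbon coordinates (nm) for four aligned positions.
coords_a = [(0.00, 0.0, 0.00), (0.38, 0.00, 0.0), (0.76, 0.0, 0.0), (1.14, 0.0, 0.00)]
coords_b = [(0.00, 0.0, 0.03), (0.38, 0.04, 0.0), (0.76, 0.3, 0.4), (1.14, 0.0, 0.05)]

profile = ca_distance_profile(coords_a, coords_b)

# Positions where the backbones diverge by more than an arbitrary 0.2 nm cutoff.
divergent = [i for i, d in enumerate(profile) if d > 0.2]
print(profile, divergent)
```

Plotting `profile` against position would show small values through the core and peaks at loops such as the third position in this example.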

The globins are a group of the same proteins and 
their isoforms from different species, for which many 
sequences are available. They include myoglobins, 
hemoglobins, erythrocruorins, and leghemoglobins. The 
details of the variations that occur in the tertiary struc- 
ture of a protein as amino acids are slowly replaced at the 
toleration of natural selection have been examined by 
superposing the nine available crystallographic molecu- 
lar models of different globins'!’ and using these super- 
positions to align their amino acid sequences." Each 
globin is formed from eight α helices stacked one upon
the others as a bundle of sticks would be in a fire. As the
sequences of the globins have varied, the interdigitations
of the amino acid side chains situated between the
α helices have adjusted to accommodate changes in their
size, and this has caused the helices to shift as rigid
bodies with respect to each other. These adjustments are
necessary because the amino acid side chains between
the α helices are tightly packed together and many
atomic contacts occur. As the shifting proceeds, accom- 
modating changes in size, the individual pairs of atomic 
contacts persist between two amino acids at different 
positions in the amino acid sequence but next to each 
other in the tertiary structure even though the identities 
of the amino acids themselves change. 

About 60 amino acids out of the 140 in the polypep- 
tide of a typical globin remain in equivalent locations in 
the nine superposed crystallographic molecular models 
and account for the core of the native structure. Only half 
of these are buried; the ones on the surface remain fixed 
because they are within α helices that are themselves
rigid structures. The regions in which the greatest varia- 
tion in sequence and tertiary structure occurs are in the 
seven loops connecting the eight helices. This is due to 
their almost exclusive location at the surfaces of the 
molecular models but may also reflect the changes in the 
end-to-end distances between the α helices that were
required to accommodate the slow shifts of the helices 
relative to each other as the packing among them has 
been altered by the substitutions. 

These observations suggest that the degree of con- 
servation that is displayed by a position in the sequence 
of a protein may provide an indication of its location in 
the tertiary structure. Positions showing the least toler- 
ance to replacement are often located on the interior of 
the protein and those displaying the greatest tolerance 
tend to be located on flexible surface loops, but the ten- 
dency is not overwhelming. 

A crystallographic molecular model of myoglobin 
from the sperm whale has been prepared,” and the 
structural roles of the 82 invariant amino acids among 
the 24 myoglobins that had been sequenced at the time 
were tabulated (Table 7-5). This list represents a combi-
nation that has been retained since the time that all of
these myoglobins shared a common ancestor. Positions 
marked (Hb) in Table 7-5 are those invariant among 
mammalian hemoglobins and myoglobins, and they rep- 
resent amino acids that have been retained for an even 
longer period of time. Finally, the amino acids that 
appear at these 82 positions in the nine globins aligned 
by superposition have been entered into the tabulation. 
An examination of Table 7-5 reinforces several features 
of the atomic structure of molecules of protein. 

Positions in the sequence that are buried in 
hydrophobic clusters are the most conserved. Usually 
three or four members of the group isoleucine, valine, 
phenylalanine, leucine, methionine, and alanine (Table 
7-4) will substitute among themselves in this role, but 
occasionally only one or two are suitable. For example, 
only leucine is found at position 2NA and only valine or 
isoleucine at position 11E in the globins. These two pref- 
erences presumably reflect the constraints of the intri- 
cate, interlocking stereochemistry in the interior. In two 
locations, positions 1CD and 4CD, only phenylalanine is 
found among all the globins, and presumably in this 
location the flat disk of the phenyl ring is essential to 
maintain the structure. The phenylalanine at posi- 
tion 1CD is stacked upon the heme. 
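
The tendency for buried positions to be the most conserved can be checked roughly against the one-letter strings in Table 7-5. The sketch below uses only a subset of the positions whose location is given as I (internal) or E (external), counts the different amino acids found at each position among the nine globins, and ignores dashes (deletions). The choice of positions and the function name `mean_variety` are assumptions of this sketch.

```python
# One-letter strings copied from Table 7-5 for a subset of positions whose
# location code is I (internal) or E (external); "-" marks a deletion.
INTERNAL = {"2NA": "L", "8A": "IV", "15A": "VIF", "5B": "YVHSD", "6B": "GPT",
            "10B": "LF", "13B": "MLFHV", "14B": "FL", "4CD": "F"}
EXTERNAL = {"1A": "TS", "9A": "KTLRAE", "14A": "KPDE", "16A": "G-EYAKN",
            "1B": "HND", "2C": "P", "3C": "TWEAS", "6C": "TAERD",
            "2CD": "PEDGTS", "5CD": "G-KSAL"}


def mean_variety(columns):
    """Average number of different amino acids observed per position."""
    counts = [len(set(residues) - {"-"}) for residues in columns.values()]
    return sum(counts) / len(counts)


internal_mean = mean_variety(INTERNAL)
external_mean = mean_variety(EXTERNAL)
print(f"internal: {internal_mean:.2f}  external: {external_mean:.2f}")
```

For this subset the internal positions average fewer than three different amino acids and the external positions more than four, but individual positions such as the proline at 2C and the histidine at 5B run against the trend.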

There are usually a number of locations in the 
structure of a protein where difficulties resulting from 
the packing of the backbone of the polypeptide arise. At 
position 2C in the globins, a proline seems essential to 
enforce a sharp turn. When two strands of polypeptide 
are forced too closely together, these tight locations, 
such as positions 6B, 8E, 5F, and 7H in the globins, are 
occupied by glycine, proline, alanine, serine, or threo- 
nine (Table 7-5). Both serine and threonine, by forming 
hydrogen bonds to acyl oxygens (Figure 6-7A), are able 
to hug the polypeptide. Tight fits can also result from 
the juxtaposition of a large and bulky amino acid. The 
amino acid at position 16E is crowded by the trypto- 
phan at position 12A in both hemoglobin and myoglo- 
bin. 

There are several instances in which side chains cap 
one end or the other of an α helix; for example, Serine 1A,
Threonine 4C, Serine 1E, or Tyrosine 23H. It is often
stated (Table 7-5) that this arrangement has the effect of
initiating the α helix. The fact that at position 4C other
globins lack an amino acid capable of forming a hydro-
gen bond and still contain the α helix suggests that the
assignment of such a purpose in this case is an over- 
statement. Remarkably, four pairs of participants in ion- 
ized hydrogen bonds between side chains on the surface 
of myoglobin are invariant in the short term. When these 
particular interactions are examined, however, over all 
nine of the globins, which represent a much longer his- 
tory of evolution, all of these hydrogen bonds are found 
to be dispensable (Table 7-6). 
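
The conclusion that these ionized hydrogen bonds are dispensable can be verified directly from the strings in Table 7-6. The sketch below takes three of the four pairs listed there and counts, for each pair, the globins in which one position holds an acidic amino acid and the other a basic one. It tests only whether the two side chains could form such a bond, not whether they are close enough to do so; histidine is counted as basic because the table treats the histidine and aspartate pair as ionized. The function names are assumptions of this sketch.

```python
ACIDIC = set("DE")
BASIC = set("KRH")

# Strings from Table 7-6; myoglobin occupies the central (fifth) position.
PAIRS = {
    ("5EF", "18H"): ("MLLLHKMIS", "VAVADLAMV"),
    ("1B", "19G"): ("HNHNDDN-N", "HHHHRGRHV"),
    ("2EF", "2A"): ("DDDDKDD-G", "PPAGEAAAE"),
}


def can_form_ion_pair(x, y):
    """True if one amino acid is acidic and the other is basic."""
    return (x in ACIDIC and y in BASIC) or (x in BASIC and y in ACIDIC)


def globins_with_pair(column_a, column_b):
    """Indices of the globins in which the two positions could form an ion pair."""
    return [k for k, (x, y) in enumerate(zip(column_a, column_b))
            if can_form_ion_pair(x, y)]


retained = {sites: globins_with_pair(a, b) for sites, (a, b) in PAIRS.items()}
for sites, indices in retained.items():
    print(sites, indices)
```

For each of these three pairs, the only globin of the nine in which both partners are present is the one at the central position, myoglobin itself.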

A deeply buried position in the sequence of a folded 
polypeptide remains invariably hydrophobic, but a
buried location near the surface will occasionally erupt 
toward the water. For example, at position 65 in
cytochrome c (Figure 7-1) an arginine appears at a loca- 
tion usually occupied by hydrophobic amino acids. 
Presumably the alkane portion of the arginine traverses 
the hydrophobic region and the guanidinium can push 
through the surface into the solvent. 

Often a hydrophilic location on the exterior is occu- 
pied by a hydrophobic amino acid. For example, posi- 
tion 9A in the globins (Table 7-5) is on the exterior of the 
protein and is usually occupied by hydrophilic amino 
acids, but in the myoglobins it is occupied by leucine. 
Such a substitution has no effect on the free
energy of folding for myoglobin compared to the other
globins because the leucine is solvated equivalently in
both the unfolded and the folded polypeptide, and
consequently such exchanges are common during evolution.
There is, however, a price to be paid for such an exchange because a
hydrophobic amino acid that replaces a hydrophilic 
amino acid on the surface of a protein makes it less solu- 
ble. The helical polymers formed by human deoxyhemo- 
globin S, in which a glutamate on the surface at 
position 4A has been replaced by a valine, are an exam- 
ple of such a problem. 


The globins also provide a particularly informative 
example of the focused constraints that natural selec- 
tion places on the gradual shifts in position among seg- 
ments of secondary structure during evolution. The 
invariant feature of both the structure and the function of 
a globin is the heme (Figure 4-18). The only functions of 
a globin are to provide a fifth ligand to the iron, to make 
its heme soluble in water, and to prevent its heme from 
approaching another heme too closely. Through all of 
the alterations encountered during evolution, the amino 
acids responsible for surrounding the heme and sup- 
porting it within the protein were required by natural 
selection to maintain these roles. The record of this series 
of accommodations can be inferred from superposing 
crystallographic molecular models of present globins so 
that their hemes are made to coincide rather than their 
polypeptides. The situation is most graphically illus-
trated when the α subunit from equine hemoglobin is
superposed in this way on leghemoglobin from Lupinus 
luteus (Figure 7-10)."”?° Over this long period of evolu- 
tion, the amino acids supporting the heme have shifted 
their positions relative to it by only small distances. At 
the same time, however, the ends most distant from the 
heme of the two α helices in which these functionally
critical amino acids reside (E and G in Figure 7-10) have 


Table 7-5: Role and Location of Invariant Residues in Myoglobinᵃ

structural locationᵇ  amino acidᶜ  locationᵈ  roleᵉ

2NA  LEU  I  in contact with helix H: L
1A  SER  E  hydrogen bond to GLU 4A NH to initiate helix A: TS
4A  GLU  S  hydrogen bond to SER 1A NH to stabilize LEU 2NA: DQE
5A  TRP  S  between LEU 2NA and LYS 2EF: KWRIA
8A  VAL  I  in contact with helix H; in bottom hydrophobic cluster: IV
9A  LEU  E  protruding into solvent: KTLRAE
12A  TRP  S  hydrogen bond to GLU 16A, which in turn is bound to LYS 20E to hold helices A and E together (Hb): WF
14A  LYS  E  hydrogen bond to asp (glu, gln, asn) 4GH to stabilize GH-corner (Hb): KPDE
15A  VAL  I  in bottom hydrophobic cluster: VIF
16A  GLU  E  hydrogen bond to TRP 12A: G-EYAKN
1B  ASP  E  hydrogen bond to gly 4B NH to stabilize AB-corner: HND
5B  HIS  I  hydrogen bond to HIS 1GH to stabilize GH-corner: YVHSD
6B  GLY  I  in close contact with GLY 8E: GPT
10B  LEU  I  in hydrophobic cluster on HIS 7E side (Hb): LF
11B  ILE  S  blocking an opening between helices B and D: EGIVY
13B  LEU  I  in hydrophobic cluster on HIS 7E side: MLFHV
14B  PHE  I  in hydrophobic cluster on HIS 7E side: FL
1C  HIS  S  in close contact with phe (leu) 7G: FYHTDA
2C  PRO  E  sharp turn from helix B to helix C (Hb): P
3C  GLU  E  hydrogen bond to GLU 3C NH: TWEAS
4C  THR  I  van der Waals contact with heme, hydrogen bond to HIS 1C CO to initiate helix C (Hb): TAMI
5C  LEU  S  blocking an opening formed by helix C and CD-corner: KQLEAMK
6C  GLU  E  in contact with LYS 8CD through a water molecule: TAERD
1CD  PHE  I  van der Waals contact with heme parallel to heme plane; in hydrophobic cluster on HIS 7E side (Hb): F
2CD  ASP  E  hydrogen bond to LYS 5CD to stabilize CD-corner: PEDGTS
4CD  PHE  I  in hydrophobic cluster on HIS 7E side of heme (Hb): F
5CD  LYS  E  hydrogen bond to ASP 2CD to stabilize CD-corner: G-KSAL
6CD  HIS  E  in contact with ASP 2CD CO through a water molecule to stabilize CD-corner: DHGK
7CD  LEU  S  in hydrophobic cluster on HIS 7E side (Hb): L-G
8CD  LYS  E  in contact with ASP 2CD CO through a water molecule to stabilize CD-corner: SKTG
Table 7-5: Role and Location of Invariant Residues in Myoglobinᵃ - continued

structural locationᵇ  amino acidᶜ  locationᵈ  roleᵉ

5D  MET  blocking an opening formed by CD-corner and helix D: VMLIP
1E  SER  hydrogen bond to LEU 4E CO to initiate helix E: NSDT
2E  in contact with lys (arg) 5E through a water molecule: APE
4E  in hydrophobic cluster on HIS 7E side: VLF
6E  hydrogen bond to neighboring molecule: KRAEQ
7E  hydrogen bond to the sixth ligand of the heme; van der Waals contact with heme (Hb): HL
8E  in close contact with GLY 6B (Hb): GA
11E  van der Waals contact with heme; in hydrophobic cluster on HIS 7E side: VI
12E  in bottom hydrophobic cluster: ALGIVF
14E  van der Waals contact with heme: ASEFL
15E  in bottom hydrophobic cluster; van der Waals contact with heme vinyl group: LFVI
16E  in contact with TRP 12A: TSGDY
18E  in hydrophobic cluster on HIS 8F side: AGI
19E  in bottom hydrophobic cluster: VLIA
20E  hydrogen bond to GLU 16A to keep helices A and E stable: AGHKSI
1EF  hydrogen bond to neighboring molecule: HKSEQ
2EF  hydrogen bond to glu (asp) 2A to stabilize amino terminus of helix A: DKG-
3EF  in contact with solvent: -TGV
5EF  hydrogen bond to ASP 18H to stabilize EF-corner and helix H: MLHKIS
7EF  in contact with solvent: NGAS
4F  in hydrophobic cluster on HIS 8F side; in contact with heme (Hb): LVF
5F  in close contact with helix H: SAGV
7F  van der Waals contact with heme; hydrogen bond to pro (his) 3F CO or LEU 4F CO or HIS 8F N: LSKRV
8F  the fifth ligand to heme iron (Hb): H
9F  in close contact with helix H: AKV
1FG  no electron density (Hb): KSYR
2FG  van der Waals contact with heme; hydrogen bond to propionic acid residue to stabilize heme and FG-corner: LHFG
3FG  no electron density: RHKEV
1G  sharp turn from FG-corner to helix G: DPKTA
5G  van der Waals contact with heme; in hydrophobic cluster on HIS 8F side: FL
6G  hydrogen bond to ARG 16H to stabilize helices G and H: KRENP
8G  van der Waals contact with heme: LIFV
9G  hydrogen bond to LEU 5G CO: SGARK
12G  in bottom hydrophobic cluster: LIF
1GH  hydrogen bond to HIS 5B to stabilize GH-corner: LFHITV
5GH  in bottom hydrophobic cluster: FMW
3H  hydrogen bond to ala (val) 4H NH: APED
7H  in close contact with helix A: SAG
8H  in bottom hydrophobic cluster: LYMFW
10H  hydrogen bond to GLU 4A to stabilize amino-terminal end of helix A (Hb): KAI
11H  in bottom hydrophobic cluster: FVALT
12H  in bottom hydrophobic cluster: LVY
13H  in contact with asn 9H through a water molecule: ASERD
14H  protruding into solvent: SGLMDTE
15H  van der Waals contact with heme; in hydrophobic cluster on HIS 8F side: VFIL
16H  hydrogen bond to GLU 6G to stabilize helices G and H: SARF
18H  hydrogen bond to HIS 5EF to stabilize EF-corner and helix H: VADFM
20H  in contact with solvent: TARIFK
23H  hydrogen bond to ile (val) 4FG CO to cap helix H (Hb): YLM
24H  hydrogen bond to the carboxy terminus: RHKED
1HC
4HC


ᵃAdapted from Takano.” The amino acids listed are those that are invariant in all myoglobins, and the structural roles assigned are those in the crystallographic molecular model (Bragg spacing > 0.2 nm) of myoglobin from Physeter catodon. ᵇPosition in the common crystallographic molecular model of the globins. Capital letters (A-H) indicate which α helix, from amino- to carboxy-terminal, and the numbers indicate the position in the α helix. Double letters refer to turns between the respective helices. The globins are all bundles of eight α helices (Figure 4-18). ᶜAmino acids that are invariant over all myoglobins. ᵈLocation in the crystallographic molecular model of myoglobin: I, internal; E, external; S, surface crevice. ᵉThree-letter amino acid abbreviations given in uppercase letters represent invariant residues in myoglobin; those given in lowercase letters are not invariant. ᶠAmino acids appearing at each of these positions in nine superposed globins are noted in one-letter code. Dash indicates deletion. Amino acids noted with (Hb) are invariant in all mammalian hemoglobins.
Table 7-6: Evolutionary Variation of Ionized Hydrogen Bondsᵃ

structural locationᵇ  pairs of amino acidsᶜ

2CD  PEPDDPGTS
5CD  -G-GKKSAL

5EF  MLLLHKMIS
18H  VAVADLAMV

1B  HNHNDDN-N
19G  HHHHRGRHV

2EF  DDDDKDD-G
2A  PPAGEAAAE


ᵃFour invariant ionized hydrogen bonds that were present in an earlier refined
crystallographic molecular model (Table 7-5)” and a later refined crystallo-
graphic molecular model (Bragg spacing > 0.16 nm)'” of myoglobin were chosen
for examination. Each of the four pairs is between the amino acids in boldface
type above and below each other in the central positions of the four paired strings
of letters. The two amino acids forming each of these four hydrogen bonds in the
two crystallographic molecular models, aspartate and lysine, histidine and aspar-
tate, aspartate and arginine, and lysine and glutamate, were conserved among all
of the myoglobins. The amino acids occupying each of these eight positions in a
structural alignment of eight other globins™ are listed to the right and left of the
pair occupying each of the eight positions in myoglobin. ᵇCode assigned to the
positions in the common crystallographic molecular model of the globin class
(Table 7-5). ᶜThe amino acid at the respective position in each of the globins is
aligned above or below the amino acid at the other position in the same globin.


shifted significantly in their position, and another α helix
that provides no amino acids in contact with the heme 
(B in Figure 7-10) has shifted even more. 

In any protein, a few amino acids that embody its 
function can be identified. Over evolution, natural selec- 
tion maintains the relative separations and orientations 
of these amino acids because if it did not, the protein 
could no longer be what it is. An extreme example of this 
fact is found in the group of related enzymes to which 
phosphopyruvate hydratase, mandelate racemase, 
galactonate dehydratase, glucarate dehydratase, 
muconate cycloisomerase, and methylaspartate ammo- 
nia-lyase belong. Although each of these enzymes has 
diverged widely from its distant common ancestor, the 
positions of the functional groups in the active sites of 
these proteins that are responsible for the abstraction of 
the proton α to the respective carboxylate, a function
common to the mechanism of each of them, have been 
conserved.''® The more distant a location within the pro- 
tein is from such invariant points of reference, however, 
the more likely its position will drift as mutations accu- 
mulate that shift the orientations of the segments of sec- 
ondary structure within the overall molecular structure 
of the protein. 

An exception to this rule that functional groups are 
usually the most invariant features of a protein can be seen 
in a comparison of the crystallographic molecular models 
of the phospholipase A₂ from cobra venom and the phos-
pholipase from bee venom."*' From aligned amino acid
sequences and superposed crystallographic molecular 




Figure 7-10: Arrangement of the α helices and contact side chains
that form part of the heme pocket in the α subunit of equine hemo-
globin (EHbα) and in leghemoglobin II from Lupinus luteus
(LgHb).¹³⁰ The hemes in the two proteins are superposed. The three
α helices are designated as B, E, and G, in order of their appearance
in the globin molecule. The positions of homologous pairs of side 
chains (sequence positions in leghemoglobin are primed) that are 
in contact with the heme are indicated by open circles joined by 
arrows. The coupling of the shifts at the E-B and B-G helix inter- 
faces keeps the side chains that form the heme pocket in the same 
relative positions. Reprinted with permission from ref 130, origi- 
nally from ref 117. Copyright 1980 Academic Press. 


models, there is no doubt that these two proteins share a 
common ancestor. The α helix in the protein from cobra
venom that binds to the surface of a biological membrane, 
in which the reactants for the enzyme are found, is formed 
by the first 20 amino acids of the polypeptide, but these 
20 amino acids are missing from the protein from the bee. 
The missing tertiary structure necessary for adhering the 
protein to the membrane is supplied by an additional 
α helix at the carboxy-terminal end of the polypeptide
from the bee that takes the place of the α helix from the
amino-terminal end of the polypeptide from the cobra. It 
was the ability to superpose the crystallographic molecu- 
lar models of these two proteins that permitted the sub- 
stitution of one segment of the polypeptide for another in 
a functional role to be demonstrated. 

The inability to superpose two crystallographic 
molecular models of two proteins can demonstrate an 
example of convergent evolution. For proteins, conver- 
gent evolution is the assumption of the same function 
by two proteins that do not share a common ancestor. 
For example, as had been predicted by aligning 
sequences,’ chorismate mutase from S. cerevisiae 
cannot be superposed on chorismate mutase from 
B. subtilis.” Consequently, it can be concluded that 
these two unrelated proteins nevertheless evolved so 


that they each were able to perform the same function. 
Other examples of such convergent evolution are the 
3-dehydroquinate dehydratases from Salmonella 
typhimurium and Mycobacterium tuberculosis and 
the Cu,Zn-superoxide dismutase of B. taurus'” and the 
Fe-superoxide dismutase of M. tuberculosis.'* A partic- 
ularly interesting example of convergent evolution that 
was elucidated by superposing crystallographic molecu- 
lar models is that of the [2Fe-2S] ferredoxin from 
Clostridium pasteurianum. This protein is completely 
unrelated to the other [2Fe-2S] ferredoxins and turns 
out to be a thioredoxin that has been converted into a 
ferredoxin.” The more common examples of conver- 
gent evolution, however, are those in which two unre- 
lated proteins catalyze similar but not identical 
functions. For example, although they share the same 
mechanism of activating molecular oxygen for insertion 
into a carbon-hydrogen bond and both use a heme to 
do so, crystallographic molecular models of nitric-oxide 
synthase and cytochrome P-450 have different, unre- 
lated structures.” 

Often the catalytic amino acids in the active sites of 
examples of convergent evolution are arranged similarly 
in the two proteins even though the overall structures are 
completely different. The serine, histidine, and aspartate 
responsible for the nucleophilic catalysis in the active 
site of subtilisin are superposable on the serine, histi- 
dine, and aspartate in the active site of chymotrypsin 
even though the two proteins themselves cannot be 
superposed, H and the functional groups in the active 
site of alanine dehydrogenase are similarly arranged to 
those in L-lactate dehydrogenase." The catalytic amino 
acids, however, around the flavin adenine dinucleotide 
in the active sites of the two flavoenzymes L-lactate dehy- 
drogenase (cytochrome) and D-amino-acid oxidase are 
arranged in patterns that are mirror images of each 
other.
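
Whether two arrangements of functional groups are mirror images of each other, rather than superposable, can be decided from the sign of a signed volume, because no rotation or translation can change that sign. The sketch below is only an illustration: it assumes four reference atoms chosen in the same order in each active site, and the coordinates and the function names are hypothetical.

```python
def signed_volume(a, b, c, d):
    """Signed volume of the tetrahedron with vertices a, b, c, d."""
    u = [b[k] - a[k] for k in range(3)]
    v = [c[k] - a[k] for k in range(3)]
    w = [d[k] - a[k] for k in range(3)]
    determinant = (u[0] * (v[1] * w[2] - v[2] * w[1])
                   - u[1] * (v[0] * w[2] - v[2] * w[0])
                   + u[2] * (v[0] * w[1] - v[1] * w[0]))
    return determinant / 6.0


def same_handedness(site_1, site_2):
    """True if the two sets of four points have the same handedness."""
    return signed_volume(*site_1) * signed_volume(*site_2) > 0


# HYPOTHETICAL positions of four corresponding atoms in one active site,
# and the same arrangement reflected through the xy plane.
site_a = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
site_b = [(x, y, -z) for x, y, z in site_a]

print(signed_volume(*site_a), signed_volume(*site_b))
print(same_handedness(site_a, site_b))
```

Two sites with the same handedness may still fail to superpose closely, so the sign is a test for mirror-image arrangements and not a measure of similarity.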

As the number of crystallographic molecular models 
has increased, instances have become more common in 
which two proteins that display no similarity in amino 
acid sequence nevertheless have segments of their terti- 
ary structure that can be superposed. An example of such 
a segment of recurring structure is found in the crystal- 
lographic molecular models of L-lactate dehydrogenase,” 
alcohol dehydrogenase,’ phosphoglycerate kinase,“ 
and phosphorylase." This common segment is 
140-200 aa in length and occurs at different locations in 
the overall sequences of these proteins. It is formed from 
the amino acids in the sequence between Asparagine 21 
and Glycine 162 in isoform A of L-lactate dehydrogenase 
from Squalus acanthias, between Phenylalanine 207 and
Serine 392 in equine phosphoglycerate kinase, between 
Serine 193 and Phenylalanine 319 in isoform E of equine 
alcohol dehydrogenase, and between Asparagine 559 and 
Arginine 713 in the isoform of glycogen phosphorylase 
from muscle of Oryctolagus cuniculus. All four structures 
can be superposed.*”’* The superposition of these 
regions from phosphorylase and L-lactate dehydrogenase
is presented in Figure 7-11A.%°'? 

This particular topological pattern of secondary
structures can be identified as a doubly wound, parallel
β sheet. It is a sheet of six parallel β strands flanked on
both sides by α helices. The basic rhythm of the recurring
theme is β strand, α helix, β strand, α helix, β strand,
random meander, β strand, α helix, β strand, α helix,
β strand. The β strands numbered from amino terminus
to carboxy terminus occur in the order 321456 across the
sheet (Figure 7-20). You should trace this pattern in
Figure 7-11A. The six β strands all run parallel to each
other to form a pleated sheet, and the helices arch above
or below the sheet to connect the end of one β strand to
the next. The complete and concise theme is developed
in L-lactate dehydrogenase (Figure 7-11), and there are
variations on this theme in the other proteins. For exam-
ple, in phosphoglycerate kinase, there are two α helices
after the second β strand and a long additional loop after
the third β strand, and in alcohol dehydrogenase the last
α helix is replaced by an additional antiparallel β strand.
An interesting variation occurs after the first β strand in
the structure from phosphorylase, where a large bulge has
appeared in the first β strand that pushes up the loop
between the third and fourth strands of β structure
(Figure 7-11A).

Flavodoxin shares this pattern but with a more sig-
nificant variation. In this protein, the second α helix
and the third β strand have been deleted (Figure 7-11B).
This deletion seems to have been very similar to those
seen in cytochrome c (Figure 7-9) in that the loop con-
taining the α helix and the β strand has simply been
pinched off from the open end of the common struc- 
ture. Nevertheless, the superposition of flavodoxin 
upon the corresponding region from L-lactate dehydro- 
genase is quite close, even though the sequences of 
these two superposed polypeptides appear to be com- 
pletely unrelated. When the sequences are aligned, even 
with the assistance of the superposition, they have 
identical amino acids in only 9% of their aligned posi- 
tions. 
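
A figure such as the 9% identity quoted above can be computed mechanically once two sequences have been written out in register. The following is only a minimal sketch in Python: the function name and the toy sequences are hypothetical, and it assumes that the denominator is the number of aligned positions at which neither sequence has a gap, which may differ from the convention used elsewhere in this text.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of aligned positions holding identical amino acids.

    Assumption: positions where either sequence has a gap are not
    counted as aligned positions and are left out of the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only the columns in which both sequences have a residue.
    pairs = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != gap and b != gap
    ]
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)


# Toy example (hypothetical sequences, not taken from any protein):
# six aligned positions, five of them identical.
print(round(percent_identity("ACDE-FG", "ACQEKFG"), 1))
```

Upper and lower case are treated as equivalent here, so the function can be applied directly to structural alignments in which case encodes distance rather than identity.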

The conclusion that has been drawn from these 
superpositions is that all of these regions from these very 
different proteins together share a common ancestor. As 
these structures represent only a portion of each of the 
presently existing proteins, and as the other portions of 
the proteins bear no resemblance to each other, this 
common ancestor must have been a small primordial 
protein that was combined covalently with other small 
primordial proteins by gene fusion to produce respec- 
tively these larger chimeric proteins. Gene fusion is a 
process in which genomic DNA is recombined incor- 
rectly so that segments of different genes become fused 
together rather than, as in the usual process of recombi- 
nation, allelic segments of the same gene being inter- 
changed in precise alignment. Like gene duplication, 
gene fusion occurs frequently, but only infrequently will
the gene that results from the fusion spread over the
entire population of a species by genetic drift and 
become fixed in its genome. The evidence of early gene 
fusions that did spread over a primordial population can 
still be observed in the progeny, not only in their struc- 
tures but also in the distribution of methionines that are 
the fossilized remains of the initiation sites of the smaller 
ancient proteins that were fused together.'“* The reason 
that each of the primordial proteins now resides at a dif- 
ferent location in the sequences of the present proteins is 
that each was combined with different proteins in differ- 
ent orders during these gene fusions. This description of 
evolutionary history requires that at one time in the dis- 
tant past each of the segments of these polypeptides now 
folding to produce each of these superposable regions 
was not attached to the remainder of the polypeptide to 
which it is now joined. If this is the case, then each of the 
doubly wound β-pleated sheets and the other regions in
each of these proteins to which they are now attached 
were at one time separate proteins, and a significant part 
of the evolution of proteins is a history of the joining 
together of smaller proteins to produce ever larger pro- 
teins. 


Suggested Reading 


Rossmann, M.G., Moras, D., & Olsen, K.W. (1974) Chemical and 
biological evolution of a nucleotide-binding protein, Nature 
250, 194-199. 


Problem 7-6: The following is a portion of a multiple structural
alignment of the amino acid sequences of 10 members
of the chymotrypsin family of serine endopeptidases
from Asparagine 148 to Glycine 196 of chymotrypsin, the
amino acid sequence of which is at the top.

 1 NANTPDRLQQASLPLLSNTNCKK--YWGTKIKD-AMICAG-AS----GVSSCMGDSG
 2 gtSYPDVLKCLKAPILSDSSCKS--AYPGQITS-NMFCAG-Ylleg--gKDSCQGDSG
 3 -gqLAQTLQQAYLPTVDYAICSsssYWGSTVKN-SMVCAG-Gdg---vRSGCQGDSG
 4 gQKGQPSVLQVVNLPIVERPVCKD--STriRITD-NMFCAGykpdegkRGDACEGDSG
 5 dfEFPDEIQCVQLTLLQNtfcAd--AHpdKVTE-SMLCAG-Ylpg--gKDTCMGDSG
 6 -dptsytLREVELRIMDEkacVd--YR--yYEykFQVCVGSPT---tLRAAFMGDSG
 7 --------GLRSGSVTGlnatvn--ygssgivy-gMIQTN--------vCAQPGDSG
 8 --------GTHSGSVTAlnatvn--ygggdvvy-gMIRTN--------vVCAEPGDSG
 9 --------GYQCGTITAknvtan--ya--egavrgLTQGN--------aCMGRGDSG
10 ---------hGAVQYsgg-------------------rFT-ip----rgvgGRGDSG


The alignment is based on the separate superpositions
of crystallographic molecular models of each of the
other nine proteins upon the crystallographic molecular
model of chymotrypsin, which was chosen as the reference
structure for the family. The similarity of the structures
to that of chymotrypsin is given by the case and the
face of the one-letter code of the amino acids. An uppercase
boldface character represents an α carbon that is
within a distance of 0.15 nm of the equivalent α carbon
in chymotrypsin. An uppercase normal character represents
a distance within 0.25 nm, a lowercase boldface
character represents a distance within 0.35 nm, and a
lowercase normal character represents a distance of
greater than 0.35 nm. A dash represents a gap.


(A) On a sheet of graph paper, construct a dot matrix
for a comparison of the sequence of protein 3 and
the sequence of protein 6 between the positions of
the first and the third cysteines in the amino acid
sequence of chymotrypsin, which is protein 1.


(B) Trace through the dot matrix the structural alignment
of protein 3 and protein 6.


(C) What is the percentage of identity for the alignment
in this segment?


Problem 7-7: The following is the amino acid sequence
of the pyruvate kinase from rabbit muscle from Proline
116 to Proline 218.


PEIRTGLIKGSGTAEVELKKGATLKITLDNAYMEKCDE
NILWLDYKNICKVVDVGSKVYVDDGLISLQVKQKGPDF
LVTEVENGGFLGSKKGVNLPGAAVDLP


The following is an alignment of the amino acid
sequences for pyruvate kinase from cat muscle, chicken
muscle, rat liver, and yeast over the corresponding segments.


cat muscle      PEIRTGLIKGSGTAEVELKKGATLKITLDNAYMEKCDENVLWLD
chicken muscle  PEIRTGLIKGSGTAEVELKKGAALKVTLDNAFMENCDENVLWVD
rat liver       PEIRTGVLQGGPESEVEIVKGSQVLVTVDPKFQTRGDAKTVWVK
yeast           PEIRTG--TTTNDVDYPIPPNHEMIFTTDDKYAKACDDKIMYVD

cat muscle      YKNICKVVEVGSKVYVDDGLISLLVKEKG-ADFLVTEVENGGSL
chicken muscle  YKNLIKVIDVGSKIYVDDGLISLLVKEKG-KDFVMTEVENGGML
rat liver       YHNITRVVAVGGRIYIDDGLISLVVQKIG-PEGLVTEVEHGGIL
yeast           YKNITKVISAGRIIYVDDGVLSFQVLEVVDDKTLKVKALNAGKI

cat muscle      GSKKGVNLPGAAVDLP
chicken muscle  GSKKGVNLPGAAVDLP
rat liver       GSRKGVNLPNTEVDLP
yeast           CSHKGVNLPGTDVDLP


(A) Why are the amino acid sequences of the proteins
from cat and chicken so much more similar to 
each other than are the sequences from the cat 
and the rat to each other? 


(B) Align the amino acid sequences of the proteins 
from rabbit muscle and cat muscle from Proline 
116 to Proline 218. What is the percent identity 
and how many gaps are there? 


The following figure is a superposition of the
α carbons between Proline 116 and Proline 218 in the
crystallographic molecular models of the pyruvate
kinases from rabbit muscle and cat muscle.


[Figure: superposed α-carbon traces; the label S126 appears twice.]


These portions of the models were superimposed
according to the algorithm of Rossmann and Argos.
The models of the rabbit and cat enzymes are displayed 
with filled and unfilled lines, respectively. Those amino 
acids that are labeled correspond to the protein from 
rabbit. 


(C) What is wrong with this figure? 


(D) What is the reason that something is wrong with 
this figure? 
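

The dot matrix requested on graph paper in Problem 7-6(A) can also be generated by a short program. The sketch below, in Python, marks every pair of positions at which the two sequences carry the same amino acid, so that a run of matches along a diagonal shows up as a line of asterisks. The function names and the toy sequences are hypothetical and are not taken from the alignment in the problem; gaps are removed and case is ignored before the comparison, which is an assumption about how the sequences would be prepared.

```python
from typing import List


def strip_gaps(aligned: str, gap: str = "-") -> str:
    """Remove gap characters and spaces, and ignore case."""
    return aligned.replace(gap, "").replace(" ", "").upper()


def dot_matrix(seq_a: str, seq_b: str) -> List[List[bool]]:
    """matrix[i][j] is True when residue i of seq_a equals residue j of seq_b."""
    return [[a == b for b in seq_b] for a in seq_a]


def render(seq_a: str, seq_b: str, matrix: List[List[bool]]) -> str:
    """Draw the matrix with seq_b across the top and seq_a down the side."""
    lines = ["  " + seq_b]
    for residue, row in zip(seq_a, matrix):
        lines.append(residue + " " + "".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


# Toy example with hypothetical sequences.
a = strip_gaps("GDSG")
b = strip_gaps("cq-GDSG")
print(render(a, b, dot_matrix(a, b)))
```

In the printed grid, the shared segment appears as an unbroken diagonal displaced two columns to the right, which is the feature that is traced by hand in part (B) of the problem.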
Ferredoxin-NADP⁺ reductase from spinach is a protein
composed of a folded polypeptide 314 aa long (Figure
7-12). In the native enzyme, the polypeptide
between Asparagine 162 and Tyrosine 314 assumes a
doubly wound, parallel β sheet of five strands (upper
right of Figure 7-12) and the polypeptide between
Glycine 26 and Glutamate 154 assumes an antiparallel
β barrel (lower left of Figure 7-12). Doubly wound, parallel
β sheets recur in many different proteins and antiparallel
β barrels recur in many different proteins, but the
two structures are usually not found in the same protein.
In the crystallographic molecular model of ferredoxin-NADP⁺
reductase, the doubly wound, parallel
β sheet seems to be folded independently from the
antiparallel β barrel. Only one strand of polypeptide runs
between them.

A large number of observations, among them the 
ones just described, have led to the conclusion that the 
native structures of most folded polypeptides can be 
divided into independent domains. A domain is any 
region within the native tertiary structure of a folded 
polypeptide for which evidence can be provided of an 
existence independent of the rest of the polypeptide. 
There are several types of independent existence that 
qualify a region within the native structure as a domain. 

The most obvious evidence that two regions of the 
same protein are domains is that either a limited cleav- 
age of the polypeptide or the expression of separate por- 
tions of the polypeptide produces independent 
fragments of the protein that retain their respective 
native structures. In such instances, the two or more sep- 
arated fragments would be the detachable domains that 
composed the intact protein. The paradigm of a protein 
with detachable domains is immunoglobulin G, a circu- 
lating antibody responsible for binding to foreign pro- 
teins or other antigens. Porter’ demonstrated that 
intact immunoglobulin G could be cleaved by the thiol 
endopeptidase papain into three pieces of almost equal 
sizes. Two of these detachable domains, the 
Fab fragments, retained the ability to bind antigens, and 
the third, the Fc fragment, was stable enough that it crys- 
tallized spontaneously during its isolation. The 
Fab fragments could be readily separated by cation- 
exchange chromatography from the Fc fragments with- 
out any loss of their biological activity. Comparisons of 
crystallographic molecular models of Fab fragments and 
Fc fragments with those of intact immunoglobulins G 
(Figure 7-13)'°"'5? have demonstrated that the detached 
and separated domains retain the respective structures 
that they had in the intact molecule before it was 
cleaved.’ 

It was unfortunate that an unintended association 
between limited cleavage with endopeptidases and 
domains was established with these elegant experiments. 
The important fact was that Porter separated the detached
domains from each other and demonstrated that each was
still structurally intact. It has already been noted that a 
protein must be prepared for complete digestion with 
endopeptidases by unfolding it. The reason for this is that 
most of the peptide bonds susceptible to digestion with 
a particular endopeptidase in the unfolded polypeptide 
are not susceptible to digestion in the native folded 
polypeptide. In fact, many native proteins are resistant to 
digestion over their entire length until they are 
unfolded.’ When the digestion of a native protein does 
occur, it usually occurs at only one or two locations. The 
sites at which cleavage of a native protein by endopepti- 
dases can occur are exposed, flexible loops of polypep- 
tide on its surface that are rarely situated between 
domains, and domains are often not connected by such 
loops.” If a polypeptide is cut at only one or two posi- 
tions by an endopeptidase when it is folded in its native 
conformation, this is not evidence that the fragments 
observed compose separate domains in that native con- 
formation.’ 

If, however, the protein can be digested and the 
pieces that result can be separated as biologically active 
or structurally intact moieties, they are detachable 
domains. Few examples of such endopeptidolytic 
detachments have been reported, and among those are 
the following. A protein anchored to mammalian cellular 
membranes contains within its single folded polypeptide 
the two enzymes peptidylglycine monooxygenase and 
peptidylamidoglycolate lyase. This protein can be 
digested either during normal cellular processes or by 
experimental treatment with an endopeptidase to pro- 
duce two soluble, detached domains, which can be sep- 
arated chromatographically, one of which catalyzes the 
former activity and the other of which catalyzes the 
latter.'”” The enzyme sulfite oxidase can be cleaved with 
either trypsin, chymotrypsin, or papain to produce two 
detached domains that can be separated from each other 
by molecular exclusion chromatography. One retains
the ability to transfer electrons from sulfite to Fe(CN)₆³⁻;
the other retains the spectrum characteristic of the heme 
in its native environment. The transfer of electrons from 
sulfite all the way to the ultimate oxidant, cytochrome c, 
can no longer occur because the domain containing the 
heme is no longer attached to the domain at which the 
sulfite is oxidized. Anion carrier is a protein in the plasma 
membrane of erythrocytes and is responsible for anion 
transport. It can be cleaved with chymotrypsin to pro- 
duce a water-soluble domain that can be readily sepa- 
rated from the other domain, which remains in the 
membrane.'°' Between them the two detached domains 
retain the biological functions that are displayed by the 
intact protein, and each retains the structure it had in 
the intact protein. 

Figure 7-12: Skeletal representation of the
backbone of the crystallographic molecular
model (Bragg spacing = 0.17 nm) of ferredoxin-NADP⁺
reductase (314 aa) from
S. oleracea. The published molecular
model included Histidine 19 through
Tyrosine 314, but the drawing begins at
Glycine 26. The doubly wound, parallel
β sheet is in the upper right portion of the
drawing; the Greek key, antiparallel β barrel,
in the lower left. Glutamate 154 identifies the
covalent junction between the two domains.
Note the irregular cleft between the two
domains; only one strand of polypeptide,
that containing Glutamate 154, runs across
the cleft. This drawing was produced with
MolScript.

Once the cDNA or genomic DNA encoding a protein
has become available and it has become clear
exactly where the boundaries between its domains are
located, they can often be detached genetically and each
of them expressed separately. There are many examples
of such genetic detachments. The two domains that cat- 
alyze indole-3-glycerol-phosphate synthase and phos- 
phoribosylanthranilate isomerase within the 
bifunctional enzyme of E. coli can be genetically
detached and expressed as separate monofunctional
proteins. The two domains of protein S from
Myxococcus xanthus can be genetically detached and
expressed as stable monomeric proteins that show no
tendency to associate with each other and both of which
retain their ability to bind Ca²⁺. The domains of the AraC
protein from E. coli can be genetically detached and
expressed as separate proteins that exhibit respectively
the abilities of the intact native protein to recognize DNA
containing its operator and to dimerize in an arabinose-dependent
manner. Genetic deletion of the carboxy-terminal
154 aa from cystathionine β-synthase (507 aa)
actually increases its enzymatic activity.

The advantage of performing a detachment geneti- 
cally is that it can be accomplished at the precise bound- 
ary and does not require that the polypeptide between 
the two domains be as extraordinarily available for digestion
by an endopeptidase as are the connecting strands
between the domains in an immunoglobulin G (Figure
7-13). Consequently, there are far more examples of 
domains that have been successfully detached geneti- 
cally than have been detached endopeptidolytically. 
When a domain that is buried in the structure of the 
native protein and consequently has an excess of 
hydrophobic side chains on its surface is detached genet- 
ically, it is also possible to mutate some of these 
hydrophobic amino acids to hydrophilic amino acids to 
prevent the detached domain from aggregating. 
There is a protein in E. coli that is responsible for 
the two enzymatic activities of aspartate kinase and 
homoserine dehydrogenase. It is composed of one 
folded polypeptide 820 aa in length. When this protein is 
digested with glutamyl endopeptidase from Streptomyces
griseus, it is cut into two large fragments and a short peptide
as a result of cleavages following Glycine 296 and
Glutamate 301.'° Because chymotrypsin digests the pro- 
tein only at Leucine 294; subtilisin, only at Alanine 297; 
and trypsin, only at Arginine 299;!°” the region contain- 
ing these five sites of cleavage must be in a loop of about 
10 amino acids on the surface of the protein. The two 
large fragments from digestion with glutamyl endopepti- 
dase can be separated from each other and from their 
parent on chromatography by molecular exclusion. The 
carboxy-terminal fragment retains the homoserine 
dehydrogenase activity. The amino-terminal fragment, 
however, shows only 1% or less of the aspartate kinase 
activity originally present, and this small amount of 
activity probably results from cross-contamination with 
uncleaved protein. A fragment retaining normal levels of 
aspartate kinase activity, however, can be prepared 
genetically by deleting the carboxy-terminal 45% of the 
polypeptide.’ In B. subtilis, aspartate kinase and 
homoserine dehydrogenase are separate proteins. By 
alignment of the amino acid sequences of these two proteins
with that of the bifunctional protein from E. coli, it
could be shown that the boundary between the two 
enzymatic domains in the bifunctional protein is a short 
segment between Phenylalanine 460 and Isoleucine 
466. 

From all of these results it can be concluded that, if
properly detached, each domain of this bifunctional protein
from E. coli remains folded and enzymatically active.
The cleavage by an endopeptidase within the exposed 
loop in domain 1, which is responsible for aspartate 
kinase, causes it to unfold and the unfolded amino-ter- 
minal fragment to fall away. Domain 2, which is respon- 
sible for homoserine dehydrogenase, remains folded and 
active after the cleavage. If the point of cleavage by glu- 
tamyl endopeptidase were a true boundary between two 
detachable domains, rather than an adventitious loop of 
polypeptide on the surface of the aspartate kinase, the 
aspartate kinase activity, which can readily be expressed 
by the genetically dissected protein, would have been 
unaffected. Aspartate kinase-homoserine dehydroge- 
nase, then, is an example of a protein that has domains 
that cannot be detached from each other at their bound- 
ary by cleavage with an endopeptidase. 

Proteins such as aspartate kinase-homoserine 
dehydrogenase belong to a class of proteins known as 
multienzyme complexes. A multienzyme complex is a 
protein that, although it is a single, discrete macromole- 
cule, is able to catalyze two or more enzymatic activities. 
Usually each of the enzymatic activities in one of these 
multienzyme complexes is expressed by its own unique 
domain within the folded polypeptide or one of the 
folded polypeptides that form the protein. Such an enzy- 
matic domain within a larger protein is a domain that is 
by itself independently responsible for a particular enzy- 
matic activity. The fact that the several enzymatic activi- 
ties are expressed respectively by several individual 
proteins in some species yet all of them are expressed by 
only one protein formed from one folded polypeptide in 
other species is sufficient evidence of an independent 
existence to conclude that the multienzyme complex is 
constructed from enzymatic domains. Proteins con- 
structed from enzymatic domains presumably arose as 
the result of the fusion of the individual genes that 
encoded the unfused ancestors of those domains. Many 
artificial fusions of two genes to produce chimeric pro- 
teins have been performed, and the products that result 
from these artificial fusions seem to be little affected 
functionally.'7%17! 

A paradigm for a protein containing enzymatic 
domains is the CAD multienzyme complex in animal tis- 
sues that comprises a single folded polypeptide about 
2220 aa in length'” responsible for the enzymatic activi- 
ties of carbamoyl-phosphate synthase (glutamine 
hydrolysing), aspartate carbamoyltransferase, and dihy- 
droorotase.!” The first enzymatic reaction has two steps, 
the production of ammonia from the hydrolysis of gluta- 
mine at the active site of a glutaminase and the synthesis 
of carbamoyl phosphate from the resulting ammonia at 
the active site of a carbamoyl-phosphate synthase 
(ammonia). Each of the four component enzymatic reactions
carried out by the intact complex from animals is
carried out by a different discrete protein in E. coli. The
amino acid sequences of the four separate bacterial proteins
can be aligned with four consecutive regions in the
amino acid sequence of the multifunctional protein from
Mesocricetus auratus: glutaminase with amino acids 2-355
(40% identity, 1.3 gap percent), carbamoyl-phosphate
synthase (ammonia) with amino acids 397-1440
(40% identity, 1.1 gap percent), dihydroorotase
with amino acids 1457-1785 (20% identity, 3.5 gap percent),
and aspartate carbamoyltransferase with amino
acids 1921-2225 (44% identity, 1.6 gap percent). These
regions are enzymatic domains 1 through 4 in the CAD
multienzyme complex, respectively.
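
The arithmetic implied by these boundaries is easy to check. The following Python sketch uses only the residue numbers quoted above to compute the length of each aligned region and the number of residues lying between consecutive regions. The variable and function names are hypothetical, and the intervening segments are simply the residues not covered by any of the four alignments; the text does not say how they fold.

```python
from typing import Dict, List, Tuple

# (enzymatic domain, first aligned residue, last aligned residue),
# as quoted in the text for the CAD multienzyme complex.
REGIONS: List[Tuple[str, int, int]] = [
    ("glutaminase", 2, 355),
    ("carbamoyl-phosphate synthase (ammonia)", 397, 1440),
    ("dihydroorotase", 1457, 1785),
    ("aspartate carbamoyltransferase", 1921, 2225),
]


def region_lengths(regions: List[Tuple[str, int, int]]) -> Dict[str, int]:
    """Number of residues in each aligned region, both ends inclusive."""
    return {name: last - first + 1 for name, first, last in regions}


def intervening_lengths(regions: List[Tuple[str, int, int]]) -> List[int]:
    """Residues between the end of one region and the start of the next."""
    return [
        next_first - prev_last - 1
        for (_, _, prev_last), (_, next_first, _) in zip(regions, regions[1:])
    ]


for name, length in region_lengths(REGIONS).items():
    print(f"{name}: {length} aa")
print("intervening segments:", intervening_lengths(REGIONS))
```

By this count the segment between the regions aligned with dihydroorotase and aspartate carbamoyltransferase is considerably longer than the other two.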

The fact that dihydroorotase has sustained so much 
more replacement suggests that even within the same 
polypeptide different domains can suffer replacement at 
different rates. In fact, rates of change can differ so much 
that one domain in a multienzyme complex can become 
defunct even as others retain their full function. In the 
CAD multienzyme complex from yeast, the dihydrooro- 
tase domain, although its amino acid sequence is still 
able to be aligned, has lost the ability to catalyze its enzy- 
matic reaction." A similar loss of function has occurred 
during the evolution of the fructose-2,6-bisphosphate 
2-phosphatase domain of yeast 6-phosphofructo-2-kinase, 
but its enzymatic activity can be restored by mutating 
Serine 404 to histidine.!”® 

Because the amino acid sequences of the four dis- 
crete bacterial proteins responsible for the four enzy- 
matic reactions catalyzed by the CAD multienzyme 
complex from animals can be aligned with the amino 
acid sequences of its four enzymatic domains, it follows 
that the tertiary structure of each domain in the animal 
protein must be superposable on the tertiary structure of 
the corresponding bacterial protein. Consequently, each 
domain in the animal protein must be a compact, inde- 
pendently folded unit, and these units must be strung 
together consecutively by the continuity of the polypep- 
tide. This conclusion is supported by the fact that the 
enzymatically active domains responsible respectively 
for dihydroorotase’”'” and aspartate carbamoyltrans- 
ferase’’’ can be detached either genetically or by cleav- 
age of the protein with endopeptidases at the boundaries 
of the domains. During the digestion with endopepti- 
dases, however, the activity of carbamoyl-phosphate 
synthase (glutamine-hydrolysing) is lost and can be associated
with none of the fragments smaller than 1700 aa in
length. In situations such as this, digestion of one
domain, for example, the carbamoyl-phosphate syn- 
thase domain, at some point on its surface could cause it 
to unfold and make the polypeptide much more suscep- 
tible to cleavage by endopeptidases in a region forming 
the boundary between that unfolded domain and a 
neighboring properly folded domain. An example of such 
a pruning of an unfolded segment of polypeptide from a 
properly and compactly folded protein by digestion with 
an endopeptidase occurred during the production of 
hybrids of different portions of micrococcal nuclease 
from Staphylococcus.

Anthranilate synthase, CTP synthase, phosphoribosylformylglycinamidine
synthase, GMP synthase, imidazole
glycerol phosphate synthase, glutamine-fructose-6-phosphate
transaminase (isomerizing), and
aminodeoxychorismate synthase, like the carbamoyl-phosphate
synthase (glutamine hydrolysing) incorporated
into the CAD multienzyme complex, all contain an
enzymatic domain responsible for producing ammonia
from glutamine by hydrolysis. The domain can be either
a separate folded polypeptide or an enzymatic domain
in a longer polypeptide. Because the sequences of
these various enzymatic domains catalyzing the hydrolysis
of glutamine are homologous, it follows that they
share a common ancestor that in its day was a separate,
independent protein, presumably a glutaminase. The
offspring of this common ancestor were separately
incorporated into the various multienzyme complexes
in which they are now found. Although each of these
complexes catalyzes a quite different reaction, each uses
the ammonia supplied by the respective glutaminase
domain as a substrate.

The crystallographic molecular model of the 
bifunctional enzyme from Leishmania major responsible 
for dihydrofolate reductase and thymidylate synthase 
(Figure 7-14) illustrates the independent existence of
its two enzymatic domains. Domain 1, which is respon- 
sible for dihydrofolate reductase, comprises the folded 
polypeptide from Serine 23 to Arginine 230; and 
domain 2, which is responsible for thymidylate synthase, 
that from Histidine 234 to Valine 520. The respective 
active sites are identified by the NADPH and the 
10-propargyl-5,8-dideazafolate. Each enzymatic domain 
is readily superposed on the crystallographic molecular 
model of the corresponding monofunctional enzyme 
from E. coli, and their respective amino acid sequences 
can be aligned with those of dihydrofolate reductase 
(26% identity, 3.8 gap percent) and thymidylate synthase 
(53% identity, 1.4 gap percent) from E. coli. In the bifunc- 
tional protein, the two enzymatic domains, although 
folded separately and once separate unassociated pro- 
teins, have nevertheless become intimately associated 
with each other at the interface between themselves. 

In Aspergillus nidulans there is a multienzyme 
complex catalyzing 3-dehydroquinate synthase, 3-phos- 
phoshikimate 1-carboxyvinyltransferase, shikimate 
kinase, 3-dehydroquinate dehydratase, and shikimate 
dehydrogenase. Although the individual enzymatic 
domains responsible for dehydroquinate dehydratase 
and 3-dehydroquinate synthase could be genetically 
detached as enzymatically active proteins, the enzymatic 
domain responsible for 3-phosphoshikimate 1-car- 
boxyvinyltransferase was active only when attached to 
the neighboring domain responsible for 3-dehydro- 
quinate synthase.'?* This result suggests that as time 
passes following their fusion, two domains may associate 
with increasing intimacy as a specific interface between 
them evolves, like that in dihydrofolate reductase- 
thymidylate synthase (Figure 7-14), and in the end, they 
may require each other’s presence to fold properly as 
that interface becomes more and more extensive. 

In most cases, the enzymatic domains gathered
into a multienzyme complex all carry out reactions in the
same metabolic pathway. The five enzymatic domains 
in the multienzyme complex from A. nidulans, the four 
enzymatic domains in the CAD multienzyme complex, 
and the two domains in dihydrofolate reductase-thymidylate
synthase catalyze successive reactions in the
biosynthesis of chorismic acid, orotidine 5’-phosphate, 
and thymidine 5’-phosphate, respectively. Aspartate 
kinase-homoserine dehydrogenase catalyzes the first 
and third steps in the biosynthesis of homoserine. It is 
common in prokaryotes to find the enzymes catalyzing 
the reactions of a metabolic pathway gathered together 
in an operon. Such a gathering may have preceded the 
gene fusions producing multienzyme complexes and 
facilitated those fusions by placing the genes for the 
ancestors of the enzymatic domains adjacent to each 
other in the genome.” There are, however, examples of 
single enzymatic domains responsible for only one enzy- 
matic reaction but inserted into larger proteins. The 
domain responsible for protein-tyrosine-phosphatase in 
eukaryotes is a compact enzymatic domain!” that is 
fused into a large array of different proteins. 

In prokaryotes and plants,!?”!?® the seven enzymes 
and the acyl carrier protein responsible for the synthesis 
of fatty acids from acetyl-SCoA and malonyl-SCoA are 
discrete proteins that can be separated and individually 
purified, but in fungi and animals all of these activities 
are expressed by a single multienzyme complex. In fungi, 
the complex is constructed from two folded polypeptides 
that are encoded by different genes and are completely 
different in sequence from each other. Their lengths
are 1890 and 1980 aa, respectively. The fatty-acid
synthase from animals, however, is constructed from 
only one polypeptide, 2440 aa in length.'” All seven of 
the enzymatic activities and the acyl carrier protein are 
located on the single polypeptide comprising the animal 
enzyme, and the domains responsible for each have been 
identified in the amino acid sequence.” Animal fatty- 
acid synthase has also been dissected both genetically 
and with the endopeptidase kallikrein!” to produce sev- 
eral detached domains that are able to catalyze the enzy- 
matic reactions assigned to them. 

The order in which these enzymatic domains occur 
on the single polypeptide of the animal fatty-acid syn- 
thase is unrelated to the orders in which they appear on 
the two unique polypeptides from fungi.” On the basis 
of this fact, it has been concluded that all or most of the 
gene fusions that produced the animal protein and the 
fungal protein, respectively, must have occurred as inde- 
pendent events after the lineages of these two kingdoms 
diverged from their common ancestor. These separate 
processes would be ones in which each enzymatic 
domain has been shuffled into a larger protein. 
Nevertheless, the individual domains, even though fused 
in different orders, are still homologous to each other 
because the amino acid sequence of the one responsible 
for a given activity in the fungal enzyme can be aligned 
with that of the one responsible for the same activity in 
the animal enzyme. ™* 

There are a number of multienzyme complexes that
are responsible for the biosynthesis of antibiotics in various
fungi and bacteria. For example, the single folded
polypeptide 3131 aa in length responsible for the
biosynthesis of enniatin in Fusarium oxysporum catalyzes
all of the enzymatic reactions required to produce this
cyclic hexadepsipeptide, and the multienzyme complex
containing four folded polypeptides (3587, 3587,
1274, and 240 aa) responsible for the biosynthesis of surfactin
in B. subtilis catalyzes all of the enzymatic reactions
required to produce this cyclic octadepsipeptide.

The multienzyme complex!” responsible for the 
biosynthesis of the polyketide 6-deoxyerythronolide B, 
which is the lactone at the hydroxyl on carbon 13 of the 
fatty acid 


is composed of three distinct but related polypeptides, 
3200-3600 aa long. The synthesis of the complete 
molecule of 6-deoxyerythronolide B proceeds by the successive
condensation of six molecules of (2S)-methylmalonyl-SCoA
onto a molecule of propionyl-SCoA. After
each of the six Claisen condensations, the resulting 
ketone either remains untransformed (carbon 9) or is 
reduced to the alcohol (carbons 3, 5, 11, 13), which either 
remains untransformed or is dehydrated and reduced to 
the alkane (carbon 7). The entire sequence of reactions at 
each round, from condensation to alkane, requires the 
successive participation of an acyl carrier protein present 
as a domain as well as five separate enzymatic domains: 
an [acyl-carrier-protein] S-acyltransferase, a
3-oxoacyl-[acyl-carrier-protein] synthase, a
3-oxoacyl-[acyl-carrier-protein] reductase, a
3-hydroxyacyl-[acyl-carrier-protein] dehydratase, and an
enoyl-[acyl-carrier-protein] reductase. On its three
polypeptides, erythronolide
synthase has 22 enzymatic domains and 6 acyl carrier 
proteins. 

The elongating substrate is passed along the 
assembly line from the first acyl carrier protein to the 
sixth acyl carrier protein through six successive stations. 
At each station, it is operated on by the enzymatic 
domains assembled at that station. After it has been
processed at the last station, the product 6-deoxyery- 
thronolide is released from the last acyl carrier protein by 
an acyl-[acyl-carrier-protein] hydrolase, the last of the 22 
enzymatic domains. When one of the domains at a par- 
ticular station is inactivated or when one or more 
domains are experimentally added to a particular sta- 
tion, the product of the multienzyme complex changes at 
the position produced by that station to reflect its altered 
capacity. When the last station is deleted, a fatty acid
lactone shorter by one C3 unit is produced. The enzymatic
reactions catalyzed by this large multienzyme complex
are homologous to those catalyzed by fatty-acid
synthase. Fatty-acid synthase, however, because each of its
successive steps passes through the complete sequence
of enzymatic reactions to produce the alkane at each 
stage, uses only one station for all of the reactions rather 
than the six stations, one for each of its steps, used by 
erythronolide synthase. 
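
The logic of such an assembly line can be sketched in a few lines of
Python. The sketch is an illustration only, not a description of the
enzyme itself. The labels KR, DH, and ER are shorthand introduced here
for the 3-oxoacyl-[acyl-carrier-protein] reductase, the
3-hydroxyacyl-[acyl-carrier-protein] dehydratase, and the
enoyl-[acyl-carrier-protein] reductase, and the assignment of the six
stations to carbons 13, 11, 9, 7, 5, and 3, in that order, is an
assumption consistent with growth of the chain from the propionyl
starter. The function names are hypothetical.

```python
# Each station leaves a ketone after its Claisen condensation; the
# reductive domains present at that station decide how far it is processed.

def outcome(reductive_domains):
    """Return the functional group left at the carbon made by one station."""
    domains = set(reductive_domains)
    if "KR" not in domains:
        return "ketone"    # nothing can act without the first reduction
    if "DH" not in domains:
        return "alcohol"   # reduced but not dehydrated
    if "ER" not in domains:
        return "alkene"    # dehydrated but not reduced again
    return "alkane"        # full sequence, as in fatty-acid synthase


# (carbon produced, reductive domains assumed to be active at that station)
STATIONS = [
    (13, {"KR"}),
    (11, {"KR"}),
    (9, set()),
    (7, {"KR", "DH", "ER"}),
    (5, {"KR"}),
    (3, {"KR"}),
]


def run_assembly_line(stations):
    """Map each carbon to the functional group its station leaves behind."""
    return {carbon: outcome(domains) for carbon, domains in stations}


def inactivate(stations, carbon, domain):
    """Return a copy of the stations with one domain removed at one station."""
    return [
        (c, set(d) - {domain}) if c == carbon else (c, set(d))
        for c, d in stations
    ]


product = run_assembly_line(STATIONS)
altered = run_assembly_line(inactivate(STATIONS, 5, "KR"))
print(product)
print(altered)
```

Under these assumptions the unaltered line leaves a ketone at carbon 9,
an alkane at carbon 7, and alcohols elsewhere, and removing the
reductase from the station for carbon 5 changes only that position,
which is the behavior described above for experimentally altered
stations.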

Coenzymatic domains, such as the acyl carrier pro- 
teins carrying 4’-phosphopantetheine that are incorpo- 
rated into fatty-acid synthase and erythronolide 
synthase, are domains to which coenzymes are cova- 
lently attached. The domains carrying acyl carrier pro- 
teins are domains because they are found as 
independent proteins in prokaryotes and plants; the 
domains carrying lipoic acid that are incorporated into 
2-oxo-acid dehydrogenase complexes are domains 
because they can be detached. Domains carrying
biotin appear within the longer polypeptides of a
number of different biotin-dependent carboxylases.

A functional domain within a larger protein is a 
domain that is by itself responsible for a specific func- 
tion. Enzymatic domains are functional domains, but 
there are also functional domains that are not enzymatic. 
Examples of functional domains would be the domains 
responsible for binding to specific segments of DNA 
(Figures 6-46, 6-50, and 6-53), each of which is a
component of a larger protein responsible for controlling the
expression of the gene adjacent to the segment of DNA
recognized by the binding domain. Each of the other
domains in the larger protein is responsible for a
function essential to this control. For example, in each of the
proteins that control genes in response to steroid
hormones such as estrogen, testosterone, progesterone,
cortisone, and aldosterone, one of these other domains is
responsible for binding the respective hormone. There
are many examples of domains such as these that are
responsible for binding a ligand and that are part of a
larger protein. Examples are the two domains responsible
for binding cyclic AMP in cyclic AMP-dependent protein
kinase and the domain responsible for binding
flavin mononucleotide and stabilizing its semiquinone
radical in sulfite reductase. When the amino-terminal
240 aa of 3-phosphoshikimate 1-carboxyvinyltransferase
from E. coli, which form a compact globular domain in
the crystallographic molecular model of the intact
protein (427 aa), were expressed separately, they folded to
form a structure capable of binding shikimate
3-phosphate.

The domains discussed so far are clearly capable of 
independent existence or are descended from ancestors 
that were. There are, however, domains in proteins that 
either cannot be detached or that are not identified with 
an independent function. These domains often were 
joined together so long ago that they have become com- 
pletely dependent on each other both structurally and 
functionally. Nevertheless, it is possible to conclude that 
they once did have an independent existence because 
they recur in a number of extant proteins. Examples of
such recurring domains would be domain 1, the
antiparallel β barrel, and domain 2, the doubly wound
parallel β sheet, in ferredoxin-NADP+ reductase (Figure
7-12), which are also found as domains in a number of
different proteins. A recurring domain is a domain that
is folded with a tertiary structure that can be superposed
on the tertiary structures of other domains in other
proteins of otherwise entirely different structure. A recurring
domain is a compact structure used in more than one
distinct situation. Because of its recurrence in different
surroundings, there is little doubt that a domain of this
type had at one time an independent existence.

Pyruvate kinase is one of the more informative
examples of a protein built from recurring domains.
Domain 1 of the protein is an α-helically wound, parallel
β barrel that is superposable on the entire folded
polypeptide of triose-phosphate isomerase. Inserted into
the loop between the third β strand and the third α helix
of domain 1 is domain 2, which is a Greek key,
antiparallel β barrel. Domain 3, which follows in the sequence
of the polypeptide the complete elaboration of domain 1,
can be superposed on half of the doubly wound, parallel
β sheet found in L-lactate dehydrogenase (Figure
7-11). Each of these three structures is a recurring
domain. In galactose oxidase from Dactylium
dendroides, domain 2 is a β propeller (Figure 6-13) that is
flanked by domain 1, an eight-stranded jelly roll,
antiparallel β barrel, and domain 3, a bundle of seven
antiparallel β strands that has the topological arrangement of an
immunoglobulin domain. All three of these structures
are also recurring domains.

Recurring domains occasionally appear to be
associated with a particular role. For example, benzoate
4-monooxygenase, glucose oxidase, cholesterol oxidase,
and glutathione-disulfide reductase contain a recurring
domain about 160 aa in length. In all of these
enzymes, the domain serves to bind tightly an integral
flavin coenzyme, and it has been referred to as the
“FAD-binding domain.”

Some secondary structures, such as a β propeller
(Figure 6-13) or a β helix, are self-contained and
usually occur as independent structural entities in a
protein. Because such secondary structures recur in many
proteins, they could be considered recurring domains,
and they usually do seem to be independent isolated
units in a larger protein in which they occur.

An internally repeating domain is a member of a 
set of consecutive segments within the same polypep- 
tide, each homologous in amino acid sequence to the 
other members of the set or each folded in a tertiary 
structure superposable on the tertiary structures of the 
other members of the set. An example of a set of such
internally repeating domains is the set of 12 domains, four in
each of the three detachable domains that compose
immunoglobulin G (Figure 7-13). Each of these 12
internally repeating domains is a seven-stranded barrel of
antiparallel β strands. Each is superposable on all the
others and shares statistically significant similarities in
amino acid sequence with some of the others. The two
identical long polypeptides in an immunoglobulin G, the
heavy chains, each contain four of these domains; and
the two identical short polypeptides, the light chains,
each contain two. Polypeptides containing such
internally repeating domains are quite common. About 10%
of polypeptides 200 amino acids in length contain
internally repeating domains, but the frequency rises steadily
to 80% for polypeptides 2000 amino acids in length.

Internally duplicated domains are internally 
repeating domains that occur only twice in the same 
polypeptide. Internally duplicated domains arise from 
the internal duplication of a gene encoding a smaller 
protein. An internal duplication is a gene duplication in 
which the duplicated genes end up immediately adjacent 
to each other so that when they are transcribed and 
translated, the duplicated amino acid sequences remain 
attached consecutively to each other in the same 
polypeptide. Because they are the products of internal 
duplication, the single, unrepeated ancestral amino acid 
sequence and tertiary structure of each of these dupli- 
cated domains must have existed on its own at some time 
in the past, before the duplication occurred. As with a 
gene duplication producing two separate proteins, an 
internal duplication arises in the genome of one individ- 
ual and then spreads by genetic drift over the whole pop- 
ulation. As with gene duplication in general, internal 
duplications arise often but only rarely spread over the 
whole population. Following the gene duplication and its 
spread and fixation by genetic drift, the two internally 
repeating domains begin to evolve separately. 

The two halves of the doubly wound, parallel
β sheet as it presently occurs in L-lactate dehydrogenase
(Figure 7-11) can be superposed upon each other (Figure
7-15). It has been proposed that the complete doubly
wound, parallel β sheet arose itself from a gene
duplication in which the two segments of polypeptides encoded
by the duplicated gene remained consecutively attached
to each other and then began to evolve independently
but within the same protein. That this did happen is
supported by the fact that recurring domains are found in
the crystallographic molecular models of pyruvate
kinase and phosphoglycerate kinase that superpose on
only half of the doubly wound, parallel β sheet from
L-lactate dehydrogenase. The lineages leading to these
two shorter domains presumably diverged before the
gene duplication that produced the common ancestor of
the larger.

The eight-stranded, α-helically wound parallel
β barrel that forms the entire molecule of
1-(5-phosphoribosyl)-5-[(5-phosphoribosylamino)methylideneamino]imidazole-4-carboxamide
isomerase from Thermotoga
maritima has a fold typical of this structure, which is
usually treated as a single domain (Problem 7-10B).
Nevertheless, the amino-terminal half of its
crystallographic molecular model can be superposed on the
carboxy-terminal half with a root mean square deviation of
0.21 nm, and the amino acid sequences of the two halves
can be aligned structurally with a percentage of identity
of 23%. A similar superposition and alignment can be
performed with the two halves of the α-helically wound,
parallel β barrel of imidazole glycerol phosphate
synthase from the same bacterium. Unlike the two
halves of the doubly wound, parallel β sheet (Figure
7-15), there are no examples of proteins in which half of
an α-helically wound, parallel β barrel is found.
Nevertheless, these superpositions and alignments
suggest that all α-helically wound, parallel β barrels are also
the product of an internal duplication.
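
The two numbers quoted for such a comparison, a root mean square
deviation and a percentage of identity, can be computed as in the
following minimal sketch. It assumes that the two halves have already
been superposed and that their sequences have already been aligned; it
does not perform the superposition or the alignment. The function
names, the use of "-" for a gap, and the choice to count identities
only over positions aligned in both sequences are assumptions made for
the illustration.

```python
import math


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues among positions aligned in both."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b)
             if x != gap and y != gap]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


def rmsd(coords_a, coords_b):
    """Root mean square deviation between paired, superposed coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy data, not taken from any crystallographic molecular model.
identity = percent_identity("ACDE-F", "ACGEKF")
deviation = rmsd([(0.0, 0.0, 0.0)], [(3.0, 4.0, 0.0)])
print(identity, deviation)
```

If the coordinates are given in nanometers, the deviation is returned
in nanometers, the unit used in the text.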

The serine endopeptidases, thiosulfate
sulfurtransferase, carbamoyl-phosphate synthase from
E. coli, methionyl aminopeptidase from E. coli, and
diaminopimelate epimerase from Haemophilus
influenzae [...] by one internal duplication. In UDP-glucose
6-dehydrogenase from Streptococcus pyogenes, however, the two
domains of an internal duplication (123 and 93 aa) are
separated from each other by another domain of
177 aa. Although the bifunctional enzyme
phosphoribosylanthranilate isomerase/indole-3-glycerol-phosphate
synthase has two consecutive enzymatic domains
that are superposable, the two enzymes probably
evolved separately from a common ancestor before
being fused.

Aside from the alignments of the duplicated amino
acid sequences or superpositions of the duplicated
tertiary structures, there are other observations supporting
the conclusion that duplicated domains at one time had
independent existence. The internal duplication in
mammalian hexokinase produced two independent
enzymatic domains, each containing an active site and
each superposable on the other and on the unduplicated
enzyme from fungi. The two internally duplicated
domains of ovotransferrin can be detached by
digestion with endopeptidase, and the resulting
amino-terminal domain can be crystallized and shown to have the
same structure it had in the intact protein. The
aspartic endopeptidase from retroviruses is formed from two
identical polypeptides, but those from eukaryotes are
formed from a single polypeptide containing an internal
duplication of the structure assumed by each
polypeptide in the viral protein. Each domain of the enzyme
from eukaryotes is superposable on one of the folded
viral polypeptides. The two halves of porcine aspartic
endopeptidase were expressed separately and, when
mixed together, folded to produce an enzymatically
active protein formed, as is the retroviral enzyme, from
two folded polypeptides rather than two internally
duplicated domains.

Many proteins contain more than two internally
repeating domains. Serum albumin is composed of
three internally repeating domains; human interstitial
retinol-binding protein, of four; gelsolin, of six;
human placental ribonuclease inhibitor, hemocyanin
from Octopus dofleini, and granulin, of seven; and
the polymeric globin from Artemia, of nine. Such gene
multiplications are usually not produced by several
historically distinct duplications but arise when an initial
gene duplication then catalyzes the further
multiplication of the gene during successive rounds of
recombination. Sometimes the same protein from different species
has different numbers of internally repeating domains.
Dihydrolipoyllysine-residue acetyltransferase from
E. coli has three consecutive lipoamide domains; the
enzyme from rat, two; and the enzyme from S. cerevisiae,
one. An example of a very recently multiplied gene is
the one encoding prepromagainin from Xenopus laevis,
in which the identical sequence of 46 aa repeats five
times in the same polypeptide with the replacement of
only one amino acid in only one of the repeats. In
contrast to such proteins containing amino acid sequences
multiplied so recently that they can be readily aligned
are proteins in which the internally repeating domains
diverged so long ago that their secondary structures,
although obviously related, have rearranged
significantly.

Nebulin and spectrin are examples of proteins with
even larger numbers of internally repeating domains.
Nebulin is a long protein (6669 aa) composed almost
entirely (the first 6480 aa) of 178 internally repeating
domains each 30-32 aa long. Spectrin is a protein
composed of 38 internally repeating domains
consecutively occurring in its two unique folded
polypeptides. The folded domains sit like beads on a wire to
create a long, somewhat flexible protein (Figure
7-16A). Each domain of 106 aa is an antiparallel
coiled coil of three α helices (Figure 7-16B).
Spectrin offers an excellent example of the absence of a
correlation between locations at which cleavages with
endopeptidases occur upon the surface of a native
protein and the boundaries between its domains. Of the 15
cleavages of native spectrin produced by trypsin, only
four occur that are even near the boundaries of the
domains. This fact is not surprising, given that the
junctions between the domains are continuous α helices
(Figure 7-16B).

The internal repeats in a protein, if the multiplica- 
tions of the gene that produced them have occurred 
recently enough, can be recognized on a dot matrix 
(Figure 7-2) in which the amino acid sequence of the 
entire protein is compared to itself. Such a dot matrix of 
a protein with internally repeating domains contains a 
set of lines parallel to the central diagonal of identity. The
distance between the lines is equal to the length of the
internally repeating domains, and the number of lines is
one less than the number of domains. In the dot matrix
for the self-comparison of the amino acid sequence of
human interstitial retinol-binding protein, there are
three lines parallel to the central diagonal and the
distances between them are 302-310 aa, and in that for
human placental ribonuclease inhibitor, there are six
lines and the distances between them are 57 aa.
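
The relation between the lines in such a dot matrix and the internal
repeats can be illustrated with a short sketch. A line parallel to the
central diagonal at a distance d corresponds to positions i and i + d
of the same sequence carrying the same amino acid, so the sketch simply
scores every offset d. It is a simplified stand-in for a real dot
matrix, which would normally use a sliding window and a scoring matrix;
the function name, the thresholds, and the toy sequence are assumptions
made for the illustration.

```python
def repeat_offsets(sequence, min_fraction=0.8, min_overlap=5):
    """Offsets at which a sequence matches itself along a whole diagonal.

    For each offset d, compare position i with position i + d and keep d
    when the fraction of identical pairs is at least min_fraction.
    Diagonals shorter than min_overlap positions are ignored.
    """
    n = len(sequence)
    offsets = []
    for d in range(1, n):
        overlap = n - d
        if overlap < min_overlap:
            break
        matches = sum(
            1 for i in range(overlap) if sequence[i] == sequence[i + d]
        )
        if matches / overlap >= min_fraction:
            offsets.append(d)
    return offsets


# Toy polypeptide: four exact copies of a hypothetical 10-residue repeat.
unit = "ACDEFGHIKL"
toy_sequence = unit * 4

lines = repeat_offsets(toy_sequence)
repeat_length = lines[0]
number_of_domains = len(lines) + 1
print(lines, repeat_length, number_of_domains)
```

For the toy sequence the sketch finds three lines, one fewer than the
four repeats, spaced by the length of the repeat, as stated above.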

The β propeller (Figure 6-13) is a structure in which
four antiparallel β strands form each blade and 6-8
blades form the intact unit. Each blade is superposable
on each of the others, so an argument might be made
that each blade represents an internally repeating
domain, but there are no examples of such a small
structure having independent existence. There are a
number of other repeating structures that are too small
to fold on their own and consequently should not
be considered to be domains. Often short repeating
sequences such as those in dragline silk from spiders
or antifreeze proteins from Tenebrio molitor have been
multiplied to produce a protein that is fibrous or that
must conform to a repeating molecular structure such as
a crystal of ice but that is not formed from a string of
independent globular tertiary structures as is spectrin.

Figure 7-16: Internal repeats in spectrin. (A)
Hypothetical model of the isoform of spectrin from
human erythrocytes. A repeating pattern was
detected in the amino acid sequences of the two
different polypeptides (Naa,α = 2430, Naa,β = 2200)
composing this protein. The repeating pattern is 106 aa
in length and occurs 22 times in the α polypeptide
and about 18-20 times in the β polypeptide. From
numerous physical measurements, it was
concluded that each repeating domain is a bundle of
three α helices and that the bundles are strung
together as shown. Reprinted with permission from
ref 245. Copyright 1984 Nature. (B) Skeletal
representation of the crystallographic molecular model
(Bragg spacing > 0.2 nm) of the 16th and 17th
internally repeating domains of the α isoform of spectrin
from brain of Gallus gallus. Complementary DNA
encoding the sequence of the protein from Histidine
1772 to Alanine 1982 was expressed in E. coli. The
resulting polypeptide folded to form the two
domains, each of which is an antiparallel coiled coil
of three α helices. The only feature of the structure
unforeseen in the proposal of panel A was that the
last α helix of a preceding domain would be
continuous with the first α helix of the next domain. This
drawing was produced with MolScript.


The proteins discussed so far are simple multiples 
of internally repeating domains, but other proteins have 
internally repeating domains attached to themselves and 
then to other domains. Sometimes one or the other of the 
odd domains appears to have evolved from the same 
common ancestor as the internally repeating domains 
but has undergone a few more additions, deletions, or 
rearrangements of its secondary structure, an
observation suggesting that two gene duplications 
occurred at remarkably different times, but usually the 
odd domain or odd domains are entirely different. For 
example, pyruvate oxidase from Lactobacillus plantarum 
has two internally repeating domains, each 200 aa long, 
coupled to a recurring FAD-binding domain in the same 
polypeptide. The extracellular portion of
mannose-6-phosphate receptor has 15 contiguous internally
repeating domains coupled to a membrane-spanning
segment and an intracellular domain.

Titin is possibly the longest protein built from a
single polypeptide. It is a protein greater than 1 μm in
length that is found in vertebrate muscle. The
amino terminus of its polypeptide is embedded in the
Z disc, the carboxy terminus of its polypeptide is
embedded in the M line, and its elasticity allows it to shorten
and lengthen as the muscle contracts and relaxes.
As does spectrin (Figure 7-16), titin achieves its
remarkable length by using internal repeats. It contains 244
internally repeating domains, each about 100 aa long,
that account for 90% of its amino acid sequence of
26,900 aa. Unlike those in spectrin, however, these internal
repeats come in two types, immunoglobulin repeats
and fibronectin type III repeats. Also, unlike the
internally repeating domains of spectrin, these two types of
internally repeating domains are widely recurring within
a broad class of mosaic eukaryotic proteins that are
cobbled together from a particular collection of
promiscuous, modular domains. In the amino acid sequences of
these proteins, segments have been observed that can be
aligned with other segments within the same protein as
well as segments in other proteins. These recurring
domains can appear many times in the same protein as
internally repeating domains or they can appear in
combination with other different recurring domains.
Because the proteins that contain them seem to have
resulted from recent, remarkably active domain shuffling
and because their amino acid sequences can usually be
aligned readily with those of the other members of their
type, these domains are usually considered to be
members of a unique group and are referred to as modular
domains.
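
The statement that the repeats account for about 90% of the sequence
of titin can be checked with simple arithmetic. The short sketch below
uses only the three numbers given in the text; the function name is
hypothetical, and the mean length of 100 aa is the approximate value
quoted above rather than a measured mean.

```python
def repeat_fraction(n_domains, mean_domain_length_aa, total_length_aa):
    """Fraction of a polypeptide accounted for by its internal repeats."""
    return n_domains * mean_domain_length_aa / total_length_aa


# 244 domains of about 100 aa in a polypeptide of 26,900 aa.
titin_fraction = repeat_fraction(244, 100, 26_900)
print(round(titin_fraction, 2))  # about 0.91, in line with the stated 90%
```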

A modular domain is a domain that recurs fre- 
quently within a group of mosaic eukaryotic proteins 
containing internal repeats of that domain, mixtures of 
other modular domains, or both internal repeats and
mixtures. The modular domains in such mosaic proteins
are usually recognized by sequence alignment, a fact 
indicating that the proteins containing them have arisen 
from recent genetic events. The modular domains in 
these proteins can be readily assigned to one of the many 
types that have been observed. A few of these types are 
listed in Table 7-7. 

A drawing of the crystallographic molecular model 
of a cohesin domain is presented in Figure 6-21. 
Although the mosaic proteins constructed from modular 
domains often contain only one or two types of domain 
with only a few internal repeats, some have four
or more types, one or more of which can repeat a number 
of times (Table 7-8). 

Several of the types of modular domains, for exam- 
ple, calcium-binding EF hand, leucine-rich repeats, EGF 
domains, and ankyrin repeats, are too short (Table 7-7) 
to fold on their own and are almost always repeated two 
or more times to produce a large enough structure to 
permit the polypeptide to fold. Together, in consecutive 
order, they form structures in which small compact 
units, each formed by one of the repeats and each of the 
same structure, are stuck together one against the 
next. It is the interfaces between the units that bury
sufficient numbers of hydrogen-carbon bonds to make
the structures stable. An example of this strategy is the
β helix in the antifreeze protein from T. molitor, in which
each turn is an internal repeat of only 12 aa. Other
short modular domains, clearly related by alignment of
their sequences, assume different structures under
different circumstances.

Examples of the group of proteins that are mosaics 


Table 7-7: Examples of Types of Modular Domains
Distributed among Mosaic Eukaryotic Proteins

type of domain           length^a (aa)   structure^b
EF hand                  40              αLα
immunoglobulin           100             By
leucine-rich repeat      30              Do
RNA recognition motif    80              Baß,aß
EGF                      50              RM(Ccy3_4)
cohesin                  140             Bo
ankyrin                  40              Bo,
C2                       120             Bs
SH2                      100             Bob, off
SH3                      60              Bs
Kringle                  80              RM(Ccys)
SAND                     80              Bobo
pleckstrin               100             Boo
fibronectin type I       50              Bs
fibronectin type II      60              Bob
armadillo                50              Oy
fibronectin type III     90              By
START                    200             (830% B50
hemopexin                200             four-bladed β propeller

^a Approximate mean to nearest 10. ^b Secondary structures in the order in which
they appear: α, α helix; β, β strand; L, loop; RM, random meander; Ccy, cystine.


Table 7-8: Extreme Examples of Proteins Assembled from
Modular Domains

protein                  order of modular domains^a
factor XII               F2-EGF-F1-EGF-Kr-Endo
thrombospondin I         Ths-CC-PcC-(Prop)3-(EGF)3-(EF)7-Ths
collagen VI              (vWA)-Co-(vWA)>-ST-F3-BPTI
aggrecan                 (Li)2-KS-Ch1-Ch2-EGF-Lec-Comp
perlecan                 Per-(EGF)4-Ig-(EGF)19-Igjg-[LaC-(EGF)].-LaC
btk kinase               Plk-SH3-SH2-Kin
phospholipase Cγ         Plk-PLC-PY-(SH2)2-SH3-PLC
p120 GTPase activator    SH2-SH3-SH2-Plk-GAP

^a The order from amino terminus to carboxy terminus in which the modular
domains are attached to each other. Modular domains (see Table 7-7): F2,
fibronectin type II; F1, fibronectin type I; Kr, kringle; CC, coiled coil; PcC,
carboxy-terminal procollagen; Prop, properdin; EF, calcium-binding EF hand; vWA,
domain A of von Willebrand factor; ST, serine-threonine-enriched; F3,
fibronectin type III; BPTI, bovine pancreatic trypsin inhibitor; Li, link protein; KS,
keratan sulfate binding; Ch, chondroitin sulfate binding (types 1 and 2); Lec,
lectin; Comp, complement control; Ig, immunoglobulin; LaC, carboxy-terminal
laminin; Plk, pleckstrin; PY, phosphotyrosine-containing SH2-binding. Domains
specific to the particular protein: Endo, endopeptidase; Ths,
thrombospondin-specific; Co, (Gly-X-Y)n repeat of collagen; Per, perlecan; Kin, kinase; PLC,
phospholipase C; GAP, GTPase activator.


of modular domains are regulatory kinases and phos- 
phatases, proteins of the extracellular matrix, and 
endopeptidases of the coagulation system. Within one of 
these proteins, the modular domains, which are widely 
recurring, are usually attached to one or more domains 
that are specific to the function of that protein (Table 
7-8). Phosphoinositide phospholipase Cδ1 from Rattus
norvegicus is a paradigm of such a mosaic protein. It
contains, in order from the amino terminus, a pleckstrin
domain, four consecutive EF hands, a catalytic domain
responsible for the phospholipase activity, and a C2
modular domain (Figure 7-17).
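
The shorthand used in Table 7-8 for the order of modular domains lends
itself to simple bookkeeping. The following sketch, offered only as an
illustration, expands a string written in that shorthand into a list of
domains and compares two proteins. It assumes that a repeat is written
as a name in parentheses followed by the number of copies, for example
(EGF)3, and that domains are separated by hyphens; the function names
are hypothetical.

```python
import re
from collections import Counter

# A repeated domain written as "(name)count", for example "(EGF)3".
_REPEAT = re.compile(r"^\((?P<name>[^()]+)\)(?P<count>\d+)$")


def expand_domain_order(order):
    """Expand a hyphen-separated domain order into a list of domains."""
    domains = []
    for token in order.split("-"):
        match = _REPEAT.match(token)
        if match:
            domains.extend([match.group("name")] * int(match.group("count")))
        else:
            domains.append(token)
    return domains


def shared_domains(order_a, order_b):
    """Types of domain that occur in both proteins."""
    return set(expand_domain_order(order_a)) & set(expand_domain_order(order_b))


thrombospondin = expand_domain_order("Ths-CC-PcC-(Prop)3-(EGF)3-(EF)7-Ths")
counts = Counter(thrombospondin)
common = shared_domains("Plk-SH3-SH2-Kin", "SH2-SH3-SH2-Plk-GAP")
print(len(thrombospondin), counts["EF"], sorted(common))
```

Applied to the entries for btk kinase and p120 GTPase activator, the
comparison returns the pleckstrin, SH2, and SH3 domains that the two
proteins share, while the kinase and GTPase activator domains specific
to each protein are left out.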

Many of the types of modular domains, for
example, the immunoglobulin, the EGF, the kringle, the
fibronectin, and the hemopexin, are usually
found on individual exons and are thought to have been
distributed among their mosaics by exon shuffling. Exon 
shuffling is a genetic rearrangement in which an intact 
exon is coupled at one of its ends to the other end of 
another intact exon from elsewhere in the genome by a 
mistake in recombination that occurs at sites within the 
intron following the one exon and the intron preceding 
the other. The result of the shuffle is that the hybrid 
intron, containing the front end of one old intron and the 
back end of the other, now joins the two exons. This new 
intron is spliced away during the maturation of the mes- 
senger RNA so that in the protein the two amino acid 
sequences encoded by the exons are spliced together. 
Other types of modular domains, however, do not seem 
to be distributed by exon shuffling but by other types of 
genetic rearrangements.
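
The sequence of events in exon shuffling described above can be
followed in a small model. In the sketch below a gene is represented as
a list of exons and introns, a recombination joins one gene, broken
within an intron, to another gene, also broken within an intron, and
splicing then removes every intron. The representation, the function
names, and the names of the exons and introns are all hypothetical and
are chosen only to make the steps explicit.

```python
def recombine(upstream, i, downstream, j):
    """Join upstream (broken in intron i) to downstream (broken in intron j).

    The product carries a hybrid intron made of the front end of the
    upstream intron and the back end of the downstream intron.
    """
    if upstream[i][0] != "intron" or downstream[j][0] != "intron":
        raise ValueError("both breakpoints must lie within introns")
    hybrid = ("intron", f"{upstream[i][1]}[5' end]+{downstream[j][1]}[3' end]")
    return upstream[:i] + [hybrid] + downstream[j + 1:]


def splice(gene):
    """Remove the introns and return the exons in the order they are joined."""
    return [label for kind, label in gene if kind == "exon"]


# Two hypothetical genes; each exon is imagined to encode one domain.
gene_1 = [("exon", "kringle"), ("intron", "i1"), ("exon", "endopeptidase")]
gene_2 = [
    ("exon", "signal"), ("intron", "j1"),
    ("exon", "EGF"), ("intron", "j2"),
    ("exon", "tail"),
]

# Break gene_1 within i1 and gene_2 within j1, then join the pieces.
shuffled = recombine(gene_1, 1, gene_2, 1)
mature = splice(shuffled)
print(shuffled)
print(mature)
```

In the product, the exon for the kringle is followed by the hybrid
intron and then by the exon for the EGF domain, so after splicing the
two amino acid sequences are joined directly, which is the outcome
described in the text.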

In examining proteins formed from domains, two
periods of construction can be discerned. Modular
domains and internally repeating domains, the amino
acid sequences of which can be readily aligned, are the
products of recent genetic rearrangements. Proteins
containing recurring domains and internally duplicated
domains, the amino acid sequences of which cannot be
aligned and the tertiary structures of which have drifted
apart significantly, are the products of ancient genetic
rearrangements by processes that assembled the
proteins common to all existing organisms. Many of the
recent rearrangements have been produced by exon
shuffling, but whether or not any of the early
rearrangements also were produced by exon shuffling is
unknown.

When the tertiary structure of ferredoxin-NADP+
reductase is examined (Figure 7-12), it seems possible to 
divide it reasonably into two domains. It is now known 
that both of these are recurring domains, but it has been 
argued that, even if this were not known, a judicious 
decision that these were distinct domains could still have 
been made by inspection of the crystallographic molec- 
ular model alone. In this sense, these are two structural 
domains. A structural domain has been defined as a 
“section of peptide chain that can be enclosed in a
compact volume ... by a closed surface ..., and is
characterized by possession of two terminal points.” These two
terminal points are the point at which the polypeptide 
enters the compact volume enclosed by the surface and 
the point at which it exits. For example, phosphoglycer- 
ate kinase is constructed from two structural domains, 
each formed from one continuous length of polypeptide 
possessing two terminal points and clearly capable of 
being enclosed by continuous surfaces surrounding 
compact volumes. This definition does not possess a
requirement for evidence of the independent existence 
of the domain. It could be argued that evidence will even- 
tually be gathered for the independent existence of each 
of the structural domains now designated. 

The difficulty with the definition of structural 
domains is that it is subjective. Even though the closed 
surfaces chosen are the ones that seem to be reasonable, 
in most cases, other choices, which usually produce a 
greater number of smaller domains, could be made that 
would satisfy the same definition. In fact, when this basic 
definition was used to derive a set of objective rules to 
divide any given crystallographic molecular model into 
domains, the tertiary structures of the 22 proteins exam- 
ined could be divided unambiguously into as few as two 
or as many as 10 structural domains. The number of 
domains so defined increased monotonically with the 
lengths of the respective polypeptides, which varied from 
58 to 450 aa. The mean length of the polypeptide in
these structural domains was about 50 aa, which seems 
too small to be an evolutionarily significant unit. Most of 
these irreducible units could be combined with one or 
more of their neighbors to produce larger structural 
domains. The relevance of these small segments either to 
the evolution of one of these proteins or to its structure is 
not apparent. 


Figure 7-17: Crystallographic molecular model of phosphoinositide
phospholipase Cδ1 from R. norvegicus. The amino-terminal
pleckstrin domain (Table 7-7) of the protein (Methionine 1 to Histidine
130) was expressed separately and crystallized. A skeletal representation of
the backbone of that crystallographic molecular model (Bragg spacing ≥
0.19 nm) is at the bottom of the figure. The remainder of the protein
(Glycine 133 to Aspartate 756) was also expressed separately and crystallized.
A skeletal representation of that crystallographic molecular model
(Bragg spacing ≥ 0.23 nm) is presented above that of the pleckstrin
domain. There was no electron density for Glycine 133 through Aspartate
157 in the latter model. The representations of the two crystallographic
molecular models are drawn at the same scale and arranged arbitrarily so
that the carboxy terminus of the pleckstrin domain is near the amino
terminus of the rest of the molecule. The four EF hands (Table 7-7) comprise
positions 133-175, 176-211, 212-245, and 246-282. None has a bound
Ca²⁺ even though the cation was present during crystallization, so they do
not assume the paradigmatic structure, but except for the first one, which
is missing its first α helix, each has an α helix, a loop, and an α helix. The
domain responsible for the phospholipase activity (catalytic) comprises
Aspartate 299 through Alanine 606. It has a disordered segment (445-484)
that produced no electron density, and the active site is occupied by a molecule
of 1-D-myo-inositol-1,4,5-trisphosphate (thicker lines, upper left), a
product of the enzymatic reaction, and a Ca²⁺ cation (gray circle). The C2
modular domain (Table 7-7) comprises Tryptophan 625 through Aspartate
756. This drawing was produced with MolScript.

The intuitive impression persists, nevertheless, that
the tertiary structures of most proteins, as revealed in 
their crystallographic molecular models, can be divided 
into two or more autonomous structural domains. It 
seems to be the case that anyone examining these struc- 
tures would make the same decision, but there is no way 
to verify this surmise. Some examples of proteins that are 
thought to contain structural domains are DNA-directed 
DNA polymerase I, dihydrolipoyl dehydrogenase,
and the hemagglutinin glycoprotein of influenza virus.
An interesting example of a structural domain for which 
there is independent evidence of its existence occurs in 
catalase from Penicillium vitale. This protein has a struc- 
tural domain formed from the carboxy-terminal 160 aa 
in its sequence (amino acids 510-670) that is missing
entirely from bovine liver catalase, even though the 
two proteins are superposable throughout the other 
structural domain. Likewise, the carboxy-terminal struc- 
tural domain found in the two structurally superposable 
proteins phosphoribosylamine-glycine ligase and biotin 
carboxylase is missing from the otherwise superposable 
proteins glutathione synthase and D-alanine-D-alanine
ligase.

As the number of crystallographic molecular 
models has increased, more and more of the domains 
that were originally designated structural domains have 
been found to be recurring domains. It is possible that if 
the crystallographic molecular models of all of the pro- 
teins were known, all structural domains would turn out 
to be recurring domains, and this latter fact would pro- 
vide the necessary evidence for their independent exis- 
tence. 

One indication that a structural domain does have 
independent existence is that its position relative to the 
rest of the protein shifts when crystallographic molecular 
models from different crystals of the same protein are 
compared. Lysozyme from bacteriophage T4 assumes 
five different structures in five different crystalline envi- 
ronments that differ from each other in the relative 
positions of its two structural domains. Such independently
shifting domains that reorient within the
same protein over milliseconds or seconds should be dis- 
tinguished from domains that change their orientations 
over millennia as related proteins diverge from each 
other during evolution. As two proteins that are derived 
from a common ancestor diverge with time, structural 
domains often shift positions relative to each other even 
though the internal structures of the domains them- 
selves remain superposable. Such evolutionarily shift- 
ing domains can be documented by superposing the 
crystallographic molecular models of related proteins. 
When the crystallographic molecular models of NADH 
peroxidase and glutathione reductase are compared, 
each of the four structural domains in these two related 
proteins superposes on its partner, but their relative 
positions in the two proteins are significantly shifted.
Significant shifts in the relative positions of the two
internally repeating domains of aspartic endopeptidases have
been documented by comparisons of crystallographic
molecular models of six different members of the
group, and the two structural domains in the related
proteins ferredoxin-NADP⁺ reductase from B. taurus and
thioredoxin-disulfide reductase from E. coli, although
separately superposable, differ in their relative positions
by 66°.
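
A difference in relative position such as the 66° quoted above is usually
expressed as a single rotation angle. The sketch below shows one way such
an angle could be obtained, assuming that each structural domain has
already been superposed separately and that the two superpositions are
available as 3 x 3 rotation matrices. The matrices used here (rot_z) are
hypothetical stand-ins, not data from either reductase, and the text does
not state how the published angle was actually calculated.

```python
import math


def matmul(a, b):
    """Product of two 3 x 3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(a):
    """Transpose of a 3 x 3 matrix (the inverse of a rotation matrix)."""
    return [[a[j][i] for j in range(3)] for i in range(3)]


def rotation_angle_deg(r):
    """Angle of a rotation matrix, from trace(R) = 1 + 2 cos(theta)."""
    cos_theta = (r[0][0] + r[1][1] + r[2][2] - 1.0) / 2.0
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding
    return math.degrees(math.acos(cos_theta))


def relative_domain_rotation_deg(r_domain1, r_domain2):
    """Rotation still needed for domain 2 after domain 1 is superposed."""
    return rotation_angle_deg(matmul(r_domain2, transpose(r_domain1)))


def rot_z(deg):
    """Hypothetical test matrix: rotation by deg degrees about the z axis."""
    t = math.radians(deg)
    return [[math.cos(t), -math.sin(t), 0.0],
            [math.sin(t), math.cos(t), 0.0],
            [0.0, 0.0, 1.0]]


# Hypothetical superpositions: domain 1 needs 10 degrees, domain 2 needs 76.
angle = relative_domain_rotation_deg(rot_z(10.0), rot_z(76.0))
print(round(angle, 1))
```

If the two domains moved as one rigid body, the two matrices would be
identical and the angle would be zero; any nonzero angle measures how far
the domains have shifted relative to each other.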

One criterion that is often used as evidence for the 
independence of structural domains in a protein is that 
they unfold or fold independently. Separately unfolding 
domains are two or more regions of a protein that unfold 
independently of each other. Fibrinogen is a protein con- 
structed from two copies of each of three polypeptides 
that are combined in such a way that the intact protein 
contains a central detachable domain, domain E, and 
two identical peripheral, detachable domains, do- 
mains D. The two domains D are attached to domain E 
by ropes constructed from three-stranded coiled coils of 
α helices, and the two domains D can be detached
from domain E by cleaving disordered regions in the 
coiled coils with endopeptidases. When fibrinogen is 
submitted to differential scanning calorimetry,* two 
clearly separated transitions can be observed (Figure 
7-18). These have been assigned to the melting of 
domains D and E, respectively.

The melting, or unfolding, of the separately unfold- 
ing domains of fibrinogen is an irreversible process 
under the conditions chosen, but the unfolding and
the refolding of a protein back to its native structure are 
often reversible processes, even in a calorimeter. 
Plasminogen is a protein composed of at least seven 
domains. These are five kringles that repeat consecutively
within the entire sequence and two additional segments
of polypeptide on each side of this pentuplication.
Several of these domains or combinations of these 
domains can be detached and isolated separately. The 
reversible unfolding and refolding of five of these 
detached pieces could be followed by differential scan- 
ning calorimetry. These individual measurements could 
be combined to show that the rather complex, fully 
reversible calorimetric curve obtained with the intact 
protein was actually the sum of seven independent transitions.
It is also possible to observe the independent


* A differential scanning calorimeter is used to measure the differ- 
ence in the absorption of heat, as the temperature is raised, 
between a solution containing a protein and an identical solution 
lacking the protein. Two cells, sample and reference, contain pre- 
cisely matched coils that introduce identical quantities of heat into 
each of them and establish a constant rate of temperature increase. 
The sample cell has an auxiliary coil that provides the additional 
heat necessary to keep its temperature exactly the same as that of 
the reference cell. The power supplied to the auxiliary heater is a 
measure of the excess heat absorbed by the sample, the endother- 
mic heat flow. A protein unfolds, or melts, as the temperature rises, 
and this transition proceeds with the absorption of heat. This 
absorption of heat is a convenient way to follow the progress of the 
unfolding. 
Figure 7-18: Thermal melting of domains D and E of bovine fibrinogen.
(A) Native intact fibrinogen. A solution (26 µL) of native
intact fibrinogen (88 mg mL⁻¹) was introduced into the sample
chamber of the differential scanning calorimeter, and endothermic
heat flow (microjoules second⁻¹) into the sample in excess of the
flow into an identical solution lacking the protein was recorded as
a function of temperature (degrees Celsius). (B) A solution (25 µL)
of a 2:1 molar mixture of the chromatographically purified domains
D and E that had been detached with an endopeptidase was
used as the sample at a final concentration of 101 mg mL⁻¹.
(C) A solution (22 µL) containing only detached and chromatographically
purified domain E was used as the sample at a final
concentration of 47 mg mL⁻¹. The scale of the two upper traces
(microjoules second⁻¹) is 2.5 times that of the scale of the lower.
Heating rate for all traces was 10 °C min⁻¹. Adapted with permission
from ref 346. Copyright 1974 National Academy of Sciences.


unfolding and refolding of separate domains in the same 
protein by perturbing the equilibrium between folded 
and unfolded forms of each domain through the addition 
of a denaturant such as urea or guanidinium chloride
rather than with heat.
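
The statement above that the calorimetric curve of an intact protein can
be the sum of independent transitions can be illustrated numerically. The
sketch below assumes that each domain unfolds in a two-state transition
with an enthalpy of unfolding that does not change with temperature, an
assumption made here only for simplicity. The melting temperatures and
enthalpies are hypothetical values chosen for illustration. They are not
measurements for fibrinogen or plasminogen.

```python
import math

GAS_CONSTANT = 8.314  # J mol^-1 K^-1


def excess_heat_capacity(temp_k, tm_k, delta_h):
    """Excess molar heat capacity (J mol^-1 K^-1) of one two-state transition.

    Assumes delta_h (J mol^-1) is independent of temperature, so that the
    equilibrium constant for unfolding is K = exp[-(dH/R)(1/T - 1/Tm)] and
    the excess heat capacity is dH^2 K / [R T^2 (1 + K)^2].
    """
    k_unfold = math.exp(-(delta_h / GAS_CONSTANT) * (1.0 / temp_k - 1.0 / tm_k))
    population_term = k_unfold / (1.0 + k_unfold) ** 2
    return delta_h ** 2 * population_term / (GAS_CONSTANT * temp_k ** 2)


def summed_trace(temp_k, domains):
    """Trace expected if every domain unfolds independently of the others."""
    return sum(excess_heat_capacity(temp_k, tm, dh) for tm, dh in domains)


# Hypothetical domains: (melting temperature in K, enthalpy in J mol^-1).
domains = [(328.0, 4.0e5), (363.0, 5.0e5)]

temps = [300.0 + 0.5 * i for i in range(161)]  # 300 K to 380 K
trace = [summed_trace(t, domains) for t in temps]

# Temperatures at which the summed trace passes through a local maximum.
peaks = [temps[i] for i in range(1, len(temps) - 1)
         if trace[i] > trace[i - 1] and trace[i] > trace[i + 1]]
print(peaks)
```

When the melting temperatures are well separated, as they are in this
example, the summed trace shows one maximum for each domain. When they
are close together the maxima merge, and the individual transitions can
be assigned only by measuring the detached pieces separately.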

When there is no other independent evidence for 
the possibility that the structural domains in a particular 
protein at one time had an independent existence, 
attempts are often made to express one or more of these 
domains and demonstrate that they are able to fold independently.
An independently folding domain is a portion
of a larger protein that is capable of folding by itself
into a structure that is the same as the structure that
portion assumes when it is within the larger protein.
Experimentally, the portion of the protein thought to be 
an independently folding domain is detached either 
genetically or with an endopeptidase and shown to be 
able to fold properly on its own. If the fragment of the 
larger protein is an enzymatic domain, such as the
bisphosphatase domain of 6-phosphofructo-2-kinase/fructose-2,6-bisphosphate
2-phosphatase, then its expression
as an enzymatically active protein permits one to
conclude that it has folded while being expressed to 
assume a structure that is the same as the structure it 
assumes in the intact protein. 

The same conclusion can often be reached if the 
detached structural domain exhibits some function dis- 
played by the intact protein. A genetically detached 
structural domain of aspartate transaminase folded 
independently to produce a globular structure capable of 
binding pyridoxal phosphate, and a genetically
detached internally repeating domain of human transferrin
binds iron with high affinity. When a structural
domain located at the interface between the two identi- 
cal subunits in glutathione reductase from E. coli was 
genetically detached, it folded and dimerized, presum- 
ably because the domain folded properly to create the 
face that participates in the dimerization of the native 
protein. Unfortunately, the situation is often more
ambiguous. For example, only when the two genetically 
detached structural domains accounting for the entire 
structure of CLC-0 chloride channel from Torpedo californica
are coexpressed does the protein display function.

In the absence of functional activity, of a determination
of its structure by nuclear magnetic resonance spectroscopy,
or of a crystallographic molecular model,
how does one decide, even though it may display transitions
characteristic of folding, whether or not the
detached fragment is folding to assume the structure it
had in the intact protein? A fragment of the polypeptide
comprising thermolysin from Bacillus thermoproteolyti- 
cus can be produced by cleavage with cyanogen bromide 
and purified by molecular-exclusion chromatography in 
its unfolded state. This fragment contains the last 111 aa 
of the intact protein (sequence positions 206-316) and 
can be induced to refold. The solution that results contains
a monomeric, compact, globular protein that has
an α-helical content, determined spectroscopically,
close to that predicted from the crystallographic molecular
model of thermolysin. The protein in this solution
melts, but at a temperature 20 °C below that at which
native thermolysin, cut between amino acids 225 and 
226, melts. The refolded protein unfolds in solutions of 
the denaturant guanidinium chloride, but at concentra- 
tions of guanidinium half those at which the nicked ther- 
molysin unfolds. If the shorter polypeptide were folding 
to assume the same structure it had in native thermolysin,
that structure is now much less stable. But the
results are also consistent with it assuming a completely
different conformation, and the protein displays no function
that would indicate that it is properly folded.

A claim that an independently folding domain has 
been produced requires an unambiguous demonstration 
that the domain can refold into the native conformation 
it assumes in the parent protein. For example, both 
kringle 4 of plasminogen, detached by digestion with an 
endopeptidase, and kringle 2 of tissue plasminogen
activator, detached genetically, can be unfolded and
refolded. The refolded detached domains bind lysine as 
they do when they are in their native conformation. The 
refolded kringle 2 also retains its affinity for plasminogen 
activator inhibitor 1. When kringle 4 is unfolded, its
cystines reduced to cysteines, and then refolded, the fact 
that it has regained its native conformation is demon- 
strated by the ability of this refolded structure to enforce 
the formation of only the properly paired cystines upon 
its exposure to oxygen. In this case, the proper pairing
of the cysteines, located at distant positions in the amino 
acid sequence, is the result of their proper juxtaposition 
in a properly folded polypeptide. 

The segments of polypeptide linking domains 
together are of various types. Domains can be joined by 
flexible links such as those connecting the Fc and 
Fab domains of immunoglobulin G (Figure 7-13). The 
segments of polypeptide 35, 15, and 75 aa in length con- 
necting the four enzymatic domains of the CAD multien- 
zyme complex are rich in proline and glycine (30%) and 
the segments 27 and 24 aa in length connecting the three 
internally repeating lipoyl domains of dihydrolipoyllysine-residue
acetyltransferase from E. coli are rich in proline
and alanine (73%). All of these links should be
unstructured and flexible. The amino acid segment 
-SKSSKEQKKKQK- connecting the two functional 
domains of initiation factor IF3 from E. coli has been 
shown to be randomly disordered in the intact protein in 
solution, and in the map of electron density for RNA
recognition motifs from the Sex-lethal protein from 
Drosophila melanogaster, the segment connecting these 
modular domains is missing owing to its disorder. It is
such extended, disordered segments that are susceptible 
to endopeptidases when domains are detached by diges- 
tions. The long segment of 60 aa connecting the two 
modular SH2 domains in human protein-tyrosine kinase 
ZAP-70, however, forms a rigid, antiparallel, two-stranded
coiled coil of α helices.
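
The compositional statements above (for example, 30% proline and glycine,
or 73% proline and alanine) are simple residue fractions. A minimal sketch
of that arithmetic follows, applied to the only linking segment whose
sequence is given in the text, the segment from initiation factor IF3.
The function name residue_fraction is an illustrative choice, and the
sequences of the other links are not given here, so they are not
reproduced.

```python
from collections import Counter


def residue_fraction(sequence, residues):
    """Fraction of positions in sequence occupied by any of the residues."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    counts = Counter(sequence)
    return sum(counts[r] for r in set(residues)) / len(sequence)


# Linking segment of initiation factor IF3 from E. coli, as quoted in the text.
IF3_LINKER = "SKSSKEQKKKQK"

print(Counter(IF3_LINKER).most_common())    # composition, most frequent first
print(residue_fraction(IF3_LINKER, "K"))    # fraction that is lysine
print(residue_fraction(IF3_LINKER, "PG"))   # fraction that is proline or glycine
```

This particular link contains no proline or glycine at all. It is instead
dominated by lysine and other hydrophilic amino acids, which shows that a
disordered link need not have the composition of the links in the CAD
multienzyme complex.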

Domains can also be joined by short inflexible 
links, such as the one in dihydrofolate reductase- 
thymidylate synthase (Figure 7-14), the interdomain 
α helix between two spectrin domains (Figure 7-16B), or
the two links connecting the internally repeating, modular EGF
domains 3, 4, and 5 from murine laminin 1. In other
instances, the segment connecting two domains may be 
structureless, but extensive contacts between the
domains cause them to be held tightly together (Figure
7-12). It is in proteins in which the domains were joined 
long ago that such contacts between domains are the 
most extensive. In the chaperone protein PapD from 
E. coli, however, it is the random meander of the link 
between the two domains that forms a hydrophobic core 
gluing together the two domains.

As a domain, by definition, is a structure that may 
be now or has been in the past an independent entity, the 
various categories (detachable, enzymatic, coenzymatic, 
functional, recurring, internally repeating, modular, 
structural, independently unfolding, and independently 
folding) are simply different ways of identifying mem- 
bers of a large group of fundamental units of protein 
structure. This group represents all of the smaller units 
from which the larger proteins that now exist were con- 
structed, and the various domains that now exist in any 
one protein were at one time unique, unattached, stable, 
folded polypeptides that were the ancestors of those por- 
tions of the entire polypeptide now containing them. 
These primordial proteins were then internally multi- 
plied or individually fused together during evolution by 
natural selection. 

It is this role as a fundamental unit of evolution 
that lends luster to the title of domain and elicits the 
desire to grant it. But the term domain should remain an 
operational designation, closely tied to the particular evi- 
dence presented in each case. Problems can arise when 
it is applied indiscriminately. In particular, it often hap- 
pens that when the term is used to describe a region of a 
protein for a very specific reason, all of the connotations 
associated with it have a way of attaching themselves to 
that region. For example, a structural domain sublimi- 
nally gains the status of an independently folding 
domain, or an enzymatic domain is assumed to be also a 
detachable domain. Such confusion should be avoided. 
Problem 7-8: There is a protein in vertebrate liver
responsible for three enzymatic activities:
phosphoribosylamine-glycine ligase, phosphoribosylglycinamide
formyltransferase, and phosphoribosylformylglycinamidine
cyclo-ligase. It is composed of a single polypeptide
1010 aa in length. When the protein was digested
with chymotrypsin, two products were produced that
could be separated from each other. They were composed
of polypeptides 450 and 550 aa in length. The
larger retained the phosphoribosylamine-glycine ligase
activity; the smaller, the phosphoribosylglycinamide
formyltransferase activity. The phosphoribosylformylglycinamidine
cyclo-ligase activity was lost. In E. coli, the
phosphoribosylformylglycinamidine cyclo-ligase reaction
is catalyzed by a monofunctional protein composed
of a polypeptide 330 aa in length. Discuss and explain
these observations in terms of detachable domains and
enzymatic domains.


Problem 7-9: What conclusion concerning pantetheine- 
phosphate adenylyltransferase and dephospho-CoA 
kinase can be drawn from this table?


Purification of Pantetheine-Phosphate Adenylyltransferase and Dephospho-CoA Kinase
from Porcine Liver (600 g)

purification step                     total      transferase          kinase
                                      protein    specific activity    specific activity
                                      (mg)       (µmol min⁻¹ mg⁻¹)    (µmol min⁻¹ mg⁻¹)
1700 g supernatant                    137,000                         0.00020
protamine sulfate supernatant         54,000                          0.00049
(NH₄)₂SO₄ fraction + Sephadex G-25    18,000                          0.0013
DEAE-cellulose                        3,200      0.014                0.0067
procion Red-Sepharose                 79         0.54                 0.26
blue Sepharose elution with CoA       4.3        7.4                  3.6
Sephadex G-150                        2.1        7.6                  3.7


Problem 7-10: Which of these three
tertiary structures have structural
domains? How many are there in
each? These drawings were
produced with MolScript.
Molecular Taxonomy


The proteins observed today have evolved from a much 
smaller group of less elaborate, primordial proteins, just 
as the species of organisms observed today have evolved 
from a much smaller group of less elaborate primordial 
species. The primordial proteins are now represented 
by the domains of presently existing proteins. 
Establishing the evolutionary relationships among these 
species of domains, however, may be far more difficult 
than establishing the evolutionary relationships among 
the species of organisms. Unfortunately, there is no fossil 
record of proteins. It is also quite clear that the evolu- 
tionary divergence that produced most of the proteins 
that are universally distributed among present living 
organisms, for example, the metabolic enzymes, 
occurred before the divergence of the organisms them- 
selves. This follows from the observation that the amino 
acid sequences of the proteins from all living organisms 
responsible for one particular biological function are 
usually able to be aligned or their crystallographic 
molecular models to be superposed, but proteins from
the same organism responsible for two different functions
are usually difficult if not impossible to relate to
each other. Thus the lineages of these universally distrib- 
uted proteins have remained almost unbranched since 
the evolution of the earliest organisms, and the radiation 
producing these lineages must have occurred before that 
time. It is also clear, however, from examining amino 
acid sequences and crystallographic molecular models 
that more specialized proteins have been arising contin- 
uously throughout evolution and are still arising today. 
These newer proteins are usually members of classes 
peculiar to a particular kingdom or phylum of organisms, 
and one of the challenges is to identify their ancestral 
relationships to the more universally distributed pro- 
teins. 

It is hoped that, as the number of tertiary structures 
elucidated by crystallography grows, an anatomical col- 
lection of the proteins large enough to form the basis for 
a comprehensive taxonomy can be assembled. When
Linnaeus developed his taxonomic system of the organ- 
isms, it is possible that he was unaware of the reason for 
its existence. It was only when taxonomy was connected
to the theory of evolution through natural selection that
an exercise in cataloguing became something more pro- 
found. At the present time, taxonomy in biology is one of 
the methods by which evolutionary relationships are 
established. The desire to establish the evolutionary his- 
tory of the speciation of proteins has led, in an interest- 
ing inversion of history, to the formulation of a 
taxonomic system of the proteins. 

The fundamental unit in a taxonomic system for 
proteins is the domain. The history of most of the pro- 
teins that now exist is that of the random association of 
domains, much as wildly different species are assembled 
into parasitic or symbiotic relationships or into ecosys- 
tems. Consequently, attempting to formulate a taxo- 
nomic system for proteins, just as assembling a 
taxonomic system for ecosystems, would be inappropri- 
ate. It is the domains that are the equivalent of species of 
organisms. An individual domain is a domain of a par- 
ticular amino acid sequence in a particular isoform of a 
particular protein found in a particular species of organism,
for example, the doubly wound, parallel β sheet in
isoform A of L-lactate dehydrogenase from S. acanthias
(Figure 7-11). A species of domains is a population con- 
taining all of the individual domains found in the same 
relative location in the same protein in all of its isoforms 
in all of the species of organisms in which it is found. A 
protein is regarded as the same protein as another pro- 
tein if both of them perform the same function in their 
respective organisms and their two respective amino 
acid sequences can be significantly aligned over their 
entire length or their complete tertiary structures can be 
superposed. The doubly wound, parallel β sheets in all of
the isoforms of L-lactate dehydrogenases from all of the 
species of organisms constitute a species of domains, as 
do all of the globins from all of the species of organisms 
in which they are found. 

As in populations of organisms, individual domains 
of the same species can differ significantly. To add to the 
confusion, the names of the individual proteins composed 
from domains of the same species can often be different, 
for example, cathepsin K from mammals and papain from 
plants or ferredoxin-NADP⁺ reductase from mammals
and thioredoxin-disulfide reductase from bacteria, but
an examination of their respective functions and an align- 
ment of their respective amino acid sequences or a super- 
position of their respective crystallographic molecular 
models establishes they are individuals of the same 
species. Often the structures of individuals of the same 
species of domains, such as the single domains constituting
the lysozymes from animals and bacteriophage or the
carbonate dehydratases II from animals and bacteria,
have drifted apart significantly; but their functions iden- 
tify them. It is the hundreds of thousands of different 
species of domains that are hierarchically classified in the 
taxonomic system. The sequence of the hierarchy for the 
taxonomic system of domains is species, family, super- 
family, common fold, architecture. This can be compared 
to the sequence of the hierarchy of the taxonomic system
of the organisms, which is species, genus, family, order, 
class, phylum, kingdom. 

The central concept upon which the present taxonomic
systems for classifying domains are based is
the common fold. Although the exact definitions differ 
among the systems, two or more species of domains 
share a common fold if they have the “same major sec- 
ondary structures in the same arrangement with the 
same topological connections.” Different species of 
domains with the same common fold can have “periph- 
eral elements of secondary structure and turn regions 
that differ in size and conformation.” It is within the
cores of their structures that the common fold exists, and 
loops connecting the elements of the common core can 
differ significantly in their length and structure. For 
example, the motor domains of kinesin and myosin
share a common fold of an eight-stranded β sheet sandwiched
between two sets of three α helices, but the loops
connecting these elements of secondary structure differ
dramatically in length. The short loop of five amino acids
between β strand 6 and β strand 7 and the short loop of
11 amino acids between α helix 4 and α helix 5 in kinesin
correspond to loops 221 and 142 aa long, respectively, in myosin.
Although such insertions have little influence on the 
selection of the common fold of a domain, they can have 
a significant effect on the function of the protein, for 
example, turning a sulfotransferase into a dehydratase.

It has been estimated that there are fewer than 1000 
common folds in existence; the estimate varies 
depending on the stringency with which a particular tax- 
onomic system divides the species of domains into 
common folds. The number of common folds and the 
position of the concept of the common fold in the vari- 
ous hierarchies are both quite close to the number and 
position of the class in the taxonomic hierarchy of living 
organisms. Homo sapiens belongs to the class 
Mammalia. The level of the common fold is also
referred to as the topology level, the level of the structure
type, or the level of structurally unique domains
in the different taxonomic systems. 

Particular species of domains can be selected as 
representatives of their common fold (Figure 7-19). 
L-Lactate dehydrogenase domain 1 (Figure 7-19F) repre- 
sents the common fold of doubly wound, parallel 
β sheets containing six β strands in the order 321456
(Figure 7-20). Five representatives from the large set of
domains of this common fold are listed within the box
in Figure 7-20. Domains of the same common fold are 
often found in proteins with significantly different func- 
tions. The catalytic domain of aspartate-tRNA ligase and 
the domain constituting an entire molecule of 
asparagine synthase are of the same common fold, as
well as the domains constituting the entire molecules of
tumor necrosis factor and the coat protein of satellite
tobacco necrosis virus.
Figure 7-19: A menagerie of representative tertiary structures from eight common folds of domains.
Each of these cartoons has been drawn from the respective crystallographic molecular model. The flat
arrows represent strands of β structure; and the helical ribbons, α helices. (A) Myohemerythrin is an
example of an up-down-up-down, antiparallel α-helical bundle. (B) The β subunit of hemoglobin is an
example of a Greek key, antiparallel α-helical bundle. This can be seen if, from the amino terminus (indicated
by the arrow), the first long, bent α helix, the second short α helix, and the next four long α helices are
numbered 1 through 6, respectively, and it is assumed that α helix 1 has drifted 90° away from being
parallel to α helix 6. (C) Domain 2 of papain is an example of an up-down-up-down β barrel if the last two
short and bent β strands are ignored. (D) Domain 2 of pyruvate kinase is an example of a Greek key,
antiparallel β barrel if the short strand of β structure, β strand 3, between β strands 2 and 4 is ignored. The
six strands of the Greek key are numbered as in Figure 7-21. (E) Domain 3 from tomato bushy stunt virus
is an example of a jelly roll, antiparallel β barrel if the first two, amino-terminal β strands are ignored. The
eight strands of the jelly roll are numbered as in Figure 7-21. (F) Domain 1 of L-lactate dehydrogenase is
an example of a doubly wound, parallel β sheet. (G) Triose-phosphate isomerase is an example of an
α-helically wound, parallel β barrel. (H) Domain 3 of glutathione-disulfide reductase is an example of an
open-faced β sandwich. Adapted with permission from ref 211. Copyright 1981 Academic Press.


Figure 7-20: Topological representations of several members of the
architecture of doubly α-helically wound, parallel β sheets. The β sheets
seen in each crystallographic molecular model were flattened upon a plane.
The order in which the strands of the sheet occurred in the amino acid
sequence of the polypeptide was noted as well as whether the connecting
loops, usually α helices, were above or below the plane. The dark arrows
represent each β strand in the order in which they appear across the sheet.
The open lines represent connections above the plane, and the thin lines
represent connections below the plane. Within the box, the pattern of
secondary structures displayed by domain 1 of L-lactate dehydrogenase
(Figure 7-11), domain 2 of alcohol dehydrogenase, domain 2 of
phosphoglycerate kinase, domain 1 of the α subunit of succinate-CoA ligase
(ADP-forming), and domain 2 of phosphorylase (Figure 7-11A) represents the
common fold to which these five domains belong. Reprinted with permission
from ref 211. Copyright 1981 Academic Press.

Panels within the box: d1 lactate dehydrogenase, d2 alcohol dehydrogenase,
d2 phosphoglycerate kinase, d1 α subunit of succinate-CoA ligase,
d2 phosphorylase.
Other panels: d1 glyceraldehyde 3-phosphate dehydrogenase, flavodoxin,
subtilisin, d1 arabinose-binding protein, dihydrofolate reductase,
adenylate kinase, d1 thiosulfate sulfurtransferase, d2 glutathione-disulfide
reductase, phosphoglycerate mutase, d3 pyruvate kinase, d1 hexokinase,
d2 hexokinase.

The next higher level in the taxonomic hierarchy of
domains is that of architecture. A set of different
common folds with the same architecture have the same 
clearly related spatial arrangement of secondary struc- 
tures even though they differ in the number of individual 
elements of secondary structure or differ by one or more 
adjacent interchanges in the order in which those ele- 
ments of secondary structure are juxtaposed or in both 
their number and their order. A collection of particular 
species of domains, each with a different common fold, 
can represent the architecture of doubly wound, parallel 
β sheets (Figure 7-20). This particular architecture,
however, has become so diverse that its systematic
reorganization into several different architectures is
probably required. A systematic census of the members
of the architecture of open-faced β sheets (Figure 7-19H)
has been taken.
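
The criterion of adjacent interchanges used above to relate common folds
within one architecture can be made concrete. The sketch below counts the
smallest number of interchanges of neighboring strands needed to convert
one order of strands across a sheet into another. It assumes that an
adjacent interchange means swapping two strands that lie next to each
other in the sheet, which is an interpretation rather than a definition
taken from any particular taxonomic system. The order 321456 is the one
quoted for L-lactate dehydrogenase domain 1; the order 231456 is
hypothetical and is used only for comparison.

```python
def adjacent_interchanges(order_a, order_b):
    """Minimum number of swaps of neighboring strands turning order_a into order_b.

    Each order lists the sequence numbers of the strands as they appear
    across the sheet. The minimum number of adjacent swaps equals the
    number of pairs of strands whose relative order differs.
    """
    if sorted(order_a) != sorted(order_b):
        raise ValueError("both orders must contain the same strands")
    position_in_b = {strand: i for i, strand in enumerate(order_b)}
    ranks = [position_in_b[strand] for strand in order_a]
    return sum(1
               for i in range(len(ranks))
               for j in range(i + 1, len(ranks))
               if ranks[i] > ranks[j])


LDH_DOMAIN_1 = "321456"   # strand order given in the text
HYPOTHETICAL = "231456"   # invented order, for illustration only

print(adjacent_interchanges(LDH_DOMAIN_1, HYPOTHETICAL))
print(adjacent_interchanges(LDH_DOMAIN_1, "123456"))
```

A count of this kind addresses only the order of the strands. It does not
take account of differences in the number of elements of secondary
structure, which the definition of architecture also permits.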

Whether or not there is a higher level in the hierarchy
than architecture is unresolved. For convenience,
groupings have been used in which domains are sorted
on the basis of whether they are formed entirely from
α helices, formed from α helices alternating with
β strands, formed from segregated α helices and β sheets,
or formed entirely of β structure. These groups may
have no evolutionary significance. For example,
α-helically wound, parallel β barrels (Figure 7-19G) would
be in the alternating αβ group, but glucan 1,4-α-glucosidase
from Aspergillus awamori is an α-helically wound, parallel
α barrel that could be more closely related to α-helically
wound, parallel β barrels than to any entirely α-helical
domain. If so, this would demonstrate that β strands can
become α helices, a possibility for which there is evidence
on the much smaller scale of single elements of secondary
structure. If such a transformation turns out to be common,
higher groups based on topological arrangements rather than
type of secondary structure might turn out to be more
appropriate.

The level in the taxonomic hierarchy of domains 
above that of species is that of family. The central crite- 
rion on which the level of family is usually based in the 
classification of domains is that of the function and the 
structure of the protein containing the domain. A family 
of domains is a set containing all of the domains of the 
same common fold that are found at the same position in 
the complete set of those proteins that have coincident 
structures and that perform related functions. Two pro- 
teins have coincident structures when they are both 
composed of the same number of domains, and the 
domains found at the same respective positions in the 
two proteins have the same common fold. The crystallo- 
graphic molecular models of proteins with coincident 
structures are superposable, domain by domain, over 
their entire length. Each of the consecutive folds can be, 
but is not necessarily, different. For example, the first and 
second domains in benzoylformate decarboxylase and 
the first and second domains in pyruvate decarboxylase 
all share the same common fold, but the third domains
from these two proteins with coincident structures share
a different common fold.

Examples of domains in the same family illustrate
the classification. The domains constituting the entire
molecules of the hydrolases carboxymethylenebutenolidase,
alkylhalidase, and carboxypeptidase D all belong
to the same family. The single domains constituting the
entire molecules of the mammalian endopeptidases
factor D and trypsin are in the same family, as are
those constituting the entire molecules of adenosine
kinase and ribokinase. Phosphoglycerate dehydrogenase,
L-2-hydroxyisocaproate dehydrogenase, D-lactate
dehydrogenase, erythronate-4-phosphate dehydrogenase,
and glycerate dehydrogenase all have coincident
structures and catalyze similar reactions. All of their
domains 1 have one common fold and belong to one
family, and all of their domains 2, which have a different
common fold from that of the domains 1, belong to
another family. The corresponding domains from
cyclin-dependent protein kinase 2, MAP protein kinase ERK2,
and cyclic-AMP dependent protein kinase are in the
same respective families, as are the corresponding
domains from aspartate-semialdehyde dehydrogenase
and glyceraldehyde-3-phosphate dehydrogenase.

The elaborations of the common fold of the 
domains within the same family can be dramatic. For 
example, domains 1 of tyrosine phenol-lyase,
cystathionine β-lyase, ornithine decarboxylase, aspartate
transaminase, phosphoserine transaminase, and
adenosylmethionine-8-amino-7-oxononanoate transaminase
are members of the same family because they share the
same common fold in which five β strands form the core,
their pyridoxal phosphates are located in the same
positions relative to the core, and the reactions they
catalyze are of the same type. The elements of the common
fold beyond the core, however, have drifted so far apart that
they cannot be superposed, and there are a number of
additional peripheral elements of secondary structure
found in some species in the family but not in other
species. This example illustrates the ambiguity of the
upper limit for the level of family in the hierarchy. 

Between the level of a family of domains, which is 
anchored in the coincident structures of the proteins 
containing the domains and their related functions, and 
the level of the common fold, which is anchored in the 
topological identity of their structures, is a region in the 
taxonomic hierarchy in which there are, at the moment, 
no consistent rules. This region is vaguely referred to as 
the level of the superfamily. Domains are grouped in a 
superfamily if the proteins that contain them have coin- 
cident structures but significantly different functions. 
For example, thiamine pyridinylase and maltose-binding
protein have coincident structures but significantly
different functions. Although all of the members of the
enolase superfamily do share the function of abstracting
a proton from a carbon α to a carboxylate, the superfamily
has been divided into three families. Domains
are also grouped in a superfamily if the proteins that
contain them have only partially coincident structures. The
domains 2 and 3, respectively, from biotin carboxylase,
phosphoribosylamine-glycine ligase, synapsin Ia,
D-alanine-D-alanine ligase, and glutathione synthase have the
same common folds, and all of the domains 1 have the
same architecture, but the five proteins do not have
coincident structures. Whether their domains 2 and 3 are 
of the same respective superfamilies or only of the same 
respective common folds is a question that illustrates the 
ambiguity of the upper limit of the level of superfamily in 
the hierarchy. 

There are hundreds of common folds of domains, 
so a comprehensive discussion of even the architectures 
into which these common folds are arranged is not pos- 
sible. There are some common folds and architectures, 
however, that have regular structural patterns that stand 
out. Both the β helix (Figure 6-12) and the β propeller
(Figure 6-13) define architectures of domains. One
architecture contains right-handed β helices; another,
left-handed β helices. Within the architecture of
β propellers, the number of blades determines the
common fold to which a particular member
belongs.

A parallel β barrel (Figure 6-11) is usually wound
completely by α helices (Problem 7-10B), as is the
parallel β barrel constituting the entire folded polypeptide
of triose-phosphate isomerase (Figure 7-19G). An α helix
connects each stave of the barrel to the next. The number
of staves in such a regularly α-helically wound, parallel
β barrel determines the common fold to which it
belongs. The common fold with the largest population is
that in which the barrel has eight β strands. In a large
number of enzymes, either the entire structure or the
majority of the structure is an α-helically wound,
parallel β barrel of eight strands.
There are often additional elements of secondary
structure found in the loops connecting an α helix of the
winding to a β strand in an α-helically wound, parallel
β barrel, but these are found at the periphery of the
structure and do not disrupt the common fold.
α-Helically wound, parallel β barrels can stand alone or
have other domains attached to them as in pyruvate
kinase, phosphopyruvate hydratase, and the R1 protein
of ribonucleoside-diphosphate reductase.

In a number of the common folds of domains 
(Figure 7-19A-E), the structures observed seem to arise 
from a reasonable topological operation (Figure 7-21)
that could explain their creation. Consider a polar curve 
that doubles back upon itself to form a hairpin. Twist the 
hairpin thus formed so that it folds into two turns of a 
right-handed superhelix (Figure 7-21A). Compress this 
superhelix until its neighboring segments both in front 
and behind come in contact and then incorporate the 
segments into the surface of a flattened cylinder (Figure 
7-21B). This produces a flattened barrel with eight staves 
the polarities of which alternate as one proceeds around 
the structure. This flattened cylinder can be rolled as the
tread on a caterpillar tractor to produce eight different 
barrels that resemble each other but that vary in the jux- 
tapositions of the staves across the center. The connec- 
tions between the segments, which define the 
topological relations of the curve, remain unaltered 
during such rolling. 

If the flattened cylinder in any of its guises is cut
between the first and second segments in the hairpin
(segments 1 and 2 in Figure 7-21) and spread upon a
plane, a jelly roll (Figure 7-21C) is produced. Shorten
the hairpin by removing the two most peripheral
segments (the cuts marked in Figure 7-21 that remove
segments 1 and 8). A new flattened barrel is created (dotted
lines in Figure 7-21B) with six staves that alternate in
polarity. If this smaller flattened cylinder is cut between
the first and last segments in the hairpin (segments 2 and 7
in Figure 7-21) and spread upon a plane, a Greek key (Figure
7-21D) is produced. Shorten the hairpin again by removing
the two most peripheral segments (the cuts marked in
Figure 7-21 that remove segments 2 and 7). A new flattened
barrel is created with four staves that alternate in
polarity. If this cylinder is cut between the first and last
segments in the hairpin (segments 3 and 6 in Figure 7-21)
and flattened upon a plane, an up-down-up-down pattern is
produced.
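
The bookkeeping in this exercise can be checked with a short
sketch. It starts from the order of staves around the
flattened barrel that the legend to Figure 7-21 gives for the
jelly roll, removes the peripheral segments, and rotates the
result to the starting stave used in the legend. The function
names are hypothetical, and the rule that odd- and
even-numbered segments run in opposite directions is an
assumption made for illustration rather than a statement
taken from the text.

```python
# Hypothetical helper names; the starting order is taken from the
# legend to Figure 7-21.
JELLY_ROLL = [1, 8, 3, 6, 5, 4, 7, 2]  # staves in order around the barrel


def remove_segments(barrel, segments):
    """Delete the given segments, keeping the cyclic order of the rest."""
    return [s for s in barrel if s not in segments]


def rotate_to(barrel, first):
    """Rotate a cyclic order so that it starts at stave `first`."""
    i = barrel.index(first)
    return barrel[i:] + barrel[:i]


def polarities_alternate(barrel):
    """Assumption: odd- and even-numbered segments run in opposite
    directions, so neighboring staves are antiparallel when their
    parities differ (the last stave is compared with the first)."""
    n = len(barrel)
    return all((barrel[i] + barrel[(i + 1) % n]) % 2 == 1 for i in range(n))


greek_key = rotate_to(remove_segments(JELLY_ROLL, {1, 8}), 2)
up_down = rotate_to(remove_segments(greek_key, {2, 7}), 6)

print(greek_key)  # [2, 3, 6, 5, 4, 7]
print(up_down)    # [6, 5, 4, 3]
print(all(polarities_alternate(b) for b in (JELLY_ROLL, greek_key, up_down)))
```

Under these assumptions the two shortened barrels reproduce
the orders listed in the legend, and the polarities of
neighboring staves alternate in all three barrels.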

The polar curve in the topological exercise can be
replaced by a polypeptide either as a strand of
β structure or in an α helix, and the staves of the flattened
barrel will be either strands of β structure or α helices,
respectively. If they are strands of β structure, they are
gathered as systematically antiparallel (Figure 7-21B)
pleated sheets (Figure 4-16C) into an antiparallel β barrel
(Figures 7-12 and 7-13). The jelly roll is represented
by the cohesin domain (Figure 6-21) and domain 3 of
the coat protein of tomato bushy stunt virus (Figure
7-19E). There are a large number of viral coat proteins
each containing a domain of this class. There are
elaborations on this topological arrangement; for example,
each of the spermadhesins from seminal fluid is
composed of a single domain that represents a class of
domains in which two consecutive antiparallel β strands
are added to a jelly roll (Figure 7-21C) at the
amino-terminal end of the polypeptide (strands 0 and -1). These
additional β strands are inserted into the barrel (Figure
7-21B) between staves 1 and 2.

The Greek key is represented by domain 2 of
pyruvate kinase (Figure 7-19D). There are also
elaborations on this theme. The members of the class of
immunoglobulin modular domains (Figure 7-13 and
Table 7-7) are β Greek keys (Figure 7-21D) in which an
antiparallel β strand is added at the carboxy-terminal
end of the polypeptide (strand 8), but unlike the strand 8
in a jelly roll, which would be located between strands 2
and 3 of the barrel if strand 1 were deleted (Figure 7-21B),
the strand 8 in the immunoglobulin class is found
between strands 2 and 7 of the barrel. The class of
β up-down-up-down domains is represented by
domain 2 of papain (Figure 7-19C).

Not all of the classes of domains in which the
polypeptide forms an antiparallel β barrel have
topological arrangements derived from a superhelical hairpin.
For example, in the β barrel of licheninase, the structure
is so far removed from a compressed superhelical
hairpin that over all of the possible combinations of its 14
strands, there are only two turns of superhelical hairpin,
one turn involving strands 9 through 12 and one turn
involving strands 1 through 4.

The antiparallel α Greek key is represented by the
β subunit of hemoglobin in Figure 7-19B, but a much
more regular representative of this class, in which all six
α-helical staves of the barrel are aligned more regularly, is
found in the caspase recruitment domain of the caspase
activator Apaf-1. The class of α up-down-up-down
domains is cleanly represented by myohemerythrin
(Figure 7-19A).


Figure 7-21: Topological explanation of how the patterns of
the folded polypeptides defining six of the common folds of
domains could have arisen by a common mechanism. (A) A
long hairpin of β structure or α helix is twisted into a
superhelical coil. (B) That superhelical coil is flattened
into a barrel. (C) The order in which the strands occur
around the flattened barrel (1, 8, 3, 6, 5, 4, 7, 2) is that
of a jelly roll. (D) If strands 1 and 8 are cut away (the
marked cuts), or were never there to begin with, the order in
which the strands occur around the smaller flattened barrel
(2, 3, 6, 5, 4, 7) is that of a Greek key. (E) If strands
1 and 2 and strands 7 and 8 are cut away (the marked cuts),
or were never there to begin with, the order in which the
strands occur around the smaller flattened barrel
(6, 5, 4, 3) is up-down-up-down.


When these various representatives (Figure 7-19A-E)
are examined closely, it is obvious that if the topo-
logical scheme displayed in Figure 7-21 was the initial 
mechanism of folding, individual staves of the barrels 
have drifted significantly from their original positions (as 
do the helices in Figure 7-10), and more recent second- 
ary structures have arisen at the ends of the barrels and 
in the loops connecting the staves. 

There are two stereochemical properties producing 
alternative topological arrangements of the helical hair- 
pin (Figure 7-21A) that generates the barrel (Figure 
7-21B). Because the polypeptide is polar, amino termi- 
nus to carboxy terminus, there are two distinct polarities 
to the hairpin, the one shown by the arrowheads in 
Figure 7-21A and the one in which the polypeptide runs 
in the opposite direction. For example, in the α Greek
keys of both the β subunit of hemoglobin and the
caspase recruitment domain of the caspase activator
Apaf-1, the polypeptide runs through the barrels with a
polarity opposite to that shown in Figure 7-21B. Because
the barrel is generated by a helical conformation of the
hairpin, there are two possible twists to the helix, the
right-handed one shown in Figure 7-21A and the
left-handed one. For example, the six-stranded β barrel
forming the common fold in the family of domains
containing growth hormones, interleukins, and
granulocyte-colony-stimulating factor is a β Greek key in which the
superhelix is left-handed. Because two polarities are
possible and two twists are possible, there are four dis- 
tinct geometries for the jelly roll, four for the Greek key, 
and two for the up-down-up-down conformation. 

It is possible that the regular structures such as the
β helix, the β propeller, the jelly roll, and the Greek key
represent evolutionarily efficient topological solutions 
to the problem of folding a polypeptide. Within an archi- 
tecture of domains or even among the members of the 
same common fold, a good deal of variation is observed; 
either individual elements of secondary structure in the 
common topological pattern do not superpose very well 
or extensive peripheral elements of secondary structure 
are found in one member of a class that are not found in 
others. If these variations represent the drift in the loca- 
tions of the elements of secondary structure from those 
they had in the common ancestor or the insertion of ele- 
ments following the divergence from a common ances- 
tor, they reflect the degree to which these domains are 
evolutionarily related to each other, and the taxonomic 
system for domains parallels the taxonomic system for 
species. If many of the variations observed, however,
indicate that the domains being compared differ so
dramatically because they do not share a common ancestor,
they reflect the fact that the structure is a particularly 
favorable solution to folding a polypeptide that has been 
exploited many times by convergent evolution and the 
taxonomic systems for domains overstate their phyloge- 
netic information. 
Problem 7-11: Construct a phylogenetic tree from

Chapter 8
Counting Polypeptides 


Almost all of the proteins found in a living organism are 
multimeric proteins. A multimeric protein is a protein 
containing more than one folded polypeptide. Each of 
these folded polypeptides was originally synthesized by a 
ribosome from messenger RNA that encoded a sequence 
of amino acids of a precise, finite length. These polypep- 
tides folded into defined conformations and were post- 
translationally modified. Each of the folded, posttrans- 
lationally modified polypeptides in a multimeric protein 
is one of its subunits. Usually, only a specific and well- 
defined number of these subunits are gathered together 
to form the macromolecular complex that is the finished, 
existing molecule of the multimeric protein. An 
oligomeric protein is a multimeric protein with a fixed, 
invariant number of subunits. In a few instances, such as 
the proteins actin, keratin, and collagen, a large and 
undefined number of subunits combine to form a poly- 
meric protein, which in theory could continue to poly- 
merize indefinitely. A polymeric protein is a protein with 
many subunits, the number of which varies from mole- 
cule to molecule of that protein. Polymeric proteins are 
the exception; most proteins are oligomeric. 

The stoichiometry of the subunits of a protein is 
the number of each type of folded, posttranslationally 
modified polypeptide that are combined to produce the 
specific structure. At this level of definition, each of the 
polypeptides is identified only by its length. The length 
of a polypeptide is the number of amino acids it
contains, $n_{\mathrm{aa}}$, an integer that is either
dimensionless or has the units of amino acids (molecule of
polypeptide)⁻¹ or moles of amino acids (mole of
polypeptide)⁻¹. The length
of a polypeptide is usually a precisely known quantity 
because its amino acid sequence is usually available and 
any posttranslational modifications have usually been 
defined. 

A great deal of effort has been expended in discov- 
ering the stoichiometries of the subunits of proteins. The 
original approach to this information was to determine 
the molar mass of the intact protein, to separate the indi- 
vidual polypeptides composing the protein, to quantify 
the mass ratio among the various polypeptides, and to 
determine the molar mass of each of the separated 
polypeptides. The measurement of the molar masses of 
intact proteins was at one time a major area of biophysi- 
cal research, but this pursuit presently attracts much less 
attention. 

The individual subunits composing an intact native
protein are separated and catalogued analytically by
electrophoresis on polyacrylamide gels cast in solutions 
of the detergent sodium dodecyl sulfate. The separation 
that is effected by these polyacrylamide gels relies on 
their ability to sieve the unfolded polypeptides. The con- 
stituent polypeptides of a protein are separated prepara- 
tively by chromatography that depends either on sieving 
or on ion exchange of the unfolded, dissociated poly- 
mers. The separated polypeptides are shown to be 
homogeneous and unique by peptide mapping. 

The major weaknesses of the original approach to 
defining the stoichiometry of the subunits of a protein 
were the extreme care with which the initial measure- 
ments of the molar mass of the intact protein had to be 
performed and the unreliability of the assessments of the 
mass ratios among the constituent polypeptides and of 
their molar masses. The present approach to defining the 
stoichiometry of the subunits of a protein avoids these 
problems. The individual polypeptides composing a pro- 
tein are still separated and catalogued by electrophoresis 
and shown to be unique and homogeneous by peptide 
mapping. The length of each of the constituent polypep- 
tides is assessed either by the electrophoresis itself or, 
preferably, by sequencing the appropriate cDNA. Any 
glycosylation is quantified analytically. The number of 
each polypeptide composing the intact protein is deter- 
mined by covalently cross-linking the protein to various 
degrees of completion and identifying the various inter- 
mediate covalent complexes and the limit complex. 

The different subunits in an oligomeric protein are 
defined by their lengths and distinguished by assigning 
them consecutive letters of the Greek alphabet. For 
example, deoxyhemoglobin at its normal concentrations
is constructed from two subunits, α and β, each present
in two copies to produce the complex (αβ)₂. Nicotinic
acetylcholine receptor has the composition α₂βγδ;
L-lactate dehydrogenase, (α₂)₂; DNA-directed RNA
polymerase from Escherichia coli, α₂ββ′σ; and
2-dehydro-3-deoxy-phosphogluconate aldolase, α₃. The grouping of
subunits into subsets, for example, the groups of two 
subunits in L-lactate dehydrogenase, arises from the 
symmetries in which the subunits are arranged within 
the intact molecule of an oligomeric protein. 

Some oligomeric proteins, when they are dissolved 
at certain concentrations, are mixtures of two different 
combinations of subunits in equilibrium with each 
other. For example, oxygenated hemoglobin is an
equilibrium mixture of αβ dimers and (αβ)₂ tetramers. Most
oligomeric proteins, however, have a particular compo- 
sition of subunits that does not vary unless harsh condi- 
tions are applied. A solution of a pure oligomeric protein 
will usually be monodisperse. A monodisperse solution 
of a particular macromolecule is a solution in which 
every one of those macromolecules is of the same size 
and shape, and each is compact and unique and remains 
dissolved and unassociated with its neighbors. 

When the map of electron density from a crystal of 
an oligomeric protein is examined, the complete mole- 
cule can be discerned. It is recognized as a large, inde- 
pendent feature in the map that is formed from several 
folded tubes of electron density. Although not always 
essential, it is reassuring to know before the map is exam- 
ined how many subunits are combined to produce the 
protein that has been crystallized. This determination 
can be made by a combination of sequencing, molecular 
sieving, and cross-linking. This determination is now so
routine that few oligomeric proteins are examined
crystallographically before their subunit stoichiometry
has been established.


Molar Mass


The only indubitably reliable method for determining 
the length and amino acid composition of a polypeptide, 
and consequently, its precise molar mass, is to sequence 
it correctly. At the moment, the sequences of a large array
of readily available polypeptides are known, and they
form a collection of standards each of the lengths of
which, $n_{\mathrm{aa}}$, is an exact quantity. It is often, but not always,
the case that the amino acid sequence of a newly purified 
polypeptide is known from the nucleotide sequence of its 
cDNA before enough of it becomes available to study its 
physical properties in detail. This situation has inverted 
the classical strategy of physical measurements in which 
the molar mass of a protein was one of the ultimate dis- 
coveries rather than something precisely known from the 
beginning. Unfortunately, however, the extensive effort 
expended in determinations of molar mass, although 
expended decades ago, still influences the importance 
attached to molar mass. 
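
As a minimal sketch of how a sequence fixes the molar mass,
the function below sums average residue masses and adds one
molecule of water for the free termini. The table of residue
masses (rounded to two decimal places) and the function name
are assumptions supplied for illustration; they are not taken
from the text, and posttranslational modifications are
ignored.

```python
# Average residue masses (g mol^-1), rounded; assumed standard values
# that are not given in the text.
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER = 18.02  # g mol^-1, added once for the free amino and carboxy termini


def length_and_molar_mass(sequence):
    """Return (n_aa, molar mass in g mol^-1) for an unmodified polypeptide
    written in the one-letter code."""
    sequence = sequence.upper()
    n_aa = len(sequence)
    mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    return n_aa, mass


# Glycylglycine serves as a small check on the arithmetic.
n_aa, mass = length_and_molar_mass("GG")
print(n_aa, round(mass, 2))
```

Both quantities follow from the sequence alone, which is why
they can be treated as known before any physical measurement
is made.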

The molar mass of a protein, $M_{\mathrm{prot}}$, is the number of
grams in a mole of that protein. The molecular mass of a
protein is the mass of a single molecule of that protein
expressed in relative units that are referred to as either
atomic mass units (amu) or daltons (Da). Both an atomic
mass unit and a dalton (1.6606 × 10⁻²⁴ g) are 1/12 the mass
of an atom of carbon isotope 12. Because Avogadro's number
is the number of carbon atoms of isotope 12 in 12 g of carbon
atoms of isotope 12, the numerical values of molar mass
and molecular mass for the same molecule are the same,
but not the units attached to the numbers. Molar mass is
the quantity that is determined by the measurements of
physical behavior such as osmotic pressure,
sedimentation equilibrium, and light scattering because the units
on the final quantity are usually grams mole⁻¹.
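
The numerical identity of molar mass and molecular mass
follows from the product of Avogadro's number and the mass of
one dalton in grams. The sketch below checks this arithmetic;
the value used for Avogadro's number and the example
molecular mass are assumptions supplied for illustration.

```python
AVOGADRO = 6.02214e23         # molecules mol^-1 (assumed; not given in the text)
DALTON_IN_GRAMS = 1.6606e-24  # g, as quoted in the text


def molar_mass_from_molecular_mass(mass_in_daltons):
    """Convert the mass of one molecule (Da) into grams per mole."""
    return mass_in_daltons * DALTON_IN_GRAMS * AVOGADRO


# One dalton per molecule corresponds to very nearly 1 g mol^-1, so a
# hypothetical protein of 50,000 Da has a molar mass near 50,000 g mol^-1.
one_dalton = molar_mass_from_molecular_mass(1.0)
example = molar_mass_from_molecular_mass(50000.0)
print(one_dalton, example)
```

The small departure from exactly 1 g mol⁻¹ reflects only the
rounding of the two constants.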

Because both sedimentation equilibrium and light 
scattering are alternative measurements of osmotic pres- 
sure, all three techniques determine the same funda- 
mental physical property of a protein in a solution, 
namely, its chemical potential. Osmotic pressure is a col- 
ligative property of the solution. A colligative property of 
a solute is a physical property that is a function only of 
the moles of independent particles of that solute in a 
standard volume of the solution. If the solution were 
monodisperse, if the only osmotically active particles 
present were individual molecules of the protein, and if 
the concentration of the protein in grams centimeter⁻³
were known precisely, the measured molar concentra- 
tion could be used to calculate the molar mass of the pro- 
tein. 

Osmotic pressure is the pressure exerted by imper- 
meant solutes when a solution containing those imper- 
meant solutes is separated by a semipermeable 
membrane from another solution identical in every way 
to the first except that it lacks the impermeant solutes. A 
solute is impermeant to a particular barrier if it cannot 
pass through that barrier. A semipermeable membrane 
is a membrane through which all of the components of 
the two solutions can pass freely except the impermeant 
solutes. The chemical and physical properties of the 
membrane define which solutes are permeant and which 
are impermeant, which are osmotically silent and which 
are osmotically active, respectively. In the case of exper- 
imental measurements of the osmotic pressure of solu- 
tions of proteins, a membrane is chosen that is porous to 
the small molecules in the solution but the pores of 
which are too small to pass the molecules of protein. This 
is usually a sheet formed from a polymeric material that 
has spaces between the strands of polymer wide enough 
to pass small molecules and ions but too narrow to pass 
macromolecules. 

Pressure is the force that results from the tendency 
of molecules to expand the confines in which they are 
contained and fill a larger volume. The pressure they 
exert is a direct measurement of the chemical potential 
of a population of molecules. When molecules such as 
impermeant solutes exert an osmotic pressure, they 
cannot expand their confines, as can gases, by entering 
the vacant space above the solution because they are 
held by intermolecular forces within a condensed phase. 
The volume of solution in which they are confined, how- 
ever, can expand if solution from the other side passes 
across the semipermeable membrane into the solution 
containing the impermeant solutes. This is the liquid 
analogy to a balloon expanding to fill more space. Just as 
the balloon expands until the external pressure is equiv- 
alent to the internal pressure of the trapped gas, the solu- 
tion containing the impermeant molecules expands until 
an external pressure equal to its osmotic pressure is 
applied to it. Operationally, the osmotic pressure of a
solution is the external pressure that must be applied to
the solution containing the impermeant solute to pre- 
vent any expansion in its volume by the net movement of 
fluid through the semipermeable membrane. In an 
apparatus for measuring osmotic pressure, the differ- 
ence in pressure between the chamber containing the 
protein and the chamber lacking the protein can be 
maintained and continuously monitored with a pressure 
transducer.

It can be shown that the osmotic pressure exerted
by a solution of impermeant solutes is formally equivalent
to the pressure exerted on the walls of a container filled
with gases that are impermeant to those walls. The
molecules of the impermeant solute are formally equivalent to
the molecules of the gas, and those of the solvent and
permeant solutes are as physically silent as the vacuum in
which the gas is suspended. The ideal gas law is

$$P = \frac{n_m R T}{V} \tag{8-1}$$

where P is the pressure (newtons centimeter⁻²) exerted
by the gas, R is the gas constant (831.5 N cm K⁻¹ mol⁻¹), T
is the temperature (kelvins), $n_m$ is the number of moles of
the gas, and V is the volume (centimeters³) in which it is
confined. The pressure exerted by a real, nonideal gas,
however, is

$$P = RT\left[\frac{n_m}{V} + B\left(\frac{n_m}{V}\right)^2 + C\left(\frac{n_m}{V}\right)^3 + \cdots\right] \tag{8-2}$$

where $n_m/V$ is the concentration (moles centimeter⁻³) of
the gas and the coefficients B, C, and so forth are referred
to, respectively, as the second virial coefficient, the third
virial coefficient, and so forth. The virial coefficients
provide the necessary corrections for the behavior of the
nonideal gas due to the specific properties that make it
nonideal, such as the finite dimensions of the molecules
that fill, or exclude, some of the volume of the container,
the intermolecular forces between the molecules of the
gas, and the tendency of molecules of the gas to dimerize
or polymerize. For all of the same reasons, the osmotic
pressure Π (newtons centimeter⁻²) exerted by a
nonideal, impermeant solute S is

$$\Pi = RT\left([\mathrm{S}] + B[\mathrm{S}]^2 + C[\mathrm{S}]^3 + \cdots\right) \tag{8-3}$$

where [S] is the molar concentration of the solute. If the
impermeant solute S is protein i

$$\lim_{[\text{protein } i] \to 0} \Pi = RT\,[\text{protein } i] \tag{8-4}$$


Equation 8-4 states that, at low enough concentrations of 
protein, the osmotic pressure observed will be directly 
proportional to the molar concentration of the protein. 
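
The approach to this limit can be illustrated numerically. The
sketch below evaluates Equation 8-3 with and without a second
virial term and prints the ratio of the two pressures at
decreasing concentrations. The temperature, the value of the
second virial coefficient, and the function name are
hypothetical, and the concentration is taken in moles
centimeter⁻³ so that the units agree with the value of R
quoted in the text.

```python
R = 831.5  # gas constant in N cm K^-1 mol^-1, as quoted in the text


def osmotic_pressure(conc, temperature, B=0.0, C=0.0):
    """Equation 8-3: osmotic pressure (N cm^-2) of a solute at `conc`
    (mol cm^-3); B and C are the second and third virial coefficients."""
    return R * temperature * (conc + B * conc**2 + C * conc**3)


T = 298.0          # K (assumed)
B_EXAMPLE = 5.0e5  # cm^3 mol^-1, hypothetical second virial coefficient

for conc in (1e-6, 1e-7, 1e-8):  # mol cm^-3
    ideal = osmotic_pressure(conc, T)
    real = osmotic_pressure(conc, T, B=B_EXAMPLE)
    # The ratio approaches 1 as the solution is diluted.
    print(conc, real / ideal)
```

With these hypothetical values the nonideal correction falls
from 50% to 0.5% over two orders of magnitude in
concentration, which is the behavior that Equation 8-4
summarizes.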
The molar concentration of the protein in the
solution cannot be known if the molar mass of the protein is
unknown, but the concentration of the protein in the
solution must be known in some type of units. From the
ultraviolet spectrum of the solution, the concentration
(moles centimeter⁻³) of the tryptophan and tyrosine in
the protein might be known. From a colorimetric assay,
the concentration (moles centimeter⁻³) of peptide bonds
in the solution might be known. From total amino acid
analysis, the concentration (moles centimeter⁻³) of
amino acids in the solution might be known. From a dry
weight measurement of the protein, the grams of dry
weight of protein in a milliliter of solution (grams
centimeter⁻³) might be known. Regardless of the units, the
value for the concentration of protein in the units of the
quantity measured can be designated as $C_{\mathrm{prot}}$
(units centimeter⁻³). It follows that

$$W\,[\text{protein } i] = C_{\mathrm{prot}} \tag{8-5}$$

where W is a constant of proportionality.

The units on Ware moles of tryptophan and tyrosine 
(mole of protein)", moles of peptide bonds (mole of pro- 
tein), moles of amino acids (mole of protein)”, or grams 
of dry weight (mole of protein)", respectively. It is only 
coincidental that the last is usually chosen. It is this exer- 
cise that defines what a colligative property is and illus- 
trates that osmotic pressure does not measure molar mass 
directly. The final result can be always traced back to an 
independent measurement ofthe concentration ofthe pro- 
tein. When Equation 8-5 is incorporated into Equation 8-4 


$$\lim_{C_{\mathrm{prot}} \to 0} \frac{\Pi}{C_{\mathrm{prot}}} = \frac{RT}{W} \tag{8-6}$$


and the intercept of Π/Cprot as Cprot is decreased to 0
provides the value of W (Figure 8-1).* If the units of the
concentration Cprot were grams of protein centimeter⁻³ and
the only osmotically active species present were
molecules of the protein of interest in a monodisperse
solution, W would be the molar mass of the protein in grams
mole⁻¹.
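The extrapolation described by Equation 8-6 can be illustrated with a short numerical sketch. It is not the procedure used for Figure 8-1; the data are synthetic and noise-free, the second virial coefficient is an arbitrary illustrative value, and the names `fit_osmotic_pressure`, `W_TRUE`, and `B_TRUE` are hypothetical. The gas constant is expressed in newton centimeters so that pressures in newtons centimeter⁻² and concentrations in grams centimeter⁻³ give W in grams mole⁻¹.

```python
import numpy as np

# Gas constant in N cm mol^-1 K^-1 (8.314 J = 8.314 N m = 831.4 N cm).
R_N_CM = 831.4
T_KELVIN = 298.15  # 25 degrees C


def fit_osmotic_pressure(gamma, pressure, temperature=T_KELVIN):
    """Fit Pi/gamma = RT(1/W + B*gamma) by a straight line.

    gamma: protein concentration in g cm^-3
    pressure: osmotic pressure in N cm^-2
    Returns (W in g mol^-1, B in mol cm^3 g^-2).
    """
    gamma = np.asarray(gamma, dtype=float)
    reduced = np.asarray(pressure, dtype=float) / gamma
    slope, intercept = np.polyfit(gamma, reduced, 1)
    rt = R_N_CM * temperature
    # The intercept is RT/W and the slope is RT*B.
    return rt / intercept, slope / rt


# Synthetic data (assumed values, not measurements).
W_TRUE = 66430.0   # g mol^-1
B_TRUE = 5.0e-6    # mol cm^3 g^-2, arbitrary illustrative value
gamma = np.linspace(0.005, 0.05, 10)
pressure = R_N_CM * T_KELVIN * (gamma / W_TRUE + B_TRUE * gamma**2)

W_fit, B_fit = fit_osmotic_pressure(gamma, pressure)
print(f"W = {W_fit:.0f} g mol^-1, B = {B_fit:.2e} mol cm^3 g^-2")
```

With real measurements the intercept, and hence W, carries the uncertainty of both the pressures and the independently determined concentrations.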

The complete equation describing the actual 
behavior of the osmotic pressure at low concentrations 
of protein is 


$$\Pi = RT\left(\frac{C_{\mathrm{prot}}}{W} + B C_{\mathrm{prot}}^{2} + C C_{\mathrm{prot}}^{3} + \cdots\right) \tag{8-7}$$


where B, C, ... are the virial coefficients expressed in
appropriate units. In Figure 8-1, the chosen concentrations
of the protein, bovine serum albumin, are in the
range where only the second virial coefficient, B, is
significant, and the results are presented as if the equation were

$$\frac{\Pi}{C_{\mathrm{prot}}} = RT\left(\frac{1}{W} + B C_{\mathrm{prot}}\right) \tag{8-8}$$

It can be seen that when the charge on the protein was
changed by changing the pH, the value of the second
virial coefficient, B, and hence the slope of the line,
changed in response to changes in the various
parameters of the solution, but in agreement with Equation 8-6,
the intercept seems to have remained the same.

*The units of concentration used by the authors for the data in
Figures 8-1, 8-2, and 8-5 were grams centimeter⁻³. The established
symbol for concentration in units of mass volume⁻¹ is γ.

The fact that the slopes of the two lines in Figure 
8-1, and hence the two values of the second virial coeffi- 
cient, are different arises from the Donnan effect. 
Because a molecule of protein bears a net charge at a 
given pH and because the two solutions on either side of 
the semipermeable membrane must be electroneutral, 
the concentrations of electrolytes cannot be the same in 


[Figure 8-1 plot: Π/γprot (N cm g⁻¹) plotted against γprot (g cm⁻³)]


Figure 8-1: Osmotic pressure of solutions of bovine serum
albumin. The incremental osmotic pressure (Π/γprot), where the
osmotic pressure Π is in newtons centimeter⁻² and the
concentration of protein γprot is in grams of protein centimeter⁻³,
is plotted as a function of γprot. The apparatus was at 25 °C, and
within the apparatus the solution of protein was separated from
the solution lacking the protein by a membrane of nitrocellulose
polymer. The osmotic pressure of the solution was the external
pressure that had to be applied to the solution of protein, in
excess of the atmospheric pressure on the solution lacking the
protein, to prevent its expansion. The excess pressure was
measured with a toluene manometer, and millimeters of toluene
was converted to newtons centimeter⁻². The concentration of
protein, following the measurement, in each of the solutions
containing protein was determined by a dry weight analysis that
was corrected for the weight of other solutes present. The
solutions contained 0.15 M NaCl as supporting electrolyte. The
behaviors of the osmotic pressure at pH 7.0 (○) and pH 5.37 (●)
are shown. Adapted with permission from ref 8.
Copyright 1946 American Chemical Society.


the two solutions. Suppose that the only electrolyte in the
solution is KCl, the solution in the compartment
containing the protein is designated α, and the solution in
the other compartment is designated β. To preserve
electroneutrality


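$$[\mathrm{K}^{+}]_{\beta} = [\mathrm{Cl}^{-}]_{\beta} \tag{8-9}$$

and

$$Z_i\,[\text{protein } i] + [\mathrm{K}^{+}]_{\alpha} = [\mathrm{Cl}^{-}]_{\alpha} \tag{8-10}$$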


where Zᵢ is the mean net charge number on protein i. If
activity coefficients are ignored for the moment,

$$\mu_{\mathrm{KCl}} = \mu^{\circ}_{\mathrm{KCl}} + RT \ln\left([\mathrm{K}^{+}][\mathrm{Cl}^{-}]\right) \tag{8-11}$$

where μKCl is the chemical potential of the KCl. The
chemical potential of the KCl must be the same on both
sides of the membrane, and μ°KCl, the standard chemical
potential of KCl, is always the same, so


$$[\mathrm{K}^{+}]_{\alpha}[\mathrm{Cl}^{-}]_{\alpha} = [\mathrm{K}^{+}]_{\beta}[\mathrm{Cl}^{-}]_{\beta} \tag{8-12}$$
at equilibrium. Combining these equations 
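$$[\mathrm{K}^{+}]_{\beta} = [\mathrm{K}^{+}]_{\alpha}\left(1 + \frac{Z_i\,[\text{protein } i]}{[\mathrm{K}^{+}]_{\alpha}}\right)^{1/2} \tag{8-13}$$

$$[\mathrm{Cl}^{-}]_{\beta} = [\mathrm{Cl}^{-}]_{\alpha}\left(1 - \frac{Z_i\,[\text{protein } i]}{[\mathrm{Cl}^{-}]_{\alpha}}\right)^{1/2} \tag{8-14}$$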


Equations 8-13 and 8-14 state that if the protein is
positively charged, there will be more chloride and less
potassium in compartment α than in compartment β, or that
if the protein is negatively charged, there will be more
potassium and less chloride in compartment α than in
compartment β. These imbalances of molecular species
will affect the osmotic pressure because the electrolytes
cannot distribute freely across the semipermeable
membrane and consequently they become osmotically active.
Although there is no exact solution to these equations, at
low concentrations of protein and at small values of Zᵢ


$$\lim_{[\text{protein } i] \to 0} \Pi = RT\,[\text{protein } i]\left(1 + \frac{Z_i^{2}\,[\text{protein } i]}{4[\mathrm{KCl}]}\right) \tag{8-15}$$


For molecules with large values of Zᵢ, such as nucleic
acids, this approximation fails badly.

From Equation 8-15 it can be seen that the Donnan
effect is expressed in the second virial coefficient, B, as
demonstrated in Figure 8-1; but when [KCl] > 5Zᵢ²[protein i],
the Donnan effect has less than a 5% effect on the
osmotic pressure and as [protein i] approaches 0, the
Donnan effect also approaches 0. For the measurements
presented in Figures 8-2 and 8-5, but not those in
Figure 8-1, the addition of 0.1 M KCl would satisfy this
inequality.
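As a rough check on this inequality, the fractional Donnan term in Equation 8-15 can be evaluated directly. The sketch below is only illustrative: the charge number and the concentrations are assumed values chosen for the example, not data from the figures, and the function names are hypothetical.

```python
def donnan_fraction(charge_number, protein_molar, kcl_molar):
    """Fractional Donnan contribution, Z^2 [protein] / (4 [KCl]),
    the second term inside the parentheses of Equation 8-15."""
    return charge_number**2 * protein_molar / (4.0 * kcl_molar)


def minimum_kcl(charge_number, protein_molar, limit=0.05):
    """Smallest [KCl] (molar) that keeps the Donnan term at or below `limit`."""
    return charge_number**2 * protein_molar / (4.0 * limit)


# Assumed illustrative values: mean net charge number of -10,
# 0.1 mM protein, and 0.1 M KCl.
Z_EXAMPLE = -10
PROTEIN_M = 1.0e-4
KCL_M = 0.1

fraction = donnan_fraction(Z_EXAMPLE, PROTEIN_M, KCL_M)
kcl_needed = minimum_kcl(Z_EXAMPLE, PROTEIN_M)
print(f"Donnan term: {100 * fraction:.1f}% of the ideal osmotic pressure")
print(f"[KCl] needed for a 5% limit: {kcl_needed:.3f} M")
```

Because the term scales with the square of the charge number, a tenfold increase in charge requires a hundredfold increase in electrolyte to hold the same limit, which is why the approximation fails for nucleic acids.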

Another way to consider the imbalances of coun- 
terions created by the Donnan effect is to assume that the 
charge on the impermeant protein creates an electrical 
potential and the permeant ions redistribute in response 
to this potential. This Donnan potential complicates 
measurements performed by sedimentation equilibrium 
and light scattering as well as osmotic pressure. In the 
absence of added electrolytes, gradients or discontinu- 
ities of electrical potential would form during all three of 
these procedures as the concentration of the protein 
varies within the chambers of the apparatuses. These gra- 
dients or discontinuities of electrical potential, because 
they arise from the Donnan effect, can be eliminated in 
the case of most proteins by adding a simple electrolyte 
such as KCl to the solution to a concentration of around
0.1 M. One way to understand the effect of adding salt is 
to imagine that the increase in ionic strength decreases 
significantly the thickness of the ionic double layer 
(Equations 1-71 and 1-77) so that it encloses the mole- 
cule of protein tightly, effectively neutralizes its charge, 
and turns it into an apparently neutral macromolecule. 
The added electrolyte also eliminates any local gradients 
of electrical potential caused either by separating two 
phases by a semipermeable membrane, as in measure- 
ments of osmotic pressure, or by differential gravitational 
forces exerted on ions of unlike mass, as in sedimentation 
equilibrium. In addition to an electrolyte, a buffer may
also be added to the solution to maintain the pH. 

When a solution of protein is submitted to sedi- 
mentation equilibrium, it is placed in a chamber within 
the strong centrifugal field created by a spinning rotor, 
and its distribution through the chamber is allowed to 
reach equilibrium. Because the centrifugal force in this 
chamber in the rotor is a function only of the radial dis- 
tance r from the center of rotation, the distribution of the 
protein at equilibrium is a function only of r. At equilib- 
rium, the centrifugal force upon the protein at each point 
in the solution is equal but opposite in sign to the force 
of diffusion that arises from the gradient of its concen- 
tration. If negligible gradients of electrical potential form 
because sufficient electrolyte has been added to the solu- 
tion, the protein will redistribute until the differential of 
the chemical potential it experiences at each position in 
the chamber balances the differential of the centrifugal 
potential that it experiences. For a protein, the equality 
produced by this balance can be expressed as 


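$$\frac{d\Pi}{dC_{\mathrm{prot}}}\,\frac{dC_{\mathrm{prot}}}{dr} = \omega^{2} r\, C_{\mathrm{prot}}\left(\frac{\partial \rho}{\partial C_{\mathrm{prot}}}\right)_{T,P,\mu} \tag{8-16}$$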




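from which the exact relationship follows by
rearrangement

$$\frac{d \ln C_{\mathrm{prot}}}{dr^{2}} = \frac{\omega^{2}}{2}\left(\frac{\partial \rho}{\partial C_{\mathrm{prot}}}\right)_{T,P,\mu}\left(\frac{d\Pi}{dC_{\mathrm{prot}}}\right)^{-1} \tag{8-17}$$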


where Cprot is the concentration of protein expressed in
any units, r is the distance (centimeters) of any point in
the chamber from the center of the rotor, ω is the
angular velocity (radians second⁻¹) of the rotor, ρ is the
density (grams centimeter⁻³) of the solution of protein, and
$(\partial\rho/\partial C_{\mathrm{prot}})_{T,P,\mu}$ is the change in density of the solution as a
function only of the change in the concentration of
protein. Because at sedimentation equilibrium the chemical
potential through the entire solution must be the same,
this change in the density of the solution is at constant
chemical potential of solvent and all solutes other than
the protein, as indicated by the subscript μ.

Because the derivative on the left in Equation 8-17 
is that of the natural logarithm of the concentration of 
protein, it will have the same numerical value regardless 
of the units chosen to express the concentration of the 
protein, and the units of concentration cancel on the 
right. Because 


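$$\frac{d\Pi}{dC_{\mathrm{prot}}} = RT\left(\frac{1}{W} + 2BC_{\mathrm{prot}} + 3CC_{\mathrm{prot}}^{2} + \cdots\right) \tag{8-18}$$

and

$$\lim_{C_{\mathrm{prot}} \to 0}\left(\frac{1}{W} + 2BC_{\mathrm{prot}} + 3CC_{\mathrm{prot}}^{2} + \cdots\right) = \frac{1}{W} \tag{8-19}$$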


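Equation 8-17 can be combined with Equations 8-18 and
8-19 and rearranged to give

$$\lim_{C_{\mathrm{prot}} \to 0} \frac{d \ln C_{\mathrm{prot}}}{dr^{2}} = \frac{\omega^{2}}{2RT}\, W \left(\frac{\partial \rho}{\partial C_{\mathrm{prot}}}\right)_{T,P,\mu} \tag{8-20}$$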


As predicted by Equation 8-20, when a monodisperse
solution of a particular protein is submitted to
centrifugation and the distribution of that protein is
allowed to reach equilibrium, the gradient of
concentration that forms is such that a plot of ln Cprot against r² is a
straight line (Figure 8-2). From the slope of this plot
and a value for $(\partial\rho/\partial C_{\mathrm{prot}})_{T,P,\mu}$, the value of W can be
calculated.

Figure 8-2: Sedimentation equilibrium of bovine serum
albumin. A solution of bovine serum albumin was placed in an
optical cell in a rotor. The rotor was placed in an ultracentrifuge and
spun at 24,630 rpm. The distribution of protein in the optical cell
was followed by scanning the absorbance of the solution along the
radial axis of the cell. After 18 h, the distribution of protein was no
longer changing and equilibrium had been reached. The
absorbance as a function of radial distance from the center of
rotation of the rotor was measured at wavelengths of 280 nm (○) and
230 nm (△). Absorbance was converted to concentration of protein
(γprot, micrograms centimeter⁻³) and the logarithm to the base 10 of
the concentration (log γprot) was taken. Radial distance, r, was
measured in centimeters and the respective values were squared
(r²). Adapted with permission from ref 12. Copyright 1966
American Chemical Society.

The independent measurement of the partial derivative
$(\partial\rho/\partial C_{\mathrm{prot}})_{T,P,\mu}$ in Equation 8-20 requires the
tabulation of the macroscopic density of a series of solutions
containing increasing, precisely known concentrations
of protein and brought to equilibrium, at the appropriate
osmotic pressure, with a solution identical to the
solution used to prepare the samples for centrifugation but
lacking the protein. A procedure has been devised for
measuring the required densities both at constant
chemical potential of solvent and all diffusible solutes and at
constant composition of the solution, and if this
procedure is followed it permits the most accurate and
reliable values for the molar mass of a protein to be
calculated. Usually, however, the approximation


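$$\left(\frac{\partial \rho}{\partial \gamma_{\mathrm{prot}}}\right)_{T,P,\mu} \cong 1 - \bar{v}_{\mathrm{prot}}\,\rho_{\mathrm{sol}} \tag{8-21}$$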


is made, where γprot is the concentration of protein in
units of grams centimeter⁻³, v̄prot is the partial specific
volume (centimeters³ gram⁻¹) of the protein in the
particular solution chosen, and ρsol is the density (grams
centimeter⁻³) of the solution in the absence of the
protein. Unfortunately, because the decision to use this
approximation was made for other reasons, many
investigators are unaware that their use of the right-hand term
in Equation 8-21 is only an approximation. This lack of
awareness can lead to some confusion. Furthermore,
the approximation fails significantly under certain
circumstances. For example, it is in error by 14% for DNA at
1 M sodium chloride and by 15% for protein in 4 M
guanidinium chloride.

The units on the term $(\partial\rho/\partial C_{\mathrm{prot}})_{T,P,\mu}$, as is the case
with the term Π/Cprot found in Equation 8-6, are
determined by the units chosen for Cprot, the concentration of
protein. By chance, it happens that if Cprot is expressed in
units of grams centimeter⁻³ (γprot) and ρ is expressed in
units of grams centimeter⁻³, $(\partial\rho/\partial\gamma_{\mathrm{prot}})_{T,P,\mu}$ is
dimensionless. Approximation 8-21, however, requires that Cprot be
expressed in grams centimeter⁻³ and implicitly dictates a
choice of these units for the entire equation. This
assumption remains hidden in the definition of the
partial specific volume


$$\bar{v}_{\mathrm{prot}} = \left(\frac{\partial V}{\partial m_{\mathrm{prot}}}\right)_{T,P,m_j} \tag{8-22}$$


where mprot is the mass (grams) of protein added to a
solution, V is the resulting volume of the solution
(centimeters³), and the subscript mⱼ states that the masses of
all of the other components j in the solution must remain
constant. The accuracy of this measurement is no greater
than the accuracy with which the mass of the protein in
grams can be known. Nor is this requirement avoided by
the use of values of v̄prot calculated from the amino acid
composition. In effect, this latter approximation simply
relies on the care with which protein concentrations in
grams centimeter⁻³ were determined in the earlier
experiments validating such a calculation and demonstrating
that it was reliable.

Equation 8-20 requires an extrapolation to Cprot
equal to 0. The purpose of the extrapolation is to
eliminate the effect of the virial coefficients (Equation 8-19)
on dΠ/dCprot. The same protein, bovine serum albumin,
under similar conditions (J, = 0.1-0.15 M), was used in 
the experiments described in Figures 8-1 and 8-2. From 
the values of the second virial coefficients, B, of Figure 
8-1, it can be calculated that at the actual concentrations 
examined in Figure 8-2, the nonideality of the solution 
should only have affected the molar mass determined 
from the slope of the line by less than 0.1% of its value. 
Consequently, it is not surprising that the molar mass
calculated from Figure 8-2, 64,500 g mol⁻¹, agrees quite
closely with the actual value of the molar mass of bovine
serum albumin, 66,430 g mol⁻¹, a value that was
subsequently established by its amino acid sequence.
Uncoiled polypeptides or highly charged macromole- 
cules such as DNA, however, have much larger virial 
coefficients, and sedimentation equilibrium of such 
species can be significantly affected by those virial coef- 
ficients. 

At the present time, instruments that register the
concentration of protein by its absorbance at 280 nm are
used to monitor its distribution over the cell within the
rotor of the ultracentrifuge at sedimentation
equilibrium. Consequently, as the example of serum albumin
in Figure 8-2 demonstrates, because the concentration
of protein is so low, the virial coefficients can usually be
ignored, but it is nevertheless prudent to perform
measurements at several different concentrations of protein
or several different angular velocities of the rotor or both
to validate their insignificance.

If the virial coefficients can be disregarded so that
the limit can be assumed, Equation 8-20 can be
integrated, and because the concentration of protein (Cprot) is
directly proportional to absorbance at 280 nm (A₂₈₀)

$$A_{280} = A_{280,r_0} \exp\left[\frac{\omega^{2} M_{\mathrm{prot}}}{2RT}\left(\frac{\partial \rho}{\partial C_{\mathrm{prot}}}\right)_{T,P,\mu}\left(r^{2} - r_{0}^{2}\right)\right] \tag{8-23}$$


where $A_{280,r_0}$ is the absorbance the solution has at a
reference position within the cell at which, by definition,
r = r₀. This equation is then fit by nonlinear least squares
to the distribution of absorbance as a function of radius
(Figure 8-3). From the fit, a numerical value for the
quantity $\omega^{2} M_{\mathrm{prot}} (\partial\rho/\partial C_{\mathrm{prot}})_{T,P,\mu}/2RT$ is obtained. While ω², R,
and T are known precisely, Mprot and $(\partial\rho/\partial C_{\mathrm{prot}})_{T,P,\mu}$ or its
surrogate v̄prot (Equation 8-21) may or may not be.

Usually, independent measurements of either
$(\partial\rho/\partial C_{\mathrm{prot}})_{T,P,\mu}$ or v̄prot are made because the reason for the
experiment is to determine Mprot. In the case of the p51
subunit of RNA-directed DNA polymerase from human
immunodeficiency virus 1 (Figure 8-3), however, which
is a monomer containing a single subunit, its molar mass
Mprot (49,660 g mol⁻¹) was already known precisely from
its amino acid sequence, and the purpose of the
measurement of sedimentation equilibrium was to estimate
$(\partial\rho/\partial C_{\mathrm{prot}})_{T,P,\mu}$ (0.225), a quantity that was needed for the
later experiments reported. It is also possible to use the
sedimentation equilibrium of a protein of known molar
mass at different concentrations of another solute to
determine a value for the preferential hydration of the
protein, $(\partial m_{\mathrm{H_2O}}/\partial m_{\mathrm{prot}})_{T,P,\mu}$, in the presence of that
solute.
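The relationship among the fitted exponent, the molar mass, and the density increment in Equation 8-23 can be sketched numerically. The example below uses the molar mass, density increment, and rotor speed quoted for the p51 subunit, but the temperature (293.15 K), the radial positions, and the reference absorbance are assumptions made only for illustration, and the profile is synthetic. A straight-line fit of ln A₂₈₀ against r² stands in for the nonlinear least-squares fit described in the text; the two are equivalent only for noise-free data from a single ideal species.

```python
import numpy as np

R_ERG = 8.314e7  # gas constant in erg mol^-1 K^-1 (g cm^2 s^-2 mol^-1 K^-1)


def angular_velocity(rpm):
    """Convert revolutions per minute to radians per second."""
    return 2.0 * np.pi * rpm / 60.0


def exponent_coefficient(molar_mass, drho_dgamma, rpm, temperature):
    """omega^2 * M * (d rho / d gamma) / (2RT), in cm^-2."""
    omega = angular_velocity(rpm)
    return omega**2 * molar_mass * drho_dgamma / (2.0 * R_ERG * temperature)


def molar_mass_from_profile(radius, absorbance, drho_dgamma, rpm, temperature):
    """Recover M (g mol^-1) from the slope of ln(A) against r^2."""
    radius = np.asarray(radius, dtype=float)
    slope, _ = np.polyfit(radius**2, np.log(absorbance), 1)
    omega = angular_velocity(rpm)
    return 2.0 * R_ERG * temperature * slope / (omega**2 * drho_dgamma)


M_P51 = 49660.0        # g mol^-1, from the text
DRHO_DGAMMA = 0.225    # from the text, taken here as dimensionless
RPM = 12000.0          # from the caption of Figure 8-3
TEMPERATURE = 293.15   # K, assumed; not stated in the text

k = exponent_coefficient(M_P51, DRHO_DGAMMA, RPM, TEMPERATURE)

# Synthetic profile with hypothetical radii and reference absorbance.
radius = np.linspace(6.9, 7.1, 50)
absorbance = 0.6 * np.exp(k * (radius**2 - radius[-1] ** 2))

M_fit = molar_mass_from_profile(radius, absorbance, DRHO_DGAMMA, RPM, TEMPERATURE)
print(f"coefficient = {k:.3f} cm^-2, recovered M = {M_fit:.0f} g mol^-1")
```

If the molar mass is known instead, the same slope can be solved for the density increment, which is the use made of the measurement in Figure 8-3.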

From the molar mass of a subunit of a protein, 
which has been determined precisely by sequencing its 
constituent polypeptide or polypeptides, and the molar 
mass of the intact protein, which has been estimated by 
sedimentation equilibrium, the number of subunits in 
an intact protein can be assessed. For example, by
sedimentation equilibrium, the molar mass of the UvsY
recombination protein from bacteriophage T4 has been
estimated to be 98.6 ± 3.9 kg mol⁻¹ and the molar mass
of each of its identical constituent polypeptides is
15.84 kg mol⁻¹. The molar mass of carbon-monoxide
dehydrogenase from Moorella thermoacetica was
estimated to be 300 ± 30 kg mol⁻¹ by sedimentation
equilibrium; the molar masses of its constituent polypeptides,
Figure 8-3: Sedimentation equilibrium of a recombinant form of
the p51 subunit (M = 49,660 g mol⁻¹, n_aa = 426) of RNA-directed
DNA polymerase from human immunodeficiency virus 1. The
optical cell in the rotor was loaded with a solution of the protein
with an initial absorbance at 280 nm of 0.29. Centrifugation was
performed for 74 h at 12,000 rpm to reach equilibrium. The
distribution of absorbance at 280 nm over the cell is plotted in the lower
panel as a function of radius (centimeters) from the center of
rotation. The line drawn through the points is a nonlinear least-squares
fit of Equation 8-23 to the data, using as the reference position r₀
the point of highest measured absorbance, $A_{280,r_0}$, at the bottom of
the optical cell (to the right of the graph). The upper panel is the
deviation of the experimental absorbance (ΔAbs) for each point
from the value of the fit at that point. The deviations are distributed
at random around a value of 0.


α and β, are 81.73 and 72.92 kg mol⁻¹, respectively; and
the measured mass ratio of the two polypeptides in the
intact protein is 1.18 g g⁻¹.
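The arithmetic behind such an assessment is simple enough to set out explicitly. The sketch below uses only the molar masses quoted in the two examples; the function name is hypothetical, and treating carbon-monoxide dehydrogenase as a multiple of one α and one β polypeptide is an assumption that the measured mass ratio is then used to test.

```python
def copies_per_oligomer(oligomer_mass, protomer_mass):
    """Ratio of the molar mass of the intact protein to that of its protomer."""
    return oligomer_mass / protomer_mass


# UvsY: identical polypeptides (molar masses in kg mol^-1, from the text).
uvsy_copies = copies_per_oligomer(98.6, 15.84)
uvsy_uncertainty = 3.9 / 15.84

# Carbon-monoxide dehydrogenase: assume a protomer of one alpha and one beta.
ALPHA, BETA = 81.73, 72.92
codh_protomers = copies_per_oligomer(300.0, ALPHA + BETA)
mass_ratio_one_to_one = ALPHA / BETA  # expected g g^-1 for equal numbers of alpha and beta

print(f"UvsY: {uvsy_copies:.2f} +/- {uvsy_uncertainty:.2f} polypeptides")
print(f"CO dehydrogenase: {codh_protomers:.2f} alpha-beta protomers")
print(f"alpha:beta mass ratio for 1:1 stoichiometry: {mass_ratio_one_to_one:.2f}")
```

The nearest integers suggest six polypeptides for UvsY and two αβ protomers for the dehydrogenase, and the mass ratio expected for equal numbers of α and β can be compared with the measured value of 1.18 g g⁻¹.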

There is some confusion between the use in the 
present instance of a centrifugal field to create a gradient 
of the molar concentration of a protein and its use to 
measure the sedimentation coefficient, a hydrodynamic 
property of an individual molecule of the protein. 
Because the centrifugal potential at each point in the 
chamber can be calculated directly, at equilibrium the 
chemical potential of the solute, and hence its molar 
concentration at each point, can also be calculated. In 
contrast to this measurement at equilibrium, a measure- 
ment of a hydrodynamic property of a molecule of pro- 
tein is, as the name implies, a kinetic measurement. In 
such a kinetic measurement, the rate of movement of the 
molecule of protein under an applied force is measured.
Free electrophoretic mobility is an example of such a 
hydrodynamic property. The confusion arises because, 
in addition to its use in sedimentation equilibrium, cen- 
trifugal force can be used to move a molecule of protein 
through a solution. This use of centrifugal force is unre- 
lated to the use of centrifugal force to create a gradient of 
concentration. The confusion also arises because the 
same instrument, an analytical ultracentrifuge, is used to 
make each of the measurements, even though they are 
unrelated to each other. 

The distribution of concentration as a function of 
radius at sedimentation equilibrium is often used to 
obtain a value for the dissociation constant between two 
oligomeric states of a protein that are in equilibrium 
with each other. For example, the protein could be an 
equilibrium mixture of a monomer and a dimer 


$$\alpha_{2} \rightleftharpoons 2\alpha \tag{8-24}$$

for which there is a dissociation constant

$$K_{\mathrm{d},\alpha_2} = \frac{[\alpha]^{2}}{[\alpha_{2}]} \tag{8-25}$$


To estimate the numerical value of this dissociation con- 
stant, the mass-average molar masses can be calculated 
at selected radii from the respective slopes of the distri- 
bution of concentration at those radii at sedimentation 
equilibrium (Equation 8-20), and a plot of molar mass 
against concentration of protein can be fit by an equa- 
tion incorporating the equilibrium between monomer 
and dimer.

Usually, however, a value for the dissociation con- 
stant is extracted directly from the distribution of the 
concentration of protein as a function of radius at sedi- 
mentation equilibrium. The purpose of creating the cen- 
trifugal field is to form predictably a continuous gradient 
in the molar concentration of the protein. If the protein 
is involved in an equilibrium between two forms that 
have different stoichiometries of subunits, the ratio 
between the molar concentrations of the two forms at a 
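particular position in the sample will be a function of the
total concentration of protein, [protein]TOT, at that
position. For example, in the case of an equilibrium between
monomer and dimer, if [protein]TOT is taken to be the total
molar concentration of subunits, [α] + 2[α₂],

$$\frac{[\alpha_{2}]}{[\alpha]} = \left(\frac{1}{16} + \frac{[\text{protein}]_{\mathrm{TOT}}}{2K_{\mathrm{d},\alpha_2}}\right)^{1/2} - \frac{1}{4} \tag{8-26}$$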


Because the molar concentration of particles of protein 
at any position in the sample is [α] + [α₂], the equilibrium
between monomer and dimer affects the chemical 
potential of the protein and hence the balance between 
chemical potential and centrifugal potential. The result 


of this effect is that the distribution of the concentration 
of protein as a function of radius is perturbed by the 
equilibrium between the two forms of the protein. 
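How strongly the composition depends on the total concentration can be seen from a short calculation. The sketch below assumes that the total concentration is counted as moles of subunits, [α] + 2[α₂], and solves the quadratic that follows from the definition of the dissociation constant in Equation 8-25; the function name and the numerical values are hypothetical.

```python
import math


def monomer_dimer(total_subunits, dissociation_constant):
    """Return ([monomer], [dimer]) for alpha2 <=> 2 alpha.

    total_subunits: [alpha] + 2[alpha2], in molar units (assumed definition)
    dissociation_constant: [alpha]^2 / [alpha2], in the same molar units
    """
    kd = dissociation_constant
    # Positive root of (2/kd)[alpha]^2 + [alpha] - total = 0.
    monomer = (kd / 4.0) * (math.sqrt(1.0 + 8.0 * total_subunits / kd) - 1.0)
    dimer = monomer**2 / kd
    return monomer, dimer


KD_EXAMPLE = 1.0e-6  # molar, an arbitrary illustrative value

for total in (1.0e-8, 1.0e-6, 1.0e-4):
    monomer, dimer = monomer_dimer(total, KD_EXAMPLE)
    fraction_in_dimer = 2.0 * dimer / total
    print(f"total = {total:.0e} M: fraction of subunits in dimer = {fraction_in_dimer:.3f}")
```

Across the gradient of concentration in the cell, the fraction of subunits in the dimer rises continuously with radius, and it is this shift that perturbs the distribution.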

The distribution of the concentration of protein in 
the chamber as a function of the radial distance from the 
center of rotation can be fit by numerical analysis with an 
equation incorporating both the centrifugal potential and 
the perturbation caused by the equilibrium between the 
two forms of the protein to obtain simultaneously
estimates of both the molar masses of the two forms and the
numerical value of the equilibrium constant. For exam- 
ple, the distribution of the concentration of the p66 sub- 
unit of RNA-directed DNA polymerase from human 
immunodeficiency virus 1 at sedimentation equilibrium 
is consistent with the existence in the solution of an 
uncomplicated equilibrium between a monomer and a 
dimer of the subunit with a dissociation constant of 
2 x 10° M,” that of the subunit of chaperonin GroES is 
consistent with an uncomplicated equilibrium between 
monomer and heptamer of the subunit with a dissocia- 
tion constant of 1 x 10° M®*’ and that of the subunit of 
CTP synthase from E. coli is consistent with an
equilibrium among monomer, dimer, and tetramer of the
subunit. The dissociation constant (3 µM) for the
equilibrium between monomers and dimers of DNA
helicase II measured by sedimentation equilibrium agreed
with that (1.4 µM) estimated by counting monomers and
dimers directly in an atomic force microscope.

The perturbations of the distributions of concen- 
tration at sedimentation equilibrium caused by such 
equilibria are slight, so a detailed analysis of the devia- 
tions of the data from the curve that has been fit to them 
must be made to insure that those deviations are at 
random (Figure 8-3) rather than systematic. Better yet, 
the measurements should be performed at several differ- 
ent concentrations of protein or several different rotor 
speeds or both to demonstrate that the same value of the 
measured dissociation constant is consistent with each 
of the distributions of protein observed under these
different conditions.

Sedimentation equilibrium can also be used to rule 
out the existence of an equilibrium among oligomers and 
demonstrate that there is only one form of the protein 
present in the solution or that there are two forms of the
protein present with different stoichiometries of
subunits that are not in equilibrium with each other and that
are distributing independently of each other. Again, in
order to bolster the conclusion that no equilibration 
among oligomers is occurring, it should be demon- 
strated that the calculated molar mass or masses are 
affected neither by changing the concentration of the 
protein added to the cell nor by changing the speed of the 
rotor. 

Light scattering is a property of any fluid. It arises 
because a fluid is a collection of molecules undergoing 
random movements rather than a rigid solid or a uniform 
continuum of electrons. Scattered light emerges from a 


fluid at all angles to the incident direction of a beam of 
light passing through the fluid. The source of this scat- 
tered light is the electrons in the fluid that oscillate in 
response to the alternating electric field of the light and 
in turn emit light. The magnitude of the susceptibility of 
electrons in a molecule to this phenomenon arises from 
their respective polarizabilities, which are reflected in 
the refractive index of the molecule. The scattered light is 
emitted in directions other than that of the incident light. 

The emission of scattered light from a solution 
arises from regional fluctuations in polarizability on a 
scale smaller than the wavelength of the light. If a fluid 
were a uniform, unfluctuating distribution of electrons, 
the scattered light from its constituent electrons would 
always be canceled by interference and hence no emis- 
sion would result. The fluctuations producing the net 
scattering are related to local fluctuations in the concen- 
trations of the components of the solution and hence to 
the chemical potential of those components. The major- 
ity of the electrons in a solution of protein are on mole- 
cules of water. Fluctuations in the local concentrations of 
water are the major contributors to the light scattered 
from the solution in the absence of the protein. When 
protein is present, the scattering arising from the mole- 
cules of protein in the solution is in addition to this back- 
ground scattering. 

The scattering of a beam of collimated, unpolarized
light is measured by placing a detector at an angle θ to
the beam of unscattered light passing through the
sample (Figure 8-4) and at a distance r (centimeters)
from the sample. The incremental scattering, $i_\theta$, is the
difference in intensity between light scattered by a unit
volume of the solution of protein [photons second⁻¹
Figure 8-4: Angular dependence of scattered light. The angular
dependence of scattered light is related to a coordinate system in
which the x axis is along the beam of incident light, the z axis is
parallel to the electric vector of the light from a polarized source, and
the origin is the center of the sample. The angle θ is the angle ABC
where A is a point on the x axis beyond the sample, AB is along
the x axis, B is the origin, and C is the position of the detector. The
angle φ is the angle DBC where D is any point on the z axis, B is
the origin, and C is the position of the detector. The distance r is the
distance from the origin to the detector.
(centimeter³ of solution)⁻¹] and that scattered by an
identical solution (also in photons second⁻¹ centimeter⁻³) not
containing protein, and it is reported relative to the
intensity of the incident light, I₀ (photons second⁻¹), of
which it is a very small fraction. It can be shown that

(8-27)


where ñ is the refractive index (dimensionless) of the
solution of protein and the optical constant K (moles
centimeter⁻⁴) is defined by

where ñ₀ is the refractive index of the solution in the
absence of the protein, λ₀ is the wavelength (centimeters)
of the light in a vacuum, and N_A is Avogadro’s number
(6.022 × 10²³ mol⁻¹). It should be noted that the units on
Cprot, the concentration of protein, cancel in Equation
8-27. This cancellation again illustrates that the
concentration of protein can be expressed in any units. Equation
8-27 also illustrates that light scattering is a
measurement of the partial derivative of the chemical potential
and hence the osmotic pressure of the solution of
protein. The limit in Equation 8-27 is taken to eliminate any
optical interference that might arise if the dimensions of
the protein are close to the magnitude of the wavelength
of the light.

The partial derivative $(\partial\tilde{n}/\partial C_{\mathrm{prot}})_{T,P,\mu}$ is the change in
the refractive index of the solution as only the
concentration of protein is increased, at constant chemical
potential of the other solutes such as electrolytes and
buffers. Each of the solutions of protein used to make the
determination of $(\partial\tilde{n}/\partial C_{\mathrm{prot}})_{T,P,\mu}$, as well as the solution
used in the determination of the light scattering itself,
should be equilibrated by dialysis at the appropriate
osmotic pressure against a solution identical except for
the protein to obtain a constant chemical potential of the
other solutes throughout. A procedure for measuring the
required refractive indices at constant chemical
potentials of diffusible components has been devised.

At the present time, the light source usually used for 
measurements of light scattering is a laser; and, if it is not 
already polarized, the light is passed through a polarizer. 
The intensity of the scattered light from a source of 
polarized light has a different angular dependence than 
that from a source of unpolarized light. The oscillating 
electric vector of the polarized light defines the z axis of a 
coordinate system with the sample at the origin (Figure 
8-4). As defined, the x,y plane is normal to the oscillating 
electric vector. If φ is the angle between the z axis and the
ray of scattered light entering the detector, then

where θ is still the angle between the beam of
unscattered light emerging from the sample and the ray of
scattered light entering the detector. If the detector is
confined to the x,y plane (Figure 8-4), the angle φ is
always 90° and sin² φ = 1. In this configuration, to
perform the necessary extrapolation to θ = 0, the angle θ can
be varied over all values without changing the angle φ.
When unpolarized light is used, cos² θ changes
continuously as the limit of θ → 0 is taken, and this complicates
the extrapolation.

It is convenient to define a quantity, $R_\theta$, known as
Rayleigh’s ratio (centimeters⁻¹), to eliminate the
dimensions of the apparatus from the calculation. For
unpolarized light


$$R_{\theta} = \frac{r^{2}\, i_{\theta}}{I_{0}\left(1 + \cos^{2}\theta\right)} \tag{8-30}$$


and for polarized light when φ = 90°

$$R_{\theta} = \frac{r^{2}\, i_{\theta}}{2 I_{0}} \tag{8-31}$$


When Equation 8-27 or 8-29 is combined with Equation 
8-30 or 8-31, respectively, as well as Equations 8-18 and 
8-19 


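$$\lim_{\substack{\theta \to 0 \\ C_{\mathrm{prot}} \to 0}} R_{\theta} = K \left(\frac{\partial \tilde{n}}{\partial C_{\mathrm{prot}}}\right)^{2}_{T,P,\mu} W\, C_{\mathrm{prot}} \tag{8-32}$$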


The double limit in Equation 8-32 is often taken by a
procedure known as the Zimm plot, but with proteins of
normal dimensions, the variation of $R_\theta$ with θ is small
and inconsequential, and the extrapolation to θ = 0 is a
minor correction and often ignored. The extrapolation to
Cprot = 0 is always required because the virial coefficients
(Equation 8-7) can be appreciable (Figure 8-5). To
perform this extrapolation, a rearrangement of Equations
8-29 and 8-32, incorporating the virial coefficients
explicitly, is performed:


-2 
1 on 1 
T E | | W + 2BC prot + Beet 


(8-33) 


In Figure 8-5, two experiments with bovine serum 
albumin under different conditions, with and without 
added electrolyte, are presented. As with the measure- 
ments shown in Figure 8-1, it can be seen that the virial 
coefficient, B, changes appreciably with changes in con- 
ditions; in this case it even inverts in sign because the 
protein is participating in a concentration-dependent 
oligomerization in the absence of electrolyte. Bovine 
serum albumin readily self-associates to form adventi- 
tious dimers, trimers, and higher oligomers.“ The inter- 
cepts, however, again remain the same. It has been 
shown that under the same conditions with the same 
protein, the same values of the second virial coefficient 
are obtained by measurements of either osmotic pres- 
sure or light scattering.“ This result demonstrates that 
the virial coefficient is a property of the solution itself 
rather than the method of measurement, and it provides 
further evidence that these techniques are both measuring
the same property of the solution. From the extrapolations
in Figure 8-5, the molar mass of bovine serum
albumin was estimated to be 70,200 g mol⁻¹, which is
within 6% of the actual value of 66,430 g mol⁻¹.
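
The extrapolation to zero concentration can be sketched numerically. The fragment below is only an illustration, assuming that the measured quotient has already been scaled by the optical constants so that it follows y = 1/W + 2B·C_prot, the virial expansion truncated after the second coefficient. The data are synthetic: they are generated from the actual molar mass quoted above and a hypothetical value of B, and the function names are illustrative rather than taken from any published procedure.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def molar_mass_from_extrapolation(conc, scaled_quotient):
    """Return (W, B) from the intercept 1/W and the slope 2B."""
    intercept, slope = fit_line(conc, scaled_quotient)
    return 1.0 / intercept, slope / 2.0


# Synthetic data (assumptions, not measurements).
W_TRUE = 66430.0   # g mol^-1, actual molar mass of bovine serum albumin
B_TRUE = 2.0e-4    # cm^3 mol g^-2, hypothetical second virial coefficient
conc = [0.002, 0.004, 0.006, 0.008, 0.010]            # g cm^-3
quotient = [1.0 / W_TRUE + 2.0 * B_TRUE * c for c in conc]

W_est, B_est = molar_mass_from_extrapolation(conc, quotient)
print(f"W = {W_est:.0f} g/mol, B = {B_est:.2e} cm^3 mol g^-2")
```

With real data the points scatter about the line, and the uncertainty in the intercept, not the fit itself, limits the accuracy of W.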

From an examination of Equation 8-32, it is clear
again that the units chosen for C_prot, the concentration of
protein (units centimeter⁻³), determine the units (units
mole⁻¹) of the parameter W. If the concentration of protein
is known in grams centimeter⁻³, the units on W will
be grams mole⁻¹, or molar mass. The determination of
the molar mass, however, will only be as accurate as the
measurement of the concentration of protein.

Electrospray mass spectrometry is widely used to 
Figure 8-5: Light scattering by solutions of bovine serum
albumin.³⁹ The Rayleigh ratio R_90,u (centimeters⁻¹) was determined for
each of a series of solutions of bovine serum albumin, each at a
different concentration (γ_prot; milligrams centimeter⁻³), by measuring
the scattering of unpolarized light (u) at an angle θ of 90°. The
quotient γ_prot/R_90,u was calculated for each measurement and plotted
as a function of the concentration of protein (γ_prot). Measurements
were made in 0.15 M sodium chloride (upper line) or in water
(lower line) of solutions prepared by diluting an isoionic solution of
albumin into the appropriate solutions. Adapted with permission
from ref 39. Copyright 1954 American Chemical Society.


determine the molecular masses of proteins. Individual 
vaporized molecules of protein, each molecule with its 
own particular charge, are submitted to mass spectrom- 
etry (Figure 8-6). From the envelope of individual 
peaks, precise estimates of the mass of the molecule of 
protein can be calculated. For example, it was possible to 
show that the molecular mass of the blue copper protein 
rusticyanin from Thiobacillus ferrooxidans was 
16,552 Da, which is within 1 Da of the mass calculated 
from its amino acid sequence.“ 
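
How the envelope of peaks yields a mass can be sketched as follows, using the m/z values listed in the legend of Figure 8-6. The sketch assumes that every ion is a multiply protonated molecule, so that m/z = (M + z·m_H)/z, and that adjacent peaks differ by exactly one charge; neither assumption is stated in the text, and the function names are illustrative. Under these assumptions the charge on the ion at the higher m/z is z = (m₂ − m_H)/(m₁ − m₂), where m₁ and m₂ are the higher and lower m/z of an adjacent pair.

```python
PROTON_MASS = 1.00728  # Da, mass of the assumed charge carrier


def charge_of_higher_peak(mz_high, mz_low, adduct=PROTON_MASS):
    """Charge z of the ion at mz_high when the ion at mz_low carries z + 1."""
    return round((mz_low - adduct) / (mz_high - mz_low))


def neutral_mass(mz, z, adduct=PROTON_MASS):
    """Mass of the uncharged molecule giving an ion of charge z at mz."""
    return z * (mz - adduct)


# Peaks labeled M in Figure 8-6, in order of decreasing m/z (Da).
monomer_peaks = [3658.0, 3430.0, 3228.0, 3049.0]

estimates = []
for mz_high, mz_low in zip(monomer_peaks, monomer_peaks[1:]):
    z = charge_of_higher_peak(mz_high, mz_low)
    estimates.append((z, neutral_mass(mz_high, z)))

charges = [z for z, _ in estimates]
mean_mass = sum(m for _, m in estimates) / len(estimates)
print(charges, round(mean_mass))
```

Because the values in the legend are rounded to the nearest dalton, the mass recovered in this way agrees with 54.87 kDa only to within a few tens of daltons; the precision quoted for real measurements comes from fitting the unrounded positions of all of the peaks in the envelope.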

Major applications of mass spectrometry are the 
analysis of posttranslational modification, the verifica- 
tion of the integrity of a preparation of protein, and the 
assessment of the number of subunits in an oligomer. For 
example, in the case of rusticyanin the conclusion drawn 
from the mass spectrometric analysis was not that the 
molecular mass of the protein was 16,552 Da, which was 
already known, but that the protein lacked posttranslational
modifications. In the case of the L1 metallo-β-lactamase
of Stenotrophomonas maltophilia, mass
spectrometric analysis demonstrated that the normal 
posttranslational removal of the 21 amino-terminal 
amino acids had occurred,” and in the case of subunit V 
of ubiquinol-cytochrome-c reductase, mass spectromet- 
ric analysis demonstrated that the iron-sulfur cluster was 
Figure 8-6: Mass spectrum of aldehyde dehydrogenase (NAD⁺)
from rat liver.” The purified protein was vaporized by electrospray, 
and the resulting gaseous ions were passed into a mass spectrom- 
eter. The relative abundance of each ion is plotted as a function of 
the ratio (m/z) of its molecular mass to its charge. Four ions of the 
single subunit (54.87 kDa), bearing charges of +15 (m/z=3658 Da), 
+16 (m/z=3430 Da), +17 (m/z=3228 Da), and +18 (m/z=3049 Da), 
are labeled M for monomeric. The six ions labeled T, bearing 
assigned charges of +29 (m/z = 7568 Da), +30 (m/z = 7316 Da), 
+31 (m/z=7080 Da), +32 (m/z=6859 Da), +33 (m/z=6651 Da), and 
+34 (m/z = 6455 Da) were designated as molecular ions of a 
tetramer of four subunits (219.48 kDa) because the monomer 
could have produced only ions of +7 (m/z = 7839 Da), +8 (m/z = 
6859 Da), and +9 (m/z = 6096 Da) in this range of masses. Because 
three peaks lie between each of these positions, the species pro- 
ducing all of these peaks must be ions of a tetramer. 
present. Electrospray mass spectrometry was used to
verify that each member of a set of site-directed mutants
of dethiobiotin synthase from E. coli contained the
intended amino acid replacement.*’ Mass spectrometry
can also discover errors in a sequence. For example, in
the case of subunit I of bovine ubiquinol-cytochrome-c
reductase, electrospray mass spectrometry indicated correctly
that the published amino acid sequence of the protein
was missing 27% of its amino acids.“ Intact molecules
of a protein containing several subunits (Figure 8-6) can
be vaporized to obtain the molecular mass of the
oligomer,“ and membrane-bound proteins can be vaporized
after the phospholipid in which they are normally
dissolved has been removed from them.”

Now that the sequences of so many proteins are 
known, as well as the stoichiometries of their subunits, 
precise values of molar mass can be calculated for a large 
array of proteins from their atomic compositions. These 
can be compared with values determined by osmotic 
pressure, sedimentation equilibrium, and light scatter- 
ing before the actual molar masses were known (Table 
8-1). By and large, the agreement between the actual 
values of molar mass and the measured values is quite 
close, and this in itself validates the methods. 

The problem with molar mass, no matter how accu- 
rately it can be determined, is that it means very little to 
most people. Once it was clear that proteins were poly- 
mers of amino acids, sometimes posttranslationally 
modified, the reason behind all determinations of molar 
mass has been to estimate the number of amino acids 
that are contained in a given polypeptide and the 
number of polypeptides that are contained in a given 
protein. These are quantities that anyone can under- 
stand. Unfortunately, the results have seldom been pre- 
sented in these terms even though they always could 
have been. 

As it happens, the mean molar mass of an amino 
acid in most proteins is a reasonably constant number, 
110 ± 3 g mol⁻¹ (Table 8-2). When the mean molar mass
of an amino acid is calculated from the amino acid composition
of a set of proteins containing a total of 43,250
amino acids (Table 7-4),” the value obtained is 109.3 g
mol⁻¹. It follows that the number of amino acids in most
proteins can be estimated by simply dividing an estimate
of its molar mass by 110 g mol⁻¹, after glycosylation and
other posttranslational modifications are accounted for. 
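
The arithmetic behind these statements is simple enough to sketch. The values below are transcribed from the last column of Table 8-2; the use of the sample standard deviation as the measure of spread is an assumption, and the function name is illustrative.

```python
from statistics import mean, stdev

# grams (mole of amino acid)^-1 for the 14 entries of Table 8-2
RESIDUE_MASSES = [107.1, 111.0, 106.4, 108.0, 104.8, 108.1, 110.1,
                  108.9, 111.8, 113.6, 115.4, 110.8, 110.4, 115.4]


def estimate_residue_count(protein_molar_mass, mean_residue_mass=110.0):
    """Estimate the number of amino acids from a molar mass in g mol^-1.

    The molar mass should already be corrected for glycosylation and
    other posttranslational modifications.
    """
    return round(protein_molar_mass / mean_residue_mass)


mean_mass = mean(RESIDUE_MASSES)
spread = stdev(RESIDUE_MASSES)
print(f"mean = {mean_mass:.1f} g/mol, sample s.d. = {spread:.1f} g/mol")

# A molar mass of 66,430 g/mol corresponds to about 604 amino acids.
print(estimate_residue_count(66430.0))
```

For bovine serum albumin the estimate of about 604 amino acids can be compared with the length of 583 amino acids listed for it in the legend of Figure 8-7, an error of less than 4%.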

It should not be imagined, however, that the molar 
mass is a more fundamental number while the number 
of amino acids in a protein is somehow an approximation.
It has already been pointed out that expressing C_prot
in the units of grams centimeter⁻³ during determinations
of molar mass is an arbitrary choice. Were C_prot expressed
in terms of moles of amino acids centimeter⁻³, each of the
procedures would have necessarily provided as accurate
a value of W in terms of moles of amino acids (mole of
protein)⁻¹ rather than grams (mole of protein)⁻¹. The ability
to determine molar masses by physical measure-
Table 8-1: Comparison of the Actual Molar Masses of Selected Proteins and the Molar Masses Determined by Light
Scattering, Osmotic Pressure, and Sedimentation Equilibrium


molar mass (kg mol⁻¹)


protein  subunit stoichiometry  actualᵃ  light scattering  osmotic pressure  sedimentation equilibrium

bovine serum albumin’?  α  66.47  70  69  66
bovine pancreatic ribonuclease!)  α  13.69  13.7, 13.0
bovine β-lactoglobulin"?  α₂  36.56  36  35, 39  38
lysozyme from Gallus gallus™”?  α  14.31  14.8  17.5, 16.6
porcine L-lactate dehydrogenase”?  α₄  145.9  143  146  141
phosphorylase b from Oryctolagus cuniculus**>  α₂  194.3  185
mammalian glyceraldehyde-3-phosphate dehydrogenase’”*  α₄  143  145
bovine catalase” °°  α₄  232.5  240
fructose-bisphosphate aldolase from muscle of O. cuniculus®®®!  α₄  156.8  142, 158
mammalian apoferritin®”®  αₙβₘ (m + n = 24)  510  430
aspartate carbamoyltransferase from E. coli”*5  α₆β₆  307.7  310
bovine chymotrypsinogen?'’”  α  25.7  36
porcine pepsin A?!  α  34.62  36  39
2-dehydro-3-deoxy-phosphogluconate aldolase from Pseudomonas putida®  α₃  71.81  73
aspartate kinase I-homoserine dehydrogenase I from E. coli”  α₄  356.5  358  360
bovine glutamate dehydrogenase‘” "'  α₆  333.4  313  320


ᵃThe actual molar mass of each protein was calculated from the amino acid sequences of its constituent polypeptides and their stoichiometry in the complex.


Table 8-2: Tabulation of the Mean Grams (Mole of Amino Acid)⁻¹ in a Set of Proteins


polypeptideᵃ | type of protein | lengthᵇ | total number of amino acids in protein | polypeptide stoichiometry | grams (mole of amino acid)⁻¹ ᶜ
parvalbumin from Gadus callarias | cytoplasmic | 113 | 113 | α | 107.1
lysozyme from G. gallus | extracytoplasmic, enzymatic | 129 | 129 | α | 111.0
R17 coat protein | virus coat | 129 | | | 106.4
human hemoglobin (α + β) | cytoplasmic | 141 + 146 | 574 | (αβ)₂ | 108.0
bovine chymotrypsinogen | extracytoplasmic, enzymatic | 245 | 245 | α | 104.8
bacteriorhodopsin from Halobacterium halobium | membrane-spanning | 249 | 747 | α₃ | 108.1
L-lactate dehydrogenase from Squalus acanthius | cytoplasmic, enzymatic | 332 | 1328 | α₄ | 110.1
human immunoglobulin G Eu (α + β) | extracytoplasmic | 446 + 214 | 1320 | α₂β₂ | 108.9
human fibrinogen (α + β + γ) | extracytoplasmic, fibrous | 831 + 415 + 411 | 3314 | (αβγ)₂ | 111.8
human serum albumin | extracytoplasmic | 585 | 585 | α | 113.6
glycogen phosphorylase b from O. cuniculus | cytoplasmic, enzymatic | 841 | 1682 | α₂ | 115.4
murine anion exchanger | membrane-spanning | 929 | 1858 | α₂ | 110.8
ovine Na⁺/K⁺-exchanging ATPase (α subunit) | membrane-spanning | 1016 | 1319 | αβ | 110.4
embryonic skeletal myosin (α subunit) from Rattus norvegicus | cytoplasmic, fibrous | 1940 | 4560 | αβγαβδ | 115.4

mean molar mass of an amino acid = 110 ± 3


ᵃProteins composed of one or more polypeptides, the sequences of which were available in the Swissprot data bank, were chosen as examples. An attempt was made to
include examples of all types of proteins, but extremely unusual proteins such as collagen were avoided. The constituent polypeptides the compositions of which were
used are indicated. ᵇThe lengths of the polypeptides chosen for analysis are presented in numbers of amino acids. ᶜCalculated by dividing the molar mass of the protein
portion of the polypeptide or polypeptides by the length or combined lengths, respectively.


ments has always relied ultimately upon the ability of the 
investigator to make an accurate measurement of dry 
weight. All values of molar mass can be traced back to 
such a determination. The difficulties involved in meas- 
urements of dry weight have been noted,’® and more 
than anything else, the accuracy of the values in Table 
8-1 is a testimony to the careful measurements of this
quantity. Accurate dry weight measurement, however, 
requires more protein than is usually available, and other 
measures of protein concentration have unfortunately 
but necessarily supplanted it. Ironically, when the 
amount of protein is in short supply, the most accurate 
method for assessing its concentration is quantitative 
amino acid analysis, which is a measure of moles of 
amino acids centimeter⁻³, and the use of A₂₈₀ to follow
protein in sedimentation equilibrium actually is a meas- 
ure of the concentration of tyrosine and tryptophan in 
the solution. 

One of the most peculiar manifestations of the abid- 
ing infatuation with molecular mass is the habit of 
naming proteins on the basis of estimates of the molecu- 
lar mass performed by electrophoresis of their complexes 
with dodecyl sulfate. For example, protein p27 from 
simian retrovirus SRV-1 has an actual molecular mass of 
24.73 kDa,” and protein p56 from murine lymphoma has 
an actual molecular mass of 57.82 kDa.” It is unclear what 
will be done when two-digit numbers run out. 

Part of the description of a particular protein is an 
enumeration of the length of each of the polypeptides 
from which it is composed and the number of each sub- 
unit that it contains. At one time this information could 
be most conveniently learned by ascertaining both the 
molar mass of the entire protein and the molar mass of 
the isolated individual polypeptides. The history of this 
quest is interesting but beyond the scope of the present 
discussion. In two celebrated instances, that of aspartate 
carbamoyltransferase and that of fructose-bisphosphate 
aldolase,” disagreements arose over the results from 
such measurements. These particular disagreements 
coincided with the development of the two techniques 
that have supplanted almost entirely the earlier methods 
of determining molar mass that were just described. 
These newer procedures are both based on the elec- 
trophoresis of complexes between polypeptides and 
dodecyl sulfate upon gels of polyacrylamide. In one pro- 
cedure, sieving is used to display the different types of 
polypeptides in the protein and provide estimates of the 
length of each. In the other procedure, patterns of cova- 
lently cross-linked polypeptides separated by elec- 
trophoresis are used to count the number of each 
polypeptide present in the whole protein. 
Problem 8-1: Calculate the molar mass of bovine
ribonuclease from its sequence.


Problem 8-2: Calculate the molar masses of these proteins
that have the following incremental osmotic pressures
at 25 °C.


protein                       lim (γ_prot → 0) Π/γ_prot

L-lactate dehydrogenase”?     173 dyne cm⁻² (g of protein)⁻¹ L
β-lactoglobulin”              0.722 cm of H₂O (g of protein)⁻¹ L *
bovine serum albumin®         3.57 N cm (g of protein)⁻¹


* These are the centimeters that the level of the solution containing
the protein rose above the level of the solution lacking the protein
as a result of the expansion of the former at the expense of the
latter. This additional layer of solution exerts a pressure because of
the force of gravity. The units of centimeters are converted into
dynes centimeter⁻² (1 dyne = 10⁻⁵ N) by multiplying by the density of the solution
lacking the protein (grams centimeter⁻³) and the gravitational
acceleration (980.6 cm s⁻² at sea level, 45° latitude) felt by the
excess fluid on top of the solution of protein. The density of the
solution has already been converted into the density of water.
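
The conversion described in this footnote can be sketched as follows. The numerical input is hypothetical and is not one of the values in the problem; the density of water is taken as 1.0 g cm⁻³, ideal behavior is assumed so that the limiting value of Π/γ_prot equals RT/W, and the function names are illustrative.

```python
R_ERG = 8.314462618e7   # erg mol^-1 K^-1 (1 erg = 1 dyne cm)
G_CM_S2 = 980.6         # cm s^-2, value quoted in the footnote
WATER_DENSITY = 1.0     # g cm^-3, assumed


def column_height_to_pressure(height_cm, density=WATER_DENSITY, g=G_CM_S2):
    """Pressure in dyne cm^-2 exerted by a column of fluid of this height."""
    return height_cm * density * g


def molar_mass_from_reduced_pressure(reduced_pressure, temperature_k):
    """Molar mass (g mol^-1) from lim Pi/gamma in dyne cm^-2 (g cm^-3)^-1."""
    return R_ERG * temperature_k / reduced_pressure


# Hypothetical measurement: 0.500 cm of water per (g of protein L^-1) at 25 C.
per_gram_per_liter = column_height_to_pressure(0.500)   # dyne cm^-2 (g L^-1)^-1
reduced = per_gram_per_liter * 1000.0                   # 1 g L^-1 = 1e-3 g cm^-3
molar_mass = molar_mass_from_reduced_pressure(reduced, 298.15)
print(f"{molar_mass:.0f} g/mol")
```

Keeping every quantity in centimeter-gram-second units until the final division avoids the most common error in problems of this kind, a misplaced factor of 10³ or 10⁵.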


Problem 8-3: The isoelectric pH of bovine serum albu- 
min in a solution of 0.15 M NaCl is 5.37.”® From the data 
in Figure 8-1, calculate the mean net charge number Zon 
on the serum albumin at pH 7.0 by assuming that the dif- 
ference in slope between the two lines is due entirely to 
the Donnan effect. 


Problem 8-4: The second virial coefficient, B, for the
osmotic pressure of lysozyme from Gallus gallus in solutions
of (NH₄)₂SO₄ varies as a function of pH and ionic
strength, I.


pH    I (M)    second virial coefficient (cm³ μmol g⁻²)

4     1        −140 ± 23
7     1        −198 ± 15
8     1        −307 ± 21
4     3        −396 ± 19
7     3        −423 ± 34
8     3        −446 ± 26


(A) What causes the second virial coefficient to 
become less negative as the pH of the solution is 
lowered? 

(B) Why is the effect of pH smaller at the higher ionic 
strength? 
(C) What limit do these measurements place on the
isoelectric point of lysozyme? 

(D) What limit do these measurements place on the 
second virial coefficient at the isoelectric pH for 
lysozyme? 

(E) Why does the second virial coefficient have 
the sign that it has at the isoelectric pH of 
lysozyme? 


Problem 8-5: Human hemoglobin is a protein formed
from two α polypeptides, two β polypeptides, and four
hemes for a total molar mass of 64,450 g mol⁻¹. It is
referred to as an (αβ)₂ tetramer to indicate that it is a
heterotetramer formed from two αβ heterodimers. The
osmotic pressure at 20 °C of a solution of hemoglobin,”
at a concentration of 3 g L⁻¹, that had been flushed
exhaustively with N₂ gas was 1750 dyne cm⁻². When the
solution was flushed with O₂ gas, the osmotic pressure
increased to 2500 dyne cm⁻² even though the concentration
of the hemoglobin was unchanged.


(A) Assume ideal behavior and that the virial coeffi- 
cients are 0 and calculate the average molar mass 
for each circumstance. 


(B) Explain the values that you obtain. 


Problem 8-6: The enzyme glutamate-ammonia ligase, 
which catalyzes the ATP-dependent condensation of 
glutamate and ammonia, was isolated from E. coli. It was 
purified by (NH₄)₂SO₄ fractionation followed by ion-
exchange chromatography. The final preparation of the 
enzyme was considered to be a pure protein because a 
single peak was obtained on repeated ion-exchange 
chromatography. The protein is a complex of 12 identical 
polypeptides each 468 amino acids in length. The length 
of its constituent polypeptides has been ascertained by 
sequencing its genomic DNA. 

Glutamate-ammonia ligase was dissolved in 6 M 
guanidinium chloride, and the osmotic pressure of this 
solution was determined with a high-speed osmometer 
at 20 °C. The following results were obtained:


protein concentration (g L⁻¹)    pressure (mmHg)

2                                0.69
4                                1.40
8                                2.87
10                               3.63
15                               4.73
20                               7.96


(A) What is the molar mass of the protein in 6 M 
guanidinium chloride? 


(B) What has this salting-in solute done to the pro- 
tein? 


Problem 8-7: Calculate the molar mass of aspartate 
kinase I-homoserine dehydrogenase I from the data in 
this figure (adapted with permission from ref 80; copy- 
right 1968 Springer-Verlag). 


[Figure: log (fringe displacement, μm) plotted against r² (cm²)]


Sedimentation equilibrium of aspartate kinase I-homoserine
dehydrogenase I from E. coli. The initial protein concentration was
0.6 mg mL⁻¹. Interference optics were used. In an interference scan,
the logarithm of the distance in micrometers of the fringe
displacement between blank corrected fringes at each point in the
chamber and blank corrected fringes at zero level concentration is
plotted against r², the square of distance (centimeters²) from the
center of the rotor to the point at which the measurement was
made. It has been assumed that the concentration of the protein
was zero at the top of the sample so that the fringe displacement (in
micrometers) is directly proportional to the concentration of
protein. Because the concentration of protein at the top of the sample
is so small, the leftmost points are scattered.


Use the approximation of Equation 8-21, and
assume that the density of the solution is equal to that of water. The partial specific volume of the
protein is 0.737 cm³ g⁻¹, the rotor was spinning at 11,272
revolutions min⁻¹ (1 revolution is 2π radians), and the
temperature was 23 °C.


Problem 8-8: The light scattering from a series of solutions
of ovalbumin from G. gallus was measured at an
angle θ of 90°.⁸¹ The apparatus sampled the light scattered
from a volume of solution of 1.8 cm³ with a photomultiplier
tube at 10 cm from the center of the sample. The differences
between the scattering of the buffer alone and
the scattering of each solution were used to calculate the
Rayleigh ratio, R_90, for each solution. The intensity of
the incremental scattered light, i_θ, was less than 0.001% of
the intensity of the incident light, I_0. The Rayleigh ratios for
light of wavelength 436 nm in the vacuum were as follows:


γ_prot (g cm⁻³)    R_90 (cm⁻¹)

4.3 × 10⁻³         1.13 × 10⁻⁴
5.8 × 10⁻³         1.52 × 10⁻⁴
8.6 × 10⁻³         2.22 × 10⁻⁴
9.7 × 10⁻³         2.54 × 10⁻⁴
13.7 × 10⁻³        3.66 × 10⁻⁴

The refractive index ñ of the solution without the protein
was 1.333, the increment of the refractive index with
concentration, ∂ñ/∂γ_prot, for the protein at λ_0 =
436 nm was 0.1883 cm³ g⁻¹, and the temperature was
25 °C.


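(A) Determine graphically the limit


$$\lim_{\gamma_{prot} \to 0} \frac{\gamma_{prot}}{R_{90}}$$


(B) Assume that the Rayleigh ratio, R_θ, does not vary
significantly with variation in θ for this small protein
at this long wavelength and that


$$\lim_{\substack{\gamma_{prot} \to 0 \\ \theta \to 0}} \frac{\gamma_{prot}}{R_{\theta}} = \lim_{\gamma_{prot} \to 0} \frac{\gamma_{prot}}{R_{90}}$$


and use the value for this limit to estimate the
molar mass of ovalbumin.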


Problem 8-9: Calculate the mean molar mass of an 
amino acid from the amino acid composition of the set of 
proteins in Table 7-4. 


Electrophoresis on Gels of Polyacrylamide Cast 
in Solutions of Dodecyl Sulfate 


The sodium salt of dodecyl sulfate (H₂₅C₁₂OSO₃⁻) is a detergent
widely used commercially to dissolve nonpolar substances
in water. It accomplishes this purpose by forming
micelles. A micelle of dodecyl sulfate at moderate ionic
strength (0.2 M) contains about 100 of the anions in an
oblate ellipsoid that is 3 nm across at its minor axis.®?® All
of the anionic sulfates are at the surface of the ellipsoid
and the hydrocarbon is in the center. It is the hydrocarbon
core of the micelle that dissolves individual molecules
of a nonpolar substance, producing the detergent properties.
Although dodecyl sulfate must be present at concentrations
high enough to form micelles in order to interact
with proteins, the complexes that result between the
anions of dodecyl sulfate and polypeptides do not seem to
involve discrete micelles.

When sodium dodecyl sulfate™ is added to a solution
of protein at a concentration greater than its critical
micelle concentration* and at a ratio greater than 2 g of
dodecyl sulfate (g of protein)⁻¹, all of the polypeptides present
in the solution are unfolded and separate from each
other as they become coated with the dodecyl sulfate. The
amount of dodecyl sulfate coating the unfolded, separated
polypeptides at saturation is usually a function only of


* The critical micelle concentration is the minimum concentration 
at which the detergent forms micelles. 


their total length. Pitt-Rivers and Impiombato® observed 
that, within a series of globular, water-soluble proteins, 
each of the constituent polypeptides would bind 0.54 + 
0.01 molecule of dodecyl sulfate for every amino acid in 
its sequence. The important point, however, is not the 
numerical value of this ratio but the fact that it is constant 
(less than 2% variation) regardless of the protein exam- 
ined, as long as it is of the usual water-soluble, globular 
variety and does not have significant segments of its 
sequence enriched in acidic amino acids and lacking basic 
amino acids. This regularity in the binding of dodecyl 
sulfate, however, is observed only when all of the cystines 
in the proteins, if there were any, have been cleaved.” 
Usually this is done by disulfide interchange with a small 
thiol (Figure 3-20). The constant ratio between bound 
dodecyl sulfate and the number of amino acids presum- 
ably results from the fact that proteins displaying this 
behavior all have similar compositions of amino acids. 
Proteins with peculiar compositions,” an excess of 
charged side chains,®® or an excess of hydrophobic side 
chains® behave anomalously. 
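
The consequence of a constant ratio of binding can be made concrete with a short calculation. The ratio of 0.54 and the mean molar mass of 110 g mol⁻¹ for an amino acid are taken from the text; the molar mass of the dodecyl sulfate anion (about 265.4 g mol⁻¹) is calculated here from its formula, C₁₂H₂₅O₄S⁻, and the assumption that each bound anion contributes one negative charge is an idealization. The function names are illustrative.

```python
BOUND_PER_RESIDUE = 0.54        # dodecyl sulfate anions per amino acid (text)
MEAN_RESIDUE_MASS = 110.0       # g mol^-1 (Table 8-2)
DODECYL_SULFATE_MASS = 265.39   # g mol^-1, computed from C12H25O4S(-)


def bound_anions(n_residues, ratio=BOUND_PER_RESIDUE):
    """Number of dodecyl sulfate anions bound at saturation."""
    return ratio * n_residues


def mass_ratio(ratio=BOUND_PER_RESIDUE):
    """Grams of dodecyl sulfate anion bound per gram of polypeptide."""
    return ratio * DODECYL_SULFATE_MASS / MEAN_RESIDUE_MASS


for n in (153, 583, 842):   # lengths of three polypeptides from Figure 8-7
    print(n, round(bound_anions(n)), round(-bound_anions(n) / n, 2))

print(f"{mass_ratio():.2f} g of dodecyl sulfate per g of polypeptide")
```

Whatever the length of the polypeptide, the charge contributed per amino acid is the same, which is the uniform distribution of charge that the argument about free electrophoretic mobility in the following paragraphs relies upon.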

The complexes that form between dodecyl sulfate 
and polypeptides are extended structures and have been 
variously described as cylindrical rods the length of 
which is directly proportional to the length of the 
polypeptide® or micellar pearls of dodecyl sulfate on a 
string of the flexible polypeptide.” No definitive descrip- 
tion of their structure is available, but there is no evi- 
dence that the dodecyl sulfate in these complexes is 
present in discrete packets of 100 molecules of detergent, 
as would be expected if the micelles present in the 
absence of the protein were simply incorporated intact 
into a long string upon the unfolded polypeptide. 

As with nucleic acids and presumably for the same 
reasons, the complexes between dodecyl sulfate and 
those polypeptides that bind a constant ratio of this 
strongly anionic detergent all display the same free
electrophoretic mobility, (−2.62 ± 0.04) × 10⁻⁴ cm² s⁻¹ V⁻¹,
regardless of the length of the polypeptide.” In the case
of nucleic acids, the invariance of the free electrophoretic 
mobility with length results from the uniform distribu- 
tion of negative charge along the regular polymer, and 
presumably this is also a necessary condition met by the 
complexes between dodecyl sulfate and polypeptides. 
With nucleic acids, however, this is a covalently con- 
ferred, intrinsic property of the phosphodiesters of the 
backbone rather than the fortuitous and less reliable 
inclination of the polymer to bind a charge-conferring 
species uniformly along its length. As such, any polypep- 
tide that binds dodecyl sulfate abnormally should have a 
different free electrophoretic mobility. When the amount 
of dodecyl sulfate bound to a series of polypeptides was 
purposely decreased, the magnitudes of their free elec- 
trophoretic mobilities also decreased.” 

As is the case with nucleic acids,” native proteins
(Figure 1-17), and other macromolecules submitted to
electrophoresis on gels of polyacrylamide or other polymeric
supports, the electrophoretic mobilities of complexes
between dodecyl sulfate and polypeptides (Figure
8-7)” follow the relationship


$$u_i = u^{\circ}_i \exp(-K_{r,i} T_a) \tag{8-34}$$


where T_a is the concentration of acrylamide (in percent)
from which the gel was cast and the retardation coefficient,
K_r,i, is a constant unique to the particular polypeptide
i. Because u°_i is the free electrophoretic mobility of
the complex between dodecyl sulfate and polypeptide i
and u° is the same for all complexes between dodecyl sulfate
and well-behaved polypeptides, this relationship
predicts that the lines in Figure 8-7 should intersect at
the axis of the ordinate when T_a is equal to 0, which is
almost the case. Because each complex has a unique
retardation coefficient, electrophoresis on gels of polyacrylamide
can be used to separate these complexes one
from the other (Figure 8-8).°°
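
Because Equation 8-34 is linear in the logarithm of the mobility, the retardation coefficient and the free mobility of a complex can be recovered from mobilities measured at several concentrations of acrylamide. The sketch below fits ln u against the percent acrylamide; the two retardation coefficients and the common free mobility used to generate the data are hypothetical, magnitudes of the mobilities are used so that the logarithm is defined, and the function name is illustrative.

```python
import math


def fit_ferguson(acrylamide_percent, mobilities):
    """Fit ln(u) = ln(u0) - K_r * T and return (u0, K_r)."""
    logs = [math.log(u) for u in mobilities]
    n = len(logs)
    mean_t = sum(acrylamide_percent) / n
    mean_l = sum(logs) / n
    stt = sum((t - mean_t) ** 2 for t in acrylamide_percent)
    stl = sum((t - mean_t) * (l - mean_l)
              for t, l in zip(acrylamide_percent, logs))
    slope = stl / stt
    u0 = math.exp(mean_l - slope * mean_t)
    return u0, -slope


# Synthetic relative mobilities for a short and a long polypeptide.
percent = [5.0, 7.5, 10.0, 12.5, 15.0]
U0 = 1.0                                 # hypothetical common free mobility
short = [U0 * math.exp(-0.10 * t) for t in percent]   # hypothetical K_r
long_ = [U0 * math.exp(-0.25 * t) for t in percent]   # hypothetical K_r

u0_short, kr_short = fit_ferguson(percent, short)
u0_long, kr_long = fit_ferguson(percent, long_)
print(f"{u0_short:.3f} {kr_short:.3f} {u0_long:.3f} {kr_long:.3f}")
```

The two fitted lines share the same intercept at zero acrylamide and differ only in slope, which is the behavior expected of well-behaved complexes in Figure 8-7.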

Systems for stacking complexes between dodecyl 
sulfate and polypeptides have been developed to 
improve the resolution of the separation. One system 
Figure 8-7: Relative electrophoretic mobilities, R, of complexes of
polypeptides and dodecyl sulfate as a function of the concentration
of acrylamide, T_a, used to cast the gel.” Various proteins, namely
myoglobin (my; 153 aa), chymotrypsinogen (ch; 245 aa), L-lactate
dehydrogenase (ld; 291 aa), ovalbumin (ov; 385 aa), glutamate
dehydrogenase (gd; 501 aa), bovine serum albumin (sa; 583 aa),
and phosphorylase (ph; 842 aa), were dissolved separately in a
solution containing a concentration of dodecyl sulfate sufficient to
saturate the polypeptides. Each was then submitted to
electrophoresis on gels of polyacrylamide cast in a solution of
1% sodium dodecyl sulfate. A series of gels was used for each
protein that differed in the percent acrylamide (T_a) from which they
were cast. The gels were stained for protein, and the distance
migrated by the protein was divided by the distance migrated by a
dye, Pyronine-Y, of low molecular weight to obtain the relative
electrophoretic mobility (R) of each polypeptide at each percent
acrylamide. The assumption made was that the mobility of the
Pyronine-Y would be unaffected by the percent acrylamide.
Reprinted with permission from ref 92. Copyright 1972 Journal of
Biological Chemistry.


releases the complexes from the descending boundary in
which they are stacked by using an ascending boundary
that increases the concentration of the neutral conjugate
base of the cationic acid that is common to all of the
solutions."TP The other releases them by using an ascending
boundary that delivers the neutral conjugate base of a
different cationic acid of much higher pKa to jump the pH
after the complexes have stacked.” The latter system is
more effective at releasing the smaller polypeptides from
the descending boundary than is the former. The former
relies heavily on the increase in the concentration of
polyacrylamide at the top of the running gel to accomplish
the release and fails to do so when the concentration
of polyacrylamide in the running gel is decreased
below a certain level.

As with nucleic acids, the electrophoretic mobilities 
of complexes between dodecyl sulfate and polypeptides 
on gels of polyacrylamide are a regular function of the 
Figure 8-8: Separation of polypeptides by electrophoresis on gels
of polyacrylamide cast in a solution of 0.2% sodium dodecyl
sulfate.” Proteins containing the polypeptides were dissolved in
solutions of sodium dodecyl sulfate sufficient to saturate them. They
were submitted to electrophoresis on cylindrical gels (0.6 cm ×
10 cm) cast from 10% acrylamide in 0.1% sodium dodecyl sulfate
and 0.1 M sodium phosphate, pH 7.0. Following electrophoresis,
the gels were stained for protein. The polypeptides that were run
on gel A were those composing bovine catalase (n_aa = 506), the
mitochondrial isoform of porcine fumarate hydratase (n_aa = 466),
isoform A of fructose-bisphosphate aldolase from muscle of
Oryctolagus cuniculus (n_aa = 361), glyceraldehyde-3-phosphate
dehydrogenase from muscle of O. cuniculus (n_aa = 332), human
carbonate dehydratase I (n_aa = 260), and equine cardiac myoglobin
(n_aa = 153). The polypeptides run on gel B were the same as those
run on gel A, but the myoglobin was omitted. The polypeptides run
on gel C were catalase, fumarate hydratase, the E isoform of
alcohol dehydrogenase from equine liver (n_aa = 374),
glyceraldehyde-3-phosphate dehydrogenase, carbonate dehydratase, and
myoglobin. Reprinted with permission from ref 93. Copyright 1969 Journal
of Biological Chemistry.

length of the polypeptides, as long as they have a normal 
composition of amino acids” and bind the proper 
amount of dodecyl sulfate. To understand this property 
of the electrophoretic separations, the process known as 
sieving must be understood. 


Suggested Reading 
Sieving


Sieving of macromolecules, for example, native proteins, 
nucleic acids, or complexes between dodecyl sulfate and 
polypeptides, occurs during both chromatography by 
molecular exclusion and electrophoresis on polymeric 
supports. Sieving is the discrimination between macro- 
molecules on the basis of size that is accomplished by a 
random network of linear polymers. In chromatography 
by molecular exclusion, the network of polymer forms 
the beads among which the mobile phase percolates and 
is the sieve within the beads through which the macro- 
molecule diffuses when it is inside of the stationary 
phase. In electrophoresis, the network of polymer forms 
an obstacle course through which the macromolecule 
must pass as it moves in the direction of the electric field. 

Consider a geometric solid of any shape within a
network of lines thrown at random through a volume of
space completely containing the solid. An equation for
the probability that none of these lines intersects the
solid, $P(ni)$, was derived during the solution of an unrelated
topological problem, and


$P(ni) = \exp(-lS/4)$ (8-35)


where $l$ is the density of the lines (centimeters centimeter$^{-3}$)
and $S$ is the surface area of the solid (centimeters$^{2}$).
Assume that a macromolecule is a geometric solid and a 
network of chemical polymers is a network of lines. 
When a molecule of protein is submitted to chromatog- 
raphy by molecular exclusion, the fraction of the total 
volume available to macromolecule i, $K_{av,i}$ (Equation
1-21), in the stationary phase of randomly arranged
linear polymers should be that fraction of the total
volume the occupation of which by the macromolecule
does not cause any polymer to intersect the macromolecule.
In this case,


$K_{av,i} = \exp(-b T_P S_{app,i})$ (8-36)


where $T_P$ is the concentration of polymer in percent
[grams (100 cubic centimeters)$^{-1}$], $b$ is a constant of
proportionality to convert, among its other roles, the
concentration of polymer [grams (100 cubic centimeters)$^{-1}$]
into its linear density (centimeters centimeter$^{-3}$), and
$S_{app,i}$ is the apparent surface area (centimeters$^{2}$) of the
macromolecule i.
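
As a minimal numerical sketch of Equation 8-36, the following Python fragment evaluates the fraction of the stationary phase available to a macromolecule from its apparent surface area. The function name and the values assigned to b, the polymer concentration, and the apparent surface areas are hypothetical and chosen only for illustration; they are not taken from any experiment described here.

```python
import math

def available_fraction(b, t_p, s_app):
    """Equation 8-36: K_av = exp(-b * T_P * S_app)."""
    return math.exp(-b * t_p * s_app)

# Hypothetical values in arbitrary but mutually consistent units.
b = 0.002      # proportionality constant (assumed)
t_p = 6.0      # percent polymer (assumed)
for s_app in (50.0, 100.0, 200.0):
    # A larger apparent surface area leaves a smaller available fraction.
    print(s_app, available_fraction(b, t_p, s_app))
```

Because the surface area enters the exponent, doubling the apparent surface area squares the available fraction rather than halving it.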

Because the polymers are not lines but solids them- 
selves, the apparent surface area of the macromolecule 
is not its real surface area. The apparent surface of 
macromolecule i, $S_{app,i}$, lies outside its actual surface by a
distance equal to the sum of the widths of any tight shells 
of hydration around either the macromolecule or the 
polymer and the width of the polymer itself. All of the 
dimensions that cause the polymer not to be a line and 
the macromolecule not to be a dry smooth solid object 
are incorporated into the dimensions of an apparent 
macromolecule that is larger than the actual macromol- 
ecule. When the actual macromolecule collides with the 
actual polymer, the apparent macromolecule collides 
with a line in the center of the polymer. 

This model predicts that if a series of beaded sta- 
tionary phases of increasing concentration of polymer is 
used to separate the same set of standard macromole- 
cules by molecular exclusion chromatography, then 


$K_{av,i} = \exp(-K_{r,i} T_P)$ (8-37)


If this is so, then $\ln K_{av,i}$ should be a linear function of $T_P$.
It has been demonstrated that the relationship of
Equation 8-37 describes the behavior of both proteins
and polysaccharides during chromatography
by molecular exclusion on gels of both
polyacrylamide (Figure 8-9) and linear dextrans.
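
The linearity predicted by Equation 8-37 can be checked numerically. The sketch below, in Python, fits a line constrained to pass through the origin to the negative logarithm of the distribution coefficient as a function of polymer concentration, because Equation 8-37 requires the distribution coefficient to be 1 when no polymer is present. The function name is hypothetical, and the data are synthetic values generated from the equation itself with an assumed coefficient of 0.15 (percent)$^{-1}$.

```python
import math

def slope_through_origin(t_p_values, k_av_values):
    """Least-squares slope of -ln K_av against T_P for a line through the origin."""
    y = [-math.log(k) for k in k_av_values]
    sxy = sum(t * v for t, v in zip(t_p_values, y))
    sxx = sum(t * t for t in t_p_values)
    return sxy / sxx

# Synthetic data from Equation 8-37 with an assumed coefficient of 0.15.
t_p = [4.0, 6.0, 8.0, 10.0]
k_av = [math.exp(-0.15 * t) for t in t_p]
coefficient = slope_through_origin(t_p, k_av)
print(coefficient)
```

With real data the scatter about this line, rather than the slope alone, would indicate how well the model of random lines describes the medium.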

If Equation 8-36 describes behavior during chromatography
by molecular exclusion, then when $T_P$ is
fixed, $-\ln K_{av,i}$ should be directly proportional to $S_{app,i}$.
Computer programs exist for calculating the accessible 
surface area of a molecule of a protein (6-12) by rolling a 
spherical probe over the surface of its crystallographic 
molecular model. The accessible surface area is the
surface area traced by the center of the probe. 
Unfortunately, there is no computer program that per- 
forms such a calculation for a cylindrical probe, which 
would not detect the smaller irregularities of the surface 
so readily as does a spherical probe (Figure 6-20). One 
can choose, however, a radius for the spherical probe 
that is large enough to include the radius of the polymer 
and the layers of hydration on the polymer and the pro- 
tein as well as being large enough that the smaller irreg- 
ularities of the surface of the protein that would not be 
detected by a cylinder are also not detected by the 
sphere. When a sphere of the appropriate radius is used 
as the probe, the values of $-\ln K_{av}$ for a series of standard
proteins that have been submitted to chromatography
by molecular exclusion upon cross-linked dextran are
found to be directly proportional to the accessible surface
areas calculated from crystallographic molecular
models of those same proteins (Figure 8-10A). 

Unfortunately, the accessible surface area of a molecule
of protein is not one of its more interesting properties;
usually sieving is used to estimate the number of
amino acids a protein contains. It has been noted by
Ogston that if a set of macromolecules were all spheres
of radius $R_i$ and the polymers of the network were
infinitely long right cylinders of radius $r_P$, then


$S_{app,i} = 4 \pi (R_i + r_P)^2$ (8-38)


Because the partial molar volume of a molecule of protein
is a function only of its composition of amino
acids and because the amino acid compositions of
most proteins are similar, each of their partial molar volumes
should be directly proportional to the number of
amino acids each protein contains (Table 8-2). To the
extent that a molecule of protein i is a sphere and has the
normal composition of amino acids, the number of
amino acids it contains, $n_{aa,i}$, should determine its radius
by the relationship


$R^{\circ}_i = \left( \frac{3 n_{aa,i} \bar{V}_{aa}}{4 \pi N_A} \right)^{1/3}$ (8-39)


where $\bar{V}_{aa}$ is the mean partial molar volume of the amino
acids in the usual protein (82 cm$^3$ mol$^{-1}$) and the superscript
has been added to $R$ to indicate that this is a sphere
equivalent in volume to the volume of the protein, which
is never exactly a sphere.

When Equations 8-36, 8-38, and 8-39 are combined


$(-\ln K_{av,i})^{1/2} = (4 \pi b T_P)^{1/2} \left[ \left( \frac{3 n_{aa,i} \bar{V}_{aa}}{4 \pi N_A} \right)^{1/3} + r_P \right]$ (8-40)


Figure 8-9: Distribution coefficients for a series of globular proteins
submitted to chromatography by molecular exclusion on
polyacrylamide gels of varying composition. A series of gels
cast from different concentrations of acrylamide, $T_P$ (percent), were
separately fragmented to form suspensions of polyacrylamide
granules of different porosities. Columns were made from these
chromatographic media, and a set of standard globular proteins
were submitted to chromatography by molecular exclusion on
these columns and their respective elution volumes were used to
calculate the respective distribution coefficients, $K_D$ (Equation
1-22). The distribution coefficient $K_D$ is directly proportional to the
distribution coefficient $K_{av}$. The values of $K_D$ are plotted on a
logarithmic scale as a function of $T_P$. The proteins were, in order of their
number of amino acids, (1) cytochrome c (cy; 104 aa), (2) ribonuclease
(rn; 124 aa), (3) lysozyme (ly; 129 aa), (4) myoglobin
(my; 153 aa), (5) ovomucoid (om; 186 aa), (6) chymotrypsinogen
(ch; 245 aa), (7) pepsin A (ps; 327 aa), (8) ovalbumin (ov; 385 aa),
(9) hemoglobin (hb; 574 aa), (10) serum albumin (sa; 583 aa), and
(11) immunoglobulin G (ig; 1320 aa). Adapted with permission
from ref 98. Copyright 1970 National Academy of Sciences.

Figure 8-10: Sieving of globular proteins by molecular exclusion chromatography. The proteins, dissolved in 0.1 M KCl at pH 7.5, were
submitted to chromatography by molecular exclusion on a column (2.5 cm x 50 cm) of Sephadex G-200. The volumes at which the several
proteins eluted from the column were tabulated. The distribution coefficient $K_{av}$ for each was calculated (Equations 1-20 and 1-21) from its
elution volume, $V_e$, and the void volume of the column, $V_0$, and the included volume of the column, $V_i$ (as determined by the volume at which
sucrose eluted). It was assumed that V; = Vp,o, that Vr = (1 - foi Vino, and that W, = 20 mL g$^{-1}$. The proteins used by Andrews were, in
order of increasing total number of amino acids, equine cytochrome c ($n_{aa}$ = 104), myoglobin from Physeter catodon ($n_{aa}$ = 153), bovine
chymotrypsinogen ($n_{aa}$ = 245), ovalbumin from G. gallus ($n_{aa}$ = 385), bovine serum albumin ($n_{aa}$ = 583), bovine lactoperoxidase ($n_{aa}$ = 612), the
cytoplasmic isoform of malate dehydrogenase from porcine heart ($n_{aa}$ = 666), bovine transferrin ($n_{aa}$ = 685), glyceraldehyde-3-phosphate
dehydrogenase from muscle of O. cuniculus ($n_{aa}$ = 1328), the A isoform of L-lactate dehydrogenase from muscle of O. cuniculus ($n_{aa}$ = 1324),
alcohol dehydrogenase from Saccharomyces cerevisiae ($n_{aa}$ = 1388), the A isoform of fructose-bisphosphate aldolase from muscle of O. cuniculus
($n_{aa}$ = 1452), the mitochondrial isoform of porcine fumarate hydratase ($n_{aa}$ = 1864), bovine catalase ($n_{aa}$ = 2024), β-galactosidase from E. coli
($n_{aa}$ = 4092), equine apoferritin ($n_{aa}$ = 4368), and urease from Canavalia ensiformis ($n_{aa}$ = 5040). (A) The quantity $-\ln K_{av}$ is plotted as a
function of the accessible surface area (6-12; nanometers$^2$) of those molecules of protein for which crystallographic molecular models were
available (all of the proteins used by Andrews except lactoperoxidase, alcohol dehydrogenase, and urease). The accessible surface areas were
calculated with a spherical probe of radius 1.1 nm (Figure 6-20) by use of the program of Lee and Richards as adapted by Dr Ilya Shindyalov
of the Protein Data Bank. Of necessity, crystallographic molecular models of proteins from species other than the species providing the
protein for the molecular exclusion often had to be used. The access codes of the crystallographic molecular models in the Protein Data Bank
that were chosen are 5CYT, 3CYT, 1CYC, 1CRC, 1HRC, 1ABS, 1DXD, 1HJT, 1SWM, 2MBW, 1MCY, 2CGA, 4CHA, 1OVA, 1BJ5, 1UOR, 1BKE,
1AO6, 4MDH, 5MDH, 1DOT, 1OVT, 1CB6, 1CE2, 1GPD, 4GPD, 3GPD, 1LDM, 2LDX, 3LDH, 5LDH, 9LDB, 9LDT, 6ALD, 4ALD, 2ALD, 1FUR,
1YFM, 1DGF, 1DGG, 4BLC, 7CAT, 8CAT, 1BGL, 1BGM, 1FHA, 1AEW, and 1DAT. The closed circles are those for proteins the frictional ratios
of which are less than 1.20. (B) The quantity $(-\ln K_{av})^{1/2}$ is plotted as a function of the cube root of the number of amino acids in each protein.
(C) The quantity $(-\ln K_{av})^{1/2}$ is plotted as a function of the Stokes radius, $a$, of each protein calculated from its diffusion coefficient by Equation
1-67. In panel A the line was fit to the averages of the surface areas for each protein, but in panels B and C it was fit only to the eight points
(closed circles) for the proteins the frictional ratios of which are less than 1.2.


and a plot of $(-\ln K_{av,i})^{1/2}$ against $(n_{aa,i})^{1/3}$ should be a linear
relationship. When the data of Andrews are displayed
in this fashion, they are linearly related (Figure 8-10B).
The intercept of the line with the abscissa is at a negative
value as predicted by Equation 8-40, and this intercept
yields an estimate of $r_P$, the mean radius of the polymers
of dextran, of 0.46 nm, which is not unreasonable when
it is considered that this parameter also includes the
irregularities of the surface of the protein and the hydration
of the molecule of protein and the molecules of polymer.
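
The arithmetic behind this estimate can be sketched in Python. The first function evaluates the radius of the equivalent sphere of Equation 8-39 with the mean partial molar volume of 82 cm$^3$ mol$^{-1}$ quoted in the text. The second assumes that a straight line has already been fit to the square root of the negative logarithm of the distribution coefficient as a function of the cube root of the number of amino acids; because combining Equations 8-36, 8-38, and 8-39 gives the same prefactor to both the slope and the ordinate intercept, their ratio multiplied by the radius of a sphere of one amino acid should equal the radius of the polymer. The function names, and the slope and intercept used in the example, are hypothetical.

```python
import math

AVOGADRO = 6.02214076e23   # mol^-1
V_AA = 82.0                # cm^3 mol^-1, mean partial molar volume per amino acid

def equivalent_sphere_radius_nm(n_aa):
    """Equation 8-39: radius (nm) of the sphere equal in volume to the protein."""
    volume_cm3 = n_aa * V_AA / AVOGADRO
    radius_cm = (3.0 * volume_cm3 / (4.0 * math.pi)) ** (1.0 / 3.0)
    return radius_cm * 1.0e7   # 1 cm = 1e7 nm

def polymer_radius_nm(slope, intercept):
    """r_P from the line y = slope * n_aa**(1/3) + intercept, y = (-ln K_av)**0.5.

    The intercept with the abscissa is -intercept/slope, so a positive r_P
    corresponds to the negative abscissa intercept described in the text.
    """
    return (intercept / slope) * equivalent_sphere_radius_nm(1)

# Myoglobin, 153 amino acids: the equivalent sphere has a radius near 1.7 nm.
print(equivalent_sphere_radius_nm(153))

# Hypothetical slope and intercept constructed to correspond to r_P = 0.46 nm.
prefactor = 0.8
slope = prefactor * equivalent_sphere_radius_nm(1)
intercept = prefactor * 0.46
print(polymer_radius_nm(slope, intercept))
```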

Several of the points in Figure 8-10B deviate from
the line that was drawn. One way to quantify the deviation
of a molecule of protein from spherical behavior is
to define a frictional ratio, $f/f_0$, which is simply the
quotient between the measured frictional coefficient of
the protein and the frictional coefficient it would have if
its mass were distributed to form a hard sphere of the
same partial specific volume. The measured frictional
coefficient, $f$, is calculated from the diffusion coefficient
(Equation 1-64) and the ideal frictional coefficient
(Equation 1-66) is defined by


$f_0 = 6 \pi \eta R^{\circ}$ (8-41)


where $R^{\circ}$ is defined by Equation 8-39. For the most
spherical of proteins, values of the frictional ratio of
1.1-1.2 are observed. The values of the frictional ratio
are always greater than 1 because the water bound to a
protein and the irregularities of its surface increase its
actual frictional coefficient.
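
A frictional ratio can be computed from a diffusion coefficient in a few lines. The Python sketch below assumes that Equation 1-64, which is not reproduced in this section, is the Einstein relation between the frictional coefficient and the diffusion coefficient, and it uses Stokes' law for the ideal sphere (Equation 8-41). The function name, the default temperature, and the diffusion coefficient in the example are hypothetical; the default viscosity is that of water at 20 °C.

```python
import math

BOLTZMANN = 1.380649e-23   # J K^-1

def frictional_ratio(diffusion_m2_s, sphere_radius_m,
                     temperature_k=293.15, viscosity_pa_s=1.002e-3):
    """f/f0 with f = k_B*T/D (assumed Einstein relation) and f0 = 6*pi*eta*R."""
    f_measured = BOLTZMANN * temperature_k / diffusion_m2_s
    f_ideal = 6.0 * math.pi * viscosity_pa_s * sphere_radius_m
    return f_measured / f_ideal

# Hypothetical protein: equivalent sphere of radius 1.7 nm and an assumed
# diffusion coefficient of 1.1e-10 m^2 s^-1 (illustrative, not a measurement).
print(frictional_ratio(1.1e-10, 1.7e-9))
```

A protein that diffused exactly as its equivalent hard sphere would return a ratio of 1; slower diffusion, from bound water or an irregular shape, raises the ratio.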

The solid circles in Figure 8-10B are those for all of 
the proteins chosen by Andrews that happen to have fric- 
tional ratios less than 1.2. They are, in ascending order of 
size (with the frictional ratios in parentheses), 
cytochrome c (1.09), myoglobin (1.16), chymotrypsino- 
gen (1.12), ovalbumin (1.18), glyceraldehyde-3-phos- 
phate dehydrogenase (phosphorylating) (1.16), L-lactate 
dehydrogenase (1.17), apoferritin (1.15), and urease 
(1.18).* The line in Figure 8-10B was drawn through 
these points because they should be the most ideal 
examples. It can be seen that several of the points for pro- 
teins with larger frictional ratios indicate that they are 
behaving as if they were larger than they are, which 
makes sense if their behavior is a function only of their 
surface area. Further validating the conclusion that it is 
only the surface area of a molecule of protein that deter- 
mines its distribution coefficient is the fact that the dis- 
tribution coefficients of proteins with larger frictional 
ratios (open circles in Figure 8-10A) show no more devi- 
ation from linear behavior than do those of the proteins 
with frictional ratios less than 1.20 (solid circles in Figure 
8-10A) when they are plotted as a function of surface 
area rather than volume. 

It has been argued that, rather than the
apparent surface area of a molecule of protein, the fun- 
damental variable in describing its behavior when it is 
submitted to sieving on chromatography by molecular 
exclusion is its effective radius, or Stokes radius, a, cal- 
culated from its diffusion coefficient (Equation 1-67). 
When the same values of $(-\ln K_{av})^{1/2}$ displayed in Figure
8-10B are replotted against the effective radii for the various
proteins (Figure 8-10C), no significant improvement
is seen. Something can be learned, however, when
a line is again drawn through the points for the eight
most globular proteins listed above, the properties of
which should be least affected by the change to effective
radius from $(n_{aa})^{1/3}$. It can be seen that the use of the effective
radius significantly overcompensates for the deviations
from the linear behavior displayed in Figure 8-10B
for almost all of the proteins that have larger frictional 
ratios. It is hard to explain why proteins with irregular 
shapes would behave as if they were smaller than more 
spherical ones. 

* Values of frictional ratios were calculated from diffusion coefficients
cited by Andrews or in the tables of the CRC Handbook of
Biochemistry and the actual number of amino acids in each protein.


It is the correlation between $K_{av}$ and $n_{aa}$ that is
exploited when data from chromatography by molecular
exclusion are used to estimate the number of amino
acids contained within a protein of interest. This estimation
requires that the distribution coefficients, $K_{av}$, for
a series of uncomplicated standard proteins of known
number of amino acids be used to define the line for the
chromatographic system chosen for the particular experiment.
The estimate for the number of amino acids in the
protein of interest is interpolated from the known values
for the standards. A standard line for a particular
chromatographic column must be established by running
standards on that column, because the properties of
each commercial batch of chromatographic medium are
unique. It is also important to run the chromatographic
system with a buffer of ionic strength 0.1-0.2 M
to eliminate the effect of the dimensions of the ionic
double layer around the charged macromolecules on the
parameter $K_{av}$.
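
The procedure just described can be sketched as follows in Python. A line is fit by ordinary least squares to the square roots of the negative logarithms of the distribution coefficients of the standards as a function of the cube roots of their numbers of amino acids, and the line is then inverted for the protein of interest. The function names are hypothetical, and the distribution coefficients are synthetic values generated from an assumed line, not measurements; only the numbers of amino acids are taken from the standards listed in Figure 8-10.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def estimate_n_aa(k_av_unknown, standards):
    """standards: (n_aa, K_av) pairs measured on the same column."""
    xs = [n ** (1.0 / 3.0) for n, _ in standards]
    ys = [math.sqrt(-math.log(k)) for _, k in standards]
    slope, intercept = fit_line(xs, ys)
    x_unknown = (math.sqrt(-math.log(k_av_unknown)) - intercept) / slope
    return x_unknown ** 3

def synthetic_k_av(n_aa, slope=0.12, intercept=0.17):
    """Synthetic K_av from an assumed standard line (illustration only)."""
    return math.exp(-(slope * n_aa ** (1.0 / 3.0) + intercept) ** 2)

standards = [(n, synthetic_k_av(n)) for n in (104, 245, 583, 1324, 5040)]
estimate = estimate_n_aa(synthetic_k_av(385), standards)
print(round(estimate))
```

The estimate is only as good as the assumption that the unknown is as nearly spherical as the standards; a protein with a large frictional ratio would be assigned too many amino acids.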

As a macromolecule moves through a polymeric 
network during electrophoresis, it is also being sieved. In 
this case, it must travel through the network in a kinetic 
process, rather than equilibrating with the internal 
volume of a bead, but it appears that this distinction is 
inconsequential. It has been argued that one can view
the solid matrix of the polymerized gel as an array of 
screens through which the macromolecule must travel. A 
random cross section through a random three-dimen- 
sional network of lines will provide a distribution of 
points. The probability that none of these points lies 
within the randomly placed, random cross section of a 
geometric solid of any shape is still described by 
Equation 8-35. If those points represent one of the
screens in the gel, if the macromolecule can pass only 
through openings in that screen large enough so that no 
point forming that screen is found within the cross sec- 
tion of the macromolecule, and if the rate of its move- 
ment through the screen is proportional to the 
probability that openings of the proper size or larger will 
be encountered, then the mobility of a macromolecule 
through a gel during electrophoresis should be described 
by 

$u_i = u^{\circ}_i \exp(-b T_P S_{app,i})$ (8-42)


It has already been noted that the electrophoretic
mobilities of proteins (Figure 1-17), nucleic acids, and
complexes between dodecyl sulfate and polypeptides
(Figure 8-7) satisfy Equation 8-34, and therefore their
behavior is also consistent with Equation 8-42. It should
also be the case that


$K_{r,i} = b S_{app,i}$ (8-43)


The values for the retardation coefficients $K_r$ for a series
of standard proteins in their native conformations submitted
to electrophoresis on a series of polyacrylamide
gels cast from increasing concentrations of acrylamide
(Figure 1-17) are directly proportional to their accessible
surface areas calculated from crystallographic molecular
models of the same proteins (Figure 8-11A).
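
When Equations 8-42 and 8-43 are combined, the mobility falls exponentially with the concentration of polymer, with the retardation coefficient as the decay constant. The Python sketch below solves this relationship for the retardation coefficient and the free mobility from the mobilities measured on only two gels; in practice a series of gels and a fitted line would be used. The function names and all numerical values are hypothetical.

```python
import math

def retardation_from_two_gels(t1, u1, t2, u2):
    """Solve u = u0 * exp(-K_r * T_P) from mobilities on two gels.

    Returns (K_r, u0), where u0 is the mobility extrapolated to zero polymer.
    """
    k_r = math.log(u1 / u2) / (t2 - t1)
    u0 = u1 * math.exp(k_r * t1)
    return k_r, u0

def apparent_surface_area(k_r, b):
    """Equation 8-43 rearranged: S_app = K_r / b."""
    return k_r / b

# Synthetic mobilities generated with an assumed K_r of 0.1 and u0 of 2.0.
u_5 = 2.0 * math.exp(-0.1 * 5.0)
u_10 = 2.0 * math.exp(-0.1 * 10.0)
k_r, u0 = retardation_from_two_gels(5.0, u_5, 10.0, u_10)
print(k_r, u0)
```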

As already noted, however, the property of a mole- 
cule of protein that is usually of interest is not its acces- 
sible surface area but the number of amino acids it 
contains. If it is assumed that a series of proteins resem- 
bles a series of spheres and that Equations 8-38 and 8-39 
are still valid approximations, then 


$\left( \frac{K_{r,i}}{4 \pi b} \right)^{1/2} = R^{\circ}_i + r_P$ (8-44)


and a plot of $(K_r)^{1/2}$ against $(n_{aa})^{1/3}$ for this series should
yield a linear relationship. When the data of Hedrick
and Smith and Bryan for the retardation coefficients,
$K_r$, of a series of native proteins that had been sieved by
electrophoresis through gels cast from increasing concentrations
of acrylamide, $T_P$, are plotted against the
cube root of the number of amino acids they contain,
they do display linear behavior (Figure 8-11B).
Again, the intercept with the abscissa is at a negative
value as predicted by Equation 8-44, and this intercept
yields a value of $r_P$, the mean radius of the polyacrylamide,
of 0.9 nm. Unlike the behavior of globular proteins
on chromatography by molecular exclusion (Figure
8-10B), the proteins with higher frictional ratios (open
symbols in Figure 8-11B) do not deviate systematically
from linear behavior more significantly than those with
frictional ratios less than 1.20 (closed symbols in
Figure 8-11B).

It has been suggested by Hedrick and Smith that
this linear correlation permits the number of amino 
acids in a native protein of unknown size to be estimated 
from its behavior on electrophoresis. This is particularly 
useful in situations in which only the electrophoretic 
mobility of the protein of interest can be measured. 

Figure 8-11: Relationship between the retardation coefficients $K_r$ measured by electrophoresis and the surface areas or the numbers of
amino acids $n_{aa}$ for a set of proteins. A series of proteins were submitted to electrophoresis, each protein on a series of gels cast from
solutions of increasing concentrations of acrylamide. The slopes of the lines from plots of the logarithm of the relative mobility against the
percent of acrylamide were used to calculate $K_r$ for each protein. The proteins used, in order of increasing number of amino acids, were
ovalbumin from G. gallus ($n_{aa}$ = 385), porcine α-amylase ($n_{aa}$ = 496), bovine serum albumin ($n_{aa}$ = 583), human transferrin ($n_{aa}$ = 679),
ovotransferrin from G. gallus ($n_{aa}$ = 686), aspartate kinase-homoserine dehydrogenase from Zea mays ($n_{aa}$ = 828), hexokinase from S. cerevisiae
($n_{aa}$ = 970), the A isoform of L-lactate dehydrogenase from muscle of O. cuniculus ($n_{aa}$ = 1324), the A isoform of fructose-bisphosphate aldolase
from muscle of O. cuniculus ($n_{aa}$ = 1452), β-amylase from Ipomoea batatas ($n_{aa}$ = 1992), bovine catalase ($n_{aa}$ = 2024), the M1 isoform of
pyruvate kinase from O. cuniculus ($n_{aa}$ = 2120), bovine xanthine oxidase ($n_{aa}$ = 2662), equine apoferritin ($n_{aa}$ = 4272), ribulose-bisphosphate
carboxylase from Chlamydomonas reinhardtii ($n_{aa}$ = 4904), and urease from C. ensiformis ($n_{aa}$ = 5040). (A) The values of the retardation
coefficients $K_r$ are plotted as a function of the accessible surface areas (nanometers$^2$) of the proteins calculated as described in Figure 8-10A
with a spherical probe of 1.1 nm. Surface areas were calculated for ovalbumin, serum albumin, transferrin, ovotransferrin, L-lactate
dehydrogenase, aldolase, pyruvate kinase (1PKM, 1A49), catalase, xanthine oxidase (1FO4), apoferritin, and ribulose-bisphosphate carboxylase
(1AA1, 1RCO). (B) The square roots of the retardation coefficients for all of the proteins are plotted as a function of the cube roots of their
number of amino acids. In both panels, circles are for the data of Hedrick and Smith and squares are for the data of Bryan; in panel B,
solid symbols are for proteins with frictional ratios less than 1.20.


Two types of macromolecules that are of interest in
biochemistry are globular macromolecules, such as
many of the proteins in their native state, and extended
polymers, such as unfolded single-stranded DNA and
unfolded polypeptides. Extended polymers, the shapes
of which cannot be approximated as spheres, nevertheless
display regular behavior when they are submitted
to sieving. The apparent surface areas, $S_{app}$, of
extended, flexible polymers, such as unfolded polypeptides
or unfolded single-stranded DNA, should increase
linearly with their lengths, $n_{aa}$, because as each monomer
is added, it increases $S_{app}$ by the same increment, once
the polymer is beyond a certain length. In this case,


$S_{app,i} = c + d \, n_{aa,i}$ (8-45)

where c incorporates all of the properties of homologous 
short polymers only a few segments in length. 

When proteins are dissolved in solutions of guanidinium
chloride, they unfold and their individual
polypeptides become separated, random coils. A series
of these randomly coiled polypeptides, the lengths of
which are now precisely known, were submitted to chromatography
by molecular exclusion on beaded agarose,
and the values of $K_D$ (Equation 1-22) were reported.
Combining Equations 1-21, 1-22, 8-36, and 8-45


$-\ln K_{D,i} = \ln K_{av,R} + b T_P (c + d \, n_{aa,i})$ (8-46)


where $K_{av,R}$ is the distribution coefficient of the small reference
solute R used to determine the apparent internal
volume. This equation predicts that a plot of $-\ln K_{D,i}$
against $n_{aa,i}$ should be linear, and it is (Figure 8-12). It
has been proposed that the length of a polypeptide, the
sequence of which is unavailable, could be estimated
from its distribution coefficient by use of such a standard
curve.

The regular behavior of single-stranded nucleic
acids upon electrophoresis is crucial to the strategies for
determining their sequences. The relative electrophoretic
mobilities of the components in the ladder of
single-stranded RNA displayed in Figure 3-13 can be
measured from the photograph. Each relative mobility
can in turn be related to the relative mobility of one of the
components chosen as a standard, for example, the
mobility of the one containing 30 nucleotides. If
Equations 8-34, 8-43, and 8-45 are combined, and if it is
remembered that $u^{\circ}_i$ for all unfolded single-stranded
nucleic acids is the same, then


$-\ln \left( \frac{u_i}{u_{30}} \right) = b T_P d \, (n_{b,i} - 30)$ (8-47)


This predicts that a plot of $\ln (u_i/u_{30})$ against the length in
bases, $n_{b,i}$, should be linear, and it is (Figure 8-13) with the
exception of the compression at the discontinuity in the figure.
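
The treatment of such a ladder can be sketched in Python. The first function converts the distances migrated by the bands on one gel into negative logarithms of mobility relative to the band of 30 nucleotides; distances stand in for mobilities because every band ran for the same time in the same field. The second flags steps in the ladder at which this quantity fails to increase by anything close to the usual increment, which is the signature of a compression. The function names, the tolerance, and the distances are hypothetical; the synthetic ladder is constructed so that the bands of lengths 24 to 27 coincide and the series then resumes with the same slope but a displacement.

```python
import math

def neg_log_relative_mobility(distances, reference_length=30):
    """distances: {length in nucleotides: distance migrated on the same gel}."""
    d_ref = distances[reference_length]
    return {n: -math.log(d / d_ref) for n, d in distances.items()}

def find_compressions(values, tolerance=0.25):
    """Return adjacent lengths whose increment is below tolerance * median increment."""
    lengths = sorted(values)
    steps = [(a, b, (values[b] - values[a]) / (b - a))
             for a, b in zip(lengths, lengths[1:])]
    ordered = sorted(step for _, _, step in steps)
    median = ordered[len(ordered) // 2]
    return [(a, b) for a, b, step in steps if step < tolerance * median]

def effective_length(n):
    """Synthetic model of a compression: lengths 24 to 27 behave alike."""
    if n <= 24:
        return n
    if n <= 27:
        return 24
    return n - 3

# Synthetic distances (arbitrary units), not read from Figure 3-13.
distances = {n: 20.0 * math.exp(-0.02 * effective_length(n)) for n in range(20, 36)}
values = neg_log_relative_mobility(distances)
compressed = find_compressions(values)
print(compressed)
```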

Figure 8-12: Sieving of unfolded, randomly coiled polypeptides
on chromatography by molecular exclusion. Each of a series of
proteins was dissolved in 6 M guanidinium chloride and 0.1 M
2-mercaptoethanol and submitted to chromatography on a
column (1.5 cm x 90 cm) of beaded 6% agarose. The elution
volume of each polypeptide was used to calculate its distribution
coefficient $K_D$. The negative natural logarithm of $K_D$ ($-\ln K_D$) is plotted
as a function of the number of amino acids in the respective
sequence, $n_{aa}$. The polypeptides chosen were those composing
equine cytochrome c ($n_{aa}$ = 104), bovine hemoglobin ($n_{aa}$ = 145),
bovine β-lactoglobulin ($n_{aa}$ = 162), immunoglobulin G light chain
from O. cuniculus ($n_{aa}$ = 220), the mitochondrial isoform of malate
dehydrogenase from liver of Rattus norvegicus ($n_{aa}$ = 314), the
A isoform of fructose-bisphosphate aldolase from O. cuniculus
($n_{aa}$ = 363), ovalbumin from G. gallus ($n_{aa}$ = 385), immunoglobulin G
heavy chain from O. cuniculus ($n_{aa}$ = 450), α-amylase A from
Aspergillus oryzae ($n_{aa}$ = 478), bovine serum albumin ($n_{aa}$ = 583),
and human transferrin ($n_{aa}$ = 679).


It could be ascertained, because sequencing was
being performed, that the bands for the components
in the ladder representing single-stranded ribonucleic
acids of lengths 24-27 had all overlapped, producing this 
compression. It is usually assumed that a compression 
results from the ability of the 3’ end of the single- 
stranded nucleic acid to double back upon itself and 
form a double-stranded hairpin as soon as the length of 
the nucleic acid becomes greater than a critical value in 
the expanding series. As the series approaches the dis- 
continuity, it behaves regularly, because no hairpin is 
imminent. At the discontinuity and beyond it, the hair- 
pin is present in each component, but it is eventually 
found far enough in the interior for the series to resume 
its linear behavior with the same slope it had previously 
but with a displacement. The displacement indicates 
that the polymer is behaving as if it were smaller than it 
actually is, presumably because the surface area of the 
double-helical hairpin in its interior is smaller than the 
surface area of the same number of nucleotides in a 
single-stranded state. 
Figure 8-13: Sieving of single-stranded ribonucleic acid on
electrophoresis in a gel of polyacrylamide. The distance between the
origin and the final position of each band on the gel in Figure 3-13
was measured. These distances were each divided by the distance
for the band corresponding to the ribonucleic acid 30 bases in
length to obtain mobilities relative to this internal standard ($u_i/u_{30}$).
The negative natural logarithms of these mobilities are plotted as a
function of the lengths in bases, $n_b$, of each single-stranded
ribonucleotide.


Unfolded polypeptides and unfolded single- 
stranded nucleic acids are examples of well-defined 
extended polymers. Complexes between dodecyl sulfate 
and polypeptides, because they are not chemically 
defined covalent polymers, are not so well understood. 
Nevertheless, both the behavior of polypeptides dis- 
solved in solutions of guanidinium chloride (Figure 8-12) 
and the behavior of single-stranded nucleic acids (Figure 
8-13) when they are respectively submitted to sieving 
suggest that the extended, unfolded complexes that form 
between dodecyl sulfate and polypeptides, which resem- 
ble the former in their unfolded state and the latter in 
both their distribution of negative charge and unfolded 
state, should display electrophoretic mobilities corre- 
lated with the length of the polypeptides. In fact, it was 
noted by Shapiro, Viñuela, and Maizel that this is the
case. The electrophoretic mobility of the complex
between dodecyl sulfate and polypeptide i is generally
reported as a relative mobility:


$R_{f,i} = \frac{u_i}{u_{STD}}$ (8-48)


where $u_{STD}$ is the mobility of a standard, either a small
dye that can be readily followed visually or one of the
obvious boundaries on a discontinuous gel. The advantage
of the former point of reference is that because the
dye is dissolved in micelles of dodecyl sulfate, it marks
the mobility of a micelle and hence the free electrophoretic
mobility of the complexes between the proteins
and the dodecyl sulfate (Figure 8-7). The advantage
of the latter point of reference is that its absolute
electrophoretic mobility can be calculated.

Equations 8-34 and 8-48 can be combined, and


$-\ln R_{f,i} = \ln \left( \frac{u^{\circ}_{STD}}{u^{\circ}_i} \right) - K_{r,STD} T_P + K_{r,i} T_P$ (8-49)


This equation can be combined with Equations 8-43 and
8-45, and


$-\ln R_{f,i} = \ln \left( \frac{u^{\circ}_{STD}}{u^{\circ}_i} \right) - K_{r,STD} T_P + b T_P (c + d \, n_{aa,i})$ (8-50)


Because $u^{\circ}_i$ should be the same for all complexes
between well-behaved polypeptides and dodecyl sulfate
and $u^{\circ}_{STD}$, $K_{r,STD}$, and $T_P$ are all constant, this equation
predicts that a plot of $-\ln R_f$ against $n_{aa}$ should be
linear. When the natural logarithms of the relative mobilities
measured by Weber and Osborn are plotted as a
function of the now known lengths of these polypeptides,
$n_{aa}$, they conform to this expectation (Figure
8-14).*

At the present time, the method almost universally 
used to estimate the length of a polypeptide, the 
sequence of which is not yet known, is to determine the 
mobility of its complex with dodecyl sulfate upon elec- 
trophoresis on polyacrylamide gels. The mobility of the 
unknown is compared to the mobilities of complexes 
between dodecyl sulfate and standard polypeptides of 
known length, usually by plotting the data as in Figure 
8-14. It should be realized, however, that the widespread 
reliance on this method is based on the assumption that 
the polypeptide of interest binds the same amount of 
dodecyl sulfate (amino acid)$^{-1}$ as the standards used. A
comparison of Figure 8-13, which describes the behavior 
of a series of polymers in which the uniformity of the 
charge distribution is covalently dictated, with Figure 
8-14, which describes the behavior of a series of poly- 
mers in which the uniformity of charge distribution 
depends only on a fortuitous consistency in its composi- 
tion producing a fortuitous consistency in its ability to 
bind a small electrolyte, emphasizes the drawbacks of 
this assumption. 


* When complexes between dodecyl sulfate and polypeptides are
submitted to electrophoresis on polyacrylamide gels in which the
complexes are stacked by moving discontinuities, the relative
mobilities of the standards do not fall on a line when they are plotted
as in Figure 8-14. Nevertheless, their mobilities decrease
monotonically with their length, and the length of an unknown
polypeptide can be estimated by interpolation.
Figure 8-14: Sieving of complexes between dodecyl sulfate and
unfolded polypeptides on electrophoresis in gels of polyacrylamide
cast in solutions of sodium dodecyl sulfate. A series of proteins
(listed in Table I of ref 93) were submitted to electrophoresis as
described in Figure 8-8. A marker dye was included in each sample,
the mobility of which served as an internal standard. Its position
was marked on each gel at the end of each run and the gels were
stained. The distance migrated by each polypeptide was divided by
the distance migrated by the marker dye to obtain a relative mobility
$R_f$. The negative natural logarithms ($-\ln R_f$) of these values of
relative mobility have been plotted as a function of the number of
amino acids ($n_{aa}$) in the respective sequences of the polypeptides
(www.expasy.org/sprot/).


Because they bind dodecyl sulfate with different 
stoichiometries, proteins that have peculiar composi-
tions of amino acids, that are excessively hydropho-
bic, or that are significantly glycosylated do not have
mobilities on polyacrylamide gels when they are coated 
with dodecyl sulfate that reflect only their lengths, and 
this procedure is unreliable in these instances. 

A further problem that also should be realized is 
that beyond the ranges of relative mobilities displayed in 
Figure 8-14, the linear behavior of complexes between 
dodecyl sulfate and polypeptides often fails. This was 
originally pointed out by Weber and Osborn, and it
manifests itself in the tendency of complexes between 
dodecyl sulfate and very long polypeptides to travel 
faster than their lengths should permit. Because it is 
impossible to predict in what range of polymer lengths 
nonlinear behavior will become significant, a large col- 
lection of standard polypeptides the mobilities of which 
are close to and on either side of the mobility of the 
unknown should be chosen and the length of the 
unknown should be estimated by interpolation. 
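
The interpolation just described can be sketched in a few lines of Python. This is only an illustration, not a prescribed procedure: it assumes that -ln R_f varies approximately linearly with n_aa between two neighbouring standards (as in Figure 8-14), and the function name and the choice of a purely local, two-point interpolation are assumptions made here. The numbers are the standards and the unknown tabulated in Problem 8-11.

```python
import math

def estimate_length(standards, rel_mobility):
    """Estimate n_aa of an unknown from the two standards that bracket it.

    standards: list of (n_aa, relative mobility) pairs.
    Interpolation is linear in -ln(R_f), an assumption of this sketch.
    """
    # Order the standards from fastest (shortest) to slowest (longest).
    pts = sorted(standards, key=lambda s: -s[1])
    y = -math.log(rel_mobility)
    for (n_short, r_fast), (n_long, r_slow) in zip(pts, pts[1:]):
        if r_slow <= rel_mobility <= r_fast:
            y_short, y_long = -math.log(r_fast), -math.log(r_slow)
            if y_long == y_short:
                # Two standards with the same mobility: no slope to use.
                return (n_short + n_long) / 2
            frac = (y - y_short) / (y_long - y_short)
            return n_short + frac * (n_long - n_short)
    raise ValueError("unknown lies outside the range of the standards")

# Standards from Problem 8-11: (length in amino acids, relative mobility)
standards = [(1938, 0.10), (1023, 0.16), (583, 0.33), (506, 0.37),
             (501, 0.43), (466, 0.47), (363, 0.56)]

# Glycogen phosphorylase migrated with a relative mobility of 0.23.
print(round(estimate_length(standards, 0.23)))
```

Because only the two bracketing standards are used, the estimate does not depend on whether the standards far from the unknown still behave linearly, which is the reason for choosing standards on either side of the unknown.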

The failure of complexes between dodecyl sulfate
and long polypeptides to behave as if they were geo-
metric solids is probably due to a change in the mecha-
nism of sieving. It has been proposed that when the long
dimension of a severely elongated macromolecule is sig- 
nificantly greater than the mean spacing between the 
fibers of polymer in a sieve, it will be hindered from 
reorienting significantly about any axes normal to that 
long dimension by the network itself. If so, it will be 
forced to move through the network as a worm in a ran-
domly meandering burrow. At very low electric field
gradients, the mobility of such a wormlike molecule
should be proportional to n_aa⁻¹ rather than exp(-n_aa). At
field gradients in the range normally used for elec-
trophoresis, the burrow has a strong tendency to
become aligned with the field, causing the mobility of
the worm to become almost independent of n_aa. It is
possible that the deviation of the retardation coeffi- 
cients of complexes between dodecyl sulfate and longer 
polypeptides from linear behavior on gels of one poly-
acrylamide concentration results from a change in
the mechanism of sieving that occurs as the longer 
dimension of the polymer becomes so long that it is 
forced to travel through the network as a worm rather 
than as a randomly reorienting, flexible geometric solid. 
The point at which this change in mechanism would set 
in would be a function not only of the length of the poly- 
mer but also of the spacing of the fibers in the network. 
If this is the case, it would explain the common observa- 
tion that the behaviors of complexes between dodecyl 
sulfate and long polypeptides often become more linear 
when the concentration of polymer in the network is 
decreased.


Problem 8-10: A series of standards was run to calibrate
a cylindrical column of Sephadex G-75 for chromatogra-
phy by molecular exclusion so that it could be used to
estimate the number of amino acids in bovine α-lactal-
bumin. The void volume of the column, V_0, was 71 mL,
and the total volume of the bed, V_T, was 226 mL. The fol-
lowing elution volumes were observed:


protein              n_aa    elution volume
cytochrome c         104     138 mL
ribonuclease         124     136 mL
myoglobin            153     127 mL
chymotrypsinogen     245     113 mL
α-lactalbumin                131 mL


(A) Calculate K_av = (V_e - V_0)/(V_T - V_0) for every protein.


(B) Estimate the number of amino acids in α-lactal-
bumin.
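
One way to set up the arithmetic of this problem is sketched below in Python. The definition of K_av is the one given in part (A); the assumption that K_av varies linearly with ln n_aa between neighbouring standards, and the function names, are choices made for this sketch rather than anything stated in the problem.

```python
import math

V0 = 71.0    # void volume of the column, mL
VT = 226.0   # total volume of the bed, mL

def k_av(elution_volume):
    """K_av = (V_e - V_0) / (V_T - V_0)."""
    return (elution_volume - V0) / (VT - V0)

# (n_aa, elution volume in mL) for the standards of Problem 8-10
standards = {
    "cytochrome c": (104, 138.0),
    "ribonuclease": (124, 136.0),
    "myoglobin": (153, 127.0),
    "chymotrypsinogen": (245, 113.0),
}

def estimate_n_aa(elution_volume):
    """Interpolate ln(n_aa) linearly in K_av between bracketing standards.

    The linear relation between K_av and ln(n_aa) is an assumption.
    """
    k = k_av(elution_volume)
    pts = sorted((k_av(ve), math.log(n)) for n, ve in standards.values())
    for (k_a, ln_a), (k_b, ln_b) in zip(pts, pts[1:]):
        if k_a <= k <= k_b:
            frac = (k - k_a) / (k_b - k_a)
            return math.exp(ln_a + frac * (ln_b - ln_a))
    raise ValueError("elution volume outside the calibrated range")

for name, (n, ve) in standards.items():
    print(f"{name:18s} n_aa = {n:3d}  K_av = {k_av(ve):.3f}")
print(f"alpha-lactalbumin   K_av = {k_av(131.0):.3f}  "
      f"estimated n_aa = {estimate_n_aa(131.0):.0f}")
```

The estimate obtained in this way is only as good as the assumption that the unknown has the same shape and behaves on the column in the same way as the standards.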


Problem 8-11: Glycogen phosphorylase from 
Oryctolagus cuniculus was dissolved in a solution of 
sodium dodecyl sulfate sufficient to saturate the protein 
and submitted to electrophoresis on a polyacrylamide 
gel cast in a solution of sodium dodecyl sulfate. The rela- 
tive mobilities of the glycogen phosphorylase and several 
standard proteins were measured. 


protein                            length of polypeptide    mobility relative
                                   (amino acids)            to marker dye
myosin                             1938                     0.10
β-galactosidase                    1023                     0.16
serum albumin                      583                      0.33
catalase                           506                      0.37
glutamate dehydrogenase            501                      0.43
fumarate hydratase                 466                      0.47
fructose-bisphosphate aldolase     363                      0.56
glycogen phosphorylase                                      0.23


Estimate the length of the polypeptide composing glyco- 
gen phosphorylase. 


Problem 8-12: Estimate the length of the polypeptide 
that composes porcine pepsin A from the following rela- 
tive mobilities of complexes between the polypeptides 
and sodium dodecyl sulfate on electrophoresis on poly-
acrylamide gels cast in a solution of dodecyl sulfate.


polypeptide                                              n_aa    distance migrated
                                                                 (cm)
serum albumin                                            583     1.78
immunoglobulin G heavy chain                             450     2.92
D-amino acid oxidase                                     347     4.80
glyceraldehyde-3-phosphate dehydrogenase                 332     4.80
aspartate carbamoyltransferase, catalytic polypeptide    310     5.22
carboxypeptidase A                                       309     5.48
carbonate dehydratase I                                  260     6.12
pepsin A                                                         5.06


Cataloguing Polypeptides 


Rather than the intact oligomeric complexes of subunits 
formed from properly folded polypeptides that are sepa- 
rated during the electrophoresis of native proteins 
(Figure 1-19), the components separated when a mixture 
of different proteins is submitted to electrophoresis in 
the presence of dodecyl sulfate represent individual, 
unfolded, unassociated polypeptides. Consequently, 
such an electrophoretic separation is a catalogue of the 
polypeptides present in a sample rather than a cata-
logue of the proteins. A graphic example of such a cata-
logue can be seen in Figure 1-22.

When a purified protein the homogeneity of which 
has been verified by electrophoresis in its native state is 
submitted to electrophoresis in the presence of dodecyl
sulfate, the pattern observed is a dissection of the protein 
into the different polypeptides of which it is composed. 
Usually a protein is composed of only one polypeptide, 
and that one polypeptide is usually present in the native 
protein in several copies. There are, however, many pro- 
teins that contain two or more different polypeptides, 
and a comprehensive description of the quaternary 
structure of such a protein requires that each of its con- 
stituent polypeptides be recognized as a unique compo- 
nent of the overall complex. 

A catalogue of the polypeptides from which a pro- 
tein is formed is reliable only if the shortcomings of 
electrophoresis in the presence of dodecyl sulfate have 
been recognized and eliminated. First, it has already 
been noted that, on discontinuous electrophoresis, 
components of high mobility often fail to escape the 
descending boundary. Complexes between dodecyl sul- 
fate and short polypeptides are often trapped in this 
way and are unresolved. Second, it is also the case that, 
for reasons not well understood, all complexes between 
dodecyl sulfate and polypeptides less than 100 amino 
acids in length seem to have the same electrophoretic 
mobility, regardless of the concentration of polya-
crylamide. This lower limit below which resolution fails
can be lowered to about 25 amino acids in length by
adding 8 M urea to the polyacrylamide gel. Third,
because the cleavage of one peptide bond out of the 
hundreds present in an intact polypeptide always pro- 
duces two new polypeptides that will be separated from 
each other and from their parent by electrophoresis in 
the presence of dodecyl sulfate, any degradation of the 
native protein by endopeptidases during or before its 
purification can artifactually multiply the apparent 
number of polypeptides without significantly altering 
the native protein or its own electrophoretic mobility. 
Fourth, endopeptidases often unfold more slowly than 
other proteins upon exposure to dodecyl sulfate, and 
they cleave their unfolded neighbors before they in turn 
succumb. If the purified protein is contaminated with 
even minute amounts of the endopeptidases that are 
always present in a homogenate, they can degrade the 
polypeptides during the preparation of the sample. 
Because these cleavages of the unfolded polypeptides 
are produced at random, as opposed to the unique 
cleavages usually produced during the degradation of a 
native protein by endopeptidases, they cause polypep- 
tides in the sample to disappear into hundreds of frag- 
ments smeared over the field, each present in very low 
yield. Such an apparent disappearance of a polypeptide 
or polypeptides can also occur during the purification 
of the protein rather than during preparation of a 
sample for electrophoresis. For example, it was once 
thought that the stoichiometry of the subunits of nico- 
tinic acetylcholine receptor from the electric eel was 
simpler than that from the electric ray until it was 
demonstrated that the missing polypeptides appeared 
when indigenous endopeptidases, unavoidably present
during the purification, were intentionally inacti-
vated. Yet the acetylcholine receptor originally puri-
fied, even though it had been cut up at random, was
still biologically active. Finally, if only a fraction of the 
disulfides have been reduced, cross-linked, unreduced 
and un-cross-linked, reduced forms of the same 
polypeptide will appear as separate components. All of 
these and other artifacts must be recognized and elimi-
nated before the pattern observed upon elec-
trophoresis in the presence of dodecyl sulfate gives a
reliable assessment of the different polypeptides pres- 
ent in a protein. 

Each of the various components separated by elec- 
trophoresis in the presence of dodecyl sulfate may or 
may not represent a single polypeptide with a unique 
sequence of amino acids. The majority of the time they 
do, but it sometimes happens that one of the compo- 
nents represents two different polypeptides the lengths 
of which are so close to each other that they cannot be 
resolved. For example, the two subunits with unrelated 
amino acid sequences and unrelated functions compos- 
ing the multienzyme complex from Salmonella 
typhimurium responsible for anthranilate synthase, glu- 
tamine amidotransferase, and anthranilate phosphori- 
bosyltransferase happen to be 530 and 520 aa in length 
and are not resolved by electrophoresis in the presence 
of dodecyl sulfate. They are, however, cleanly resolved 
by electrophoresis in 8 M urea, a solution in which they 
are also unfolded but in which their differences in 
charge are not swamped by the binding of dodecyl sul-
fate. It is also possible that one or more of the compo-
nents on the gel represent fragments of a larger
component, also seen on the same gel. One way to 
resolve both of these ambiguities is to perform peptide 
maps. 

A peptide map is a characteristic and reproducible 
display of the peptides produced when a polypeptide is 
digested with a specific endopeptidase. The display is 
usually produced by chromatographically or elec- 
trophoretically separating the digest in two dimensions 
to produce a characteristic pattern or map. Usually, the 
two dimensions are the respective orthogonal directions 
on a sheet of chromatographic paper or a thin layer of 
cellulose on a backing of plastic. Originally, elec- 
trophoresis was performed in the first dimension and 
chromatography in the second. Peptide maps are sensi- 
tive methods for assessing the similarity of two polypep- 
tides, demonstrating that one polypeptide is a fragment 
of another, or revealing that one of the components on a 
polyacrylamide gel represents two polypeptides that for- 
tuitously have the same electrophoretic mobility. 

The most reliable maps are obtained from tryptic 
digests of a polypeptide because trypsin is the most 
specific and dependable of the endopeptidases. 
Polypeptides usually contain about 5 mol % arginine and
7 mol % lysine, and for every 100 aa in length, about 11
tryptic peptides should be present in the digest.* Each
component on the map should represent a different 
tryptic peptide. Ideally, the resolution of the map should 
be high enough that every peptide in the digest appears 
as a separate, distinguishable component. 

The initial triumph of analytical peptide mapping 
was in the examination of a mutant hemoglobin. It had
been proposed that the difference between normal
hemoglobin, referred to as hemoglobin A, and hemoglo-
bin S, a hemoglobin producing pathological distortions
in erythrocytes, was due to a small difference in the
amino acid sequence of the two proteins. It was then
shown that one and only one of the tryptic peptides on
the respective peptide maps of the two proteins dis-
played an altered mobility (Figure 8-15). It was con-
cluded that all of the peptides the mobilities of which
were the same between the two maps had identical 
sequences and the same relative locations in the two 
intact polypeptides, but that the one peptide the mobil- 
ity of which was different had a sequence that differed 
between the two proteins by at least one amino acid. 
Because it is unlikely that the only two or three changes 
in the sequence of a polypeptide would occur in the same 
tryptic peptide, this result alone was substantial evidence 
that the two proteins differed from each other at only one 
location in their respective sequences. This was soon 
shown to be true by complete amino acid sequencing. 

A similar strategy was used to evaluate the differ-
ences in the amino acid sequences of the different iso-
forms of actin. Each member of the set of the
isoforms of actin chosen for the experiment, each of 
which had been isolated from a different species or a dif- 
ferent tissue—a total of eight in all—was digested with 
trypsin, and each peptide map was compared to the pep- 
tide map of actin from skeletal muscle of O. cuniculus, 
the complete amino acid sequence of which was known. 
In all cases, the majority of the tryptic peptides were dis- 
tributed over the map in the same pattern as the corre- 
sponding peptides on the map from the standard, and 
this permitted the various maps to be aligned with that of 
the standard. The peptides occupying the same positions 
in a pair of maps were assumed to be identical to each 
other in amino acid sequence. Amino acid analysis was 
used to verify these identities. Each unique peptide on 
the maps of the various unknowns was eluted and 
sequenced. Each of these amino acid sequences could be 
aligned with one of the tryptic peptides in the sequence 
of actin from muscle of O. cuniculus, and in this way the 
amino acid replacements in the sequences of the other 
actins could be readily established. This set of experi- 
ments relied on the fact that, aside from the first six 
amino acids in each sequence, all of the actins that were 
being compared show about 95% identity when their
amino acid sequences are aligned. This level of identity
was what produced the underlying pattern that permit-
ted the maps to be aligned and, in turn, permitted the
ready identification of the peculiar peptides.


* About 5% of the lysines and arginines in a protein are followed by
proline, and trypsin is unable to cleave either a lysylproline or an
arginylproline peptide bond.


Figure 8-15: Comparison of tryptic peptide maps
of hemoglobin A (left panel) and hemoglobin S
(right panel). The respective hemoglobins were
denatured at 90 °C for 4 min and the denatured pro-
teins were digested with trypsin (1:50 trypsin/hemo-
globin). The resulting digests were spotted on sheets
of chromatographic paper, and the peptides were
separated in the horizontal dimension by elec-
trophoresis at pH 6.4 (negative pole to the left) and
in the vertical direction by ascending chromatogra-
phy in 1-butanol/acetic acid/water (3:1:1). The pep-
tides were visualized with ninhydrin. A peptide in
the middle of the right side of the left panel is
replaced by a peptide in the center of the right panel.
Reprinted with permission from ref 126. Copyright
1958 Elsevier Science Publishers.

When the polypeptide is a long one, there may be so 
many components on a tryptic peptide map that they 
begin to overlap. One way to solve this problem is to 
modify the tyrosine side chains in the protein with 
radioactive iodine by electrophilic aromatic substitution. 
Because there are only 3-4 tyrosines for every 100 amino 
acids in a typical protein, only about a third of the tryp-
tic peptides become radioactive, and an autoradiogram*
of the map is less cluttered than the entire map itself, but
just as unique to the particular polypeptide. Such a map
of tyrosine-containing chymotryptic peptides was used
to show that the α polypeptides of Na⁺/K⁺-exchanging
ATPase from liver and kidney, respectively, both
polypeptides now known to be 1018 amino acids in
length with 24 mol of tyrosine (mol of polypeptide)⁻¹,
were very similar if not identical to each other (Figure
8-16). Another way of generating a peptide map from
a long polypeptide is to digest the complex between it
and dodecyl sulfate in a solution of dodecyl sulfate with
an endopeptidase. Under these conditions, the diges-
tion is severely incomplete because the endopeptidase is
rapidly inactivated by the dodecyl sulfate. Nevertheless, 
a reproducible set of large fragments of the polypeptide, 
characteristic of both it and the specificity of the 
endopeptidase used, is produced, and when these large 
fragments are separated by electrophoresis in the pres- 
ence of dodecyl sulfate, the pattern of bands on the gel is 
a fingerprint unique to that polypeptide. 


* An autoradiogram is a photographic image of the map on which 
only radioactive components are registered. It displays the distri-
bution of radioactivity over the field.


Peptide mapping is sometimes performed by
adsorption chromatography (Figure 3-7) because these
separations are more rapidly accomplished, but this
procedure is not so informative because it involves a sep-
aration in only one dimension. It is also possible to sep-
arate tryptic peptides from the digest of a polypeptide in
one dimension by mass spectrometry following matrix-
assisted laser desorption. The peptide map that
results has the disadvantage that often only a portion of
the tryptic peptides is present rather than a complete set
so it functions mainly as a fingerprint of the particular
polypeptide. Such a map, however, has the advantages
that the resolution is high so one dimension is sufficient,
that the molecular mass of each peptide appearing on it
is registered, and that the peptides can subsequently be
sequenced (Figure 3-8). If the amino acid sequences of
two closely related proteins are known, mass spectro-
metric peptide maps can often tell a sample of one from
a sample of the other. Another advantage of a mass
spectroscopic peptide map is that it can be performed on
a small amount of protein (1 pmol).


Figure 8-16: Peptide maps of tyrosine-containing chymotryptic
peptides from the α polypeptide of Na⁺/K⁺-exchanging ATPase.
After Na⁺/K⁺-exchanging ATPase was purified from rat liver or rat
kidney by immunoadsorption, its α polypeptide was isolated by
electrophoresis on polyacrylamide gels in solutions of dodecyl sul-
fate. Each of the respective purified polypeptides was then chemi-
cally modified at its tyrosines by electrophilic aromatic
substitution with radioactive iodine. The radioactive polypeptides were then
digested separately with chymotrypsin, and the digests were sepa-
rated in two dimensions on thin layers of cellulose. Electrophoresis
was performed from right to left followed by ascending chro-
matography with butanol/pyridine/water/acetic acid (65:50:40:10
v/v/v/v) from bottom to top. Peptides containing radioactive o-iodotyro-
sine were identified by placing photographic film over the chro-
matogram. The images are those of the developed films. (A) Map
from α polypeptide of kidney; (B) map from α polypeptide of liver.
Adapted with permission from ref 129. Copyright 1986 American
Chemical Society.

If two polypeptides yield peptide maps similar 
enough that they are judged to be related, the percentage 
of identity between their two sequences will be high; if 
they yield peptide maps that cannot be regarded as sim- 
ilar, they may still have clearly homologous sequences. In 
two-dimensional mapping of either all the peptides in a 
digest or just the tyrosine-containing peptides, or in pep- 
tide mapping by adsorption chromatography or mass 
spectrometry, the evaluation of the similarity of two 
polypeptides is based on comparison of the two patterns 
in which the peptides are displayed (Figure 8-15). Only if 
a significant fraction of the peptides on the two maps 
have the same relative positions on the field and produce 
a pattern that can be recognized is the judgment made 
that the two polypeptides are similar to each other. This 
implicit criterion of similarity requires that a significant 
fraction of the respective peptides be identical to each 
other in sequence for the decision to be made that the two 
polypeptides are related. One difference in the sequence 
of two otherwise identical peptides is usually sufficient to 
cause them to have different mobilities (Figure 8-15). 
Because the mean tryptic peptide is eight amino acids in 
length, differences between the sequences of two 
polypeptides at more than 20% of the positions will cause 
the two maps to be completely different, even though the 
two polypeptides are similar enough to be unambigu- 
ously judged homologous in amino acid sequence. Each 
of the four polypeptides of nicotinic acetylcholine recep- 
tor, although they are all homologous in sequence to each 
other (averaging 40% identity), yields a completely dif-
ferent peptide map. The order in which the three ways
of detecting homologies among polypeptides fail as the 
percentage of identity becomes smaller is peptide map- 
ping before alignment of amino acid sequences and 
alignment of amino acid sequences before superposition 
of tertiary structures. 
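
The sensitivity of peptide maps to differences in sequence can be made concrete with a rough calculation. The Python sketch below assumes, purely for illustration, that replacements are distributed independently and uniformly along the sequence and that every tryptic peptide has the mean length of eight amino acids; neither assumption is made in the text, and real replacements are often clustered. Under these assumptions the fraction of peptides expected to be unchanged, and hence to occupy the same position on both maps, is the identity raised to the eighth power.

```python
def fraction_unchanged(identity, peptide_length=8):
    """Probability that a peptide of the given length has no replacements,
    assuming independent, uniformly distributed replacements."""
    return identity ** peptide_length

# 95% identity (the actins), 80% (the threshold cited above),
# and 40% (the polypeptides of nicotinic acetylcholine receptor).
for identity in (0.95, 0.80, 0.40):
    print(f"{identity:.0%} identity: "
          f"{fraction_unchanged(identity):.2%} of peptides unchanged")
```

At 95% identity about two-thirds of the peptides are expected to coincide, which is enough to align two maps; at 80% identity only about one in six coincides; and at 40% identity essentially none do.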

Whenever two or more polypeptides appear upon 
electrophoresis of a purified protein in the presence of 
dodecyl sulfate, the possibility that the smaller polypep- 
tide or polypeptides are fragments of the largest should 
be examined. Such a relationship can be established by 
peptide mapping. The protein ankyrin from human ery- 
throcytes is present in the cell under physiological con- 
ditions as the complete polypeptide and three 
progressively smaller fragments of that polypeptide. That 
these four polypeptides represent such a nested set 
derived by digestion of the largest by cellular endopepti-
dases could be demonstrated by producing peptide
maps of each. This was done by separating these
polypeptides on polyacrylamide gels in solutions of 
sodium dodecyl sulfate, iodinating their tyrosines, 
digesting them with trypsin, and producing tryptic pep- 
tide maps. These peptide maps all displayed the same 
pattern, but the maps from the smaller polypeptides 
lacked one or two of the peptides present in those from 
the next larger one. When collagen type XIV is isolated 
from epidermis of Macaca fascicularis, the purified pro- 
tein contains two polypeptides that can be separated by 
electrophoresis in the presence of dodecyl sulfate. The 
complexes between each of these two polypeptides and 
dodecyl sulfate were digested separately with glutamyl 
endopeptidase. In the range of lengths less than that of 
the shorter of the two polypeptides, the two patterns of 
fragments that resulted from these partial digestions 
were identical to each other, a result demonstrating that 
the shorter of the two polypeptides was itself a fragment 
of the longer.

Peptide mapping can be used to determine whether 
an apparently unique component resolved by elec- 
trophoresis in the presence of dodecyl sulfate represents 
only one polypeptide or two or more polypeptides that 
fortuitously have the same electrophoretic mobility. The 
electrophoretic mobility of the complex between dodecyl
sulfate and the polypeptide in question can be used to
estimate its length. The mole percent of lysine and argi-
nine in the protein can be either determined directly by
total amino acid analysis or estimated from the fact that
this number is usually about 11 mol %. The mole per-
cent of lysine and arginine and the length of the polypep-
tide can be used to estimate the number of tryptic
peptides that should be produced if the component 
observed on the gel does represent only one unique 
polypeptide. If the number of peptides observed on the 
map agrees with this expectation, the component prob- 
ably represents only one unique polypeptide. If there are 
about twice as many spots as expected, it must represent 
two different polypeptides. If the component does repre- 
sent two or more polypeptides, it should be possible to 
separate them chromatographically or electrophoreti- 
cally. Such a separation can usually be performed in 8 M 
urea, a solvent that unfolds polypeptides and separates 
them one from the other but that does not interfere with 
either chromatography by ion exchange or electrophore- 
sis. The two or more separated polypeptides should each 
give unique peptide maps, the sum of which should be 
the peptide map of the original mixture. 

When phosphoglycerate dehydrogenase was satu- 
rated with dodecyl sulfate and submitted to elec- 
trophoresis, one component was observed, the 
electrophoretic mobility of which was that of a polypep- 
tide 360 aa in length. The content of lysine plus arginine 
in the protein was determined by amino acid analysis to 
be 9.6 mol %. A tryptic digest of the protein was sepa- 
rated by cation-exchange chromatography, and each of 
the pools from this first dimension was submitted to
electrophoresis on paper. This two-dimensional peptide
map displayed 39-40 major peptides. The content of 
tryptophan in the protein was 1.0 mol %, and four of the 
peptides gave a positive test for tryptophan. If all of the 
polypeptides in this protein are identical, there should 
have been 36 tryptic peptides, four of which should have 
contained tryptophan. The agreement between the 
observed numbers and the expected numbers led to the 
conclusion that phosphoglycerate dehydrogenase was 
composed of identical polypeptides.
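
The expected numbers quoted for phosphoglycerate dehydrogenase can be reproduced with a short calculation, sketched here in Python. The function names are hypothetical, and the sketch assumes complete digestion, one additional peptide for each lysine or arginine cleaved, and no correction for lysylproline or arginylproline bonds; these simplifications are assumptions of the sketch.

```python
def expected_residues(n_aa, mol_fraction):
    """Expected number of residues of one kind in a polypeptide."""
    return n_aa * mol_fraction

def expected_tryptic_peptides(n_aa, lys_arg_fraction):
    """Cleavage after every lysine and arginine gives one peptide per
    cleavage plus the final, carboxy-terminal peptide."""
    return expected_residues(n_aa, lys_arg_fraction) + 1

n_aa = 360           # apparent length from electrophoresis
lys_arg = 0.096      # lysine plus arginine, 9.6 mol %
tryptophan = 0.010   # tryptophan, 1.0 mol %

print(round(expected_tryptic_peptides(n_aa, lys_arg)))  # compare with 39-40 observed
print(round(expected_residues(n_aa, tryptophan)))       # compare with 4 observed
```

Had the single component contained two different polypeptides of this length, roughly twice as many peptides would have been expected on the map.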

Glutamate-tRNA ligase was submitted to elec- 
trophoresis in the presence of dodecyl sulfate, and a 
single component was observed, the mobility of which 
was that of a polypeptide 500 aa in length. The protein 
had a content of lysine plus arginine of 12 mol %; trypto- 
phan, 1.0 mol %; arginine, 6.3 mol %; and cysteine, 
1.0 mol %. It could be concluded that the component
observed upon electrophoresis represented only one
polypeptide because the tryptic peptide map of the pro-
tein displayed 55 peptides, 30 of which gave a positive
test for arginine, five of which gave a positive test for
tryptophan, and five of which became radioactive after
the protein was reduced and carboxymethylated with
[¹⁴C]iodoacetic acid (Equation 3-17).

Upon electrophoresis in dodecyl sulfate, the molyb- 
denum-iron protein that is one of the components of 
nitrogenase gave two bands of stained material of very 
similar and often indistinguishable electrophoretic 
mobility, the apparent lengths of which were 540 amino 
acids. When the protein was reduced, carboxymethylated 
with [¹⁴C]iodoacetic acid, and submitted to amino acid
analysis, its content of ([¹⁴C]carboxymethyl)cysteine was
1.7 mol %. Eleven of the tryptic peptides on a peptide map 
of the reduced and carboxymethylated protein were 
radioactive when nine were expected. Instead of passing 
this off as the result of incomplete digestion or inaccurate 
values for content of cysteine, the investigators pro- 
ceeded to show that when the protein was dissolved in 
urea, to unfold its polypeptides, two polypeptides could 
be isolated by cation-exchange chromatography on (car- 
boxymethyl)cellulose. Both were submitted to reduction, 
carboxymethylation, and peptide mapping. Four of the 
radioactive, cysteine-containing peptides from the map 
of the total protein were found on the map of one of the 
polypeptides, and the other seven radioactive peptides 
from the map of the total protein were found on the map 
of the other polypeptide, and no overlaps occurred 
between the two maps. It could be concluded that the 
molybdenum-iron protein had the subunit stoichiometry 
αβ.

A similar situation arose with methylmalonyl-CoA 
carboxytransferase. When this protein was submitted to 
electrophoresis in the presence of dodecyl sulfate, a 
component was present the apparent length of which 
was 550 amino acids, but under some conditions it would 
split into two bands of equal intensity. It was found that 
the native enzyme could be dissociated at pH 9.0 into two 
proteins that could be separated from each other by
molecular exclusion chromatography. Each of these pro-
teins was composed of polypeptides the apparent
lengths of which were 550 amino acids. Although their
complexes with dodecyl sulfate were almost indistin-
guishable in electrophoretic mobility, the polypeptides
in these separated proteins produced completely differ-
ent tryptic peptide maps. It was later shown that they
were polypeptides of 519 and 604 aa in length, respec-
tively, with unrelated amino acid sequences.

The use of tryptic peptide mapping to provide evi- 
dence for the homogeneity of the polypeptides in a pro- 
tein relies on the assumption that the trypsin has 
digested the polypeptide completely. This should be 
independently demonstrated. For example, initial tryptic 
digests of the polypeptides composing glucose-6-phos- 
phate isomerase produced only two-thirds to three- 
fourths as many peptides as had been expected from the 
assumption that they were all identical. It was found that 
less base was consumed during the digestion than 
should have been, and this suggested that the digestion 
had been incomplete. When the protein was carbamy- 
lated on all of its lysines and then digested with trypsin, 
the quantity of base consumed during the digestion and 
the number of peptides observed on the map were those 
expected theoretically. A more sensitive measure of
complete tryptic digestion is to compare the total con- 
tent of lysine and arginine in the digest to the amount of 
lysine and arginine released from the peptides in the 
digest when a sample is in turn digested with an appro- 
priate carboxypeptidase. 

Electron transfer flavoprotein is another protein 
composed of two polypeptides the lengths of which are 
very similar. It was originally believed to be a dimer of 
two identical polypeptides. Under certain circum-
stances, however, two narrowly separated components
would appear upon electrophoresis in the presence of 
dodecyl sulfate, and these were different enough to be 
separated by preparative electrophoresis. Each of the 
separated polypeptides was cleaved with cyanogen bro- 
mide, and the fragments produced were in turn satu- 
rated with dodecyl sulfate and separated in one 
dimension by electrophoresis in a solution of 0.1% do-
decyl sulfate in 8 M urea (Figure 8-17). The maps
produced in this way from each separated polypeptide 
were unique, and their sum was equal to the map of the 
intact protein. These results established the fact that the 
quaternary structure of the electron transfer flavoprotein 
is αβ and explained the observation that there is only
1 mol of flavin (1.8 mol of polypeptide)⁻¹ in the protein.

Figure 8-17: Peptide maps in one dimension performed by
electrophoresis on a polyacrylamide gel of fragments derived from
cleavage of electron-transferring flavoprotein from porcine liver
with cyanogen bromide.¹⁴⁵ All proteins were reduced and alkylated
with iodoacetamide before cleavage. Intact electron transfer
flavoprotein (upper trace), its α polypeptide (middle trace), or its
β polypeptide (lower trace), the latter two polypeptides isolated by
preparative gel electrophoresis, were digested with cyanogen
bromide (25 mM) in 88% formic acid and 0.03% sodium dodecyl
sulfate for 24 h at 25 °C. The fragments produced were separated on
gels cast from 13.4% acrylamide in 0.1% sodium dodecyl sulfate
and 8 M urea and stained for protein. The gels were then
scanned for absorbance as a function of length, and length was
converted to mobility relative to the mobility of a marker dye.
Reprinted with permission from ref 145. Copyright 1983 Journal of
Biological Chemistry.


Many proteins in addition to the molybdenum-iron
protein of nitrogenase, methylmalonyl-CoA carboxytransferase,
and electron transfer flavoprotein are composed
of two or more different polypeptides. Often
there is a functional basis for this arrangement. Many
multienzyme complexes, rather than being composed of
a string of enzymatic domains each formed from a different
region of the same polypeptide, are constructed
from individual subunits gathered together in a larger
complex. Functionally there is no distinction between
these two types of multienzyme complexes because the
important feature is that the different enzymes are gathered
together, whether they are gathered as domains on
the same subunit or as different subunits. It has already
been noted that methylmalonyl-CoA carboxytransferase
is formed from different subunits. One of its three
constituent polypeptides is folded to produce a protein that
in isolation can catalyze the reaction


methylmalonyl-SCoA + biotin ⇌ propionyl-SCoA + carboxybiotin
(8-51)


and another, a protein that can catalyze¹⁴⁶ the reaction


carboxybiotin + pyruvate ⇌ oxaloacetate + biotin
(8-52)


The entire enzyme is composed of these two subunits 
and a third subunit formed from a single polypeptide 
bearing covalently attached biotin as a posttranslational 
modification of one of its lysines. The intact multien- 
zyme complex catalyzes the overall reaction 


methylmalonyl-SCoA + pyruvate ⇌ propionyl-SCoA + oxaloacetate
(8-53)


In the case of the molybdenum-iron protein of 
nitrogenase, each of the two polypeptides of almost the 
same length (491 and 522 aa) forms one of its subunits. 
One of the two subunits contains the molybdenum and 
some of the iron, while the other subunit contains 
iron-sulfur clusters of a ferredoxin type. This assigns dif- 
ferent functional roles to each of the subunits and 
explains the stoichiometries of the molybdenum and 
iron found in the intact protein. 

Some of the proteins, however, thought to be com- 
posed of two different types of subunits, because they are 
isolated as complexes containing two different polypep- 
tides, are actually the products of a posttranslational 
cleavage, either intentional or artifactual, of what was 
initially a single polypeptide. The internal modification 
of histidine decarboxylase from Lactobacillus producing 
the N-pyruvyl amino terminus (Equation 3-9) coinciden- 
tally produces two shorter polypeptides of lengths 81 and 
229 amino acids from the originally intact precursor.
Before the modification occurs, the protein is
constructed from six identical copies of the intact 
polypeptide, each folded as an independent subunit. 
Many endopeptidases are normally stored safely as inac- 
tive precursors that are activated at the proper time by an 
internal cleavage or cleavages catalyzed by another mol- 
ecule of endopeptidase. These cleavages occur in surface 
loops and have little effect on the overall structure of the 
protein, and the two resulting fragments are not separate 
subunits or even separate domains. Another example
of a posttranslational cleavage that produces two frag- 
ments of a constituent polypeptide occurs during the 
maturation of the insulin receptor.


It was once believed that vertebrate acetyl-CoA 
carboxylase was constructed from two or three different 
polypeptides, each present in one or two copies. This
belief was reinforced by the fact that the enzyme from 
E. coli is a multienzyme complex constructed from three 
different subunits, albeit folded polypeptides of
lengths much shorter than the polypeptides found in 
the enzyme from vertebrates. When the vertebrate 
enzyme was purified by a rapid procedure employing 
affinity adsorption, however, it was found to be composed
of only one polypeptide, the length of which
(2345 amino acids) was more than twice the individual
lengths of the separate polypeptides seen previously.
Only two folded copies of this longer polypeptide are 
present as subunits of the native protein. It was con- 
cluded that the smaller polypeptides seen previously 
were the products of artifactual digestion by endopepti- 
dases. 

The distinction between a protein that contains 
two different polypeptides because of a posttransla- 
tional cleavage and one that has been assembled from 
two separately translated polypeptides is significant. 
The posttranslational cleavage of a protein usually 
occurs after the entire polypeptide has folded because it 
is only in the folded polypeptide that the cleavage can 
be directed to a precise location. Although significant 
changes may occur in the vicinity of the cleavage, the 
overall structure of the protein remains what it was 
before the cleavage. For all intents and purposes, a 
polypeptide cleaved either naturally or artifactually after 
it has folded remains structurally a folded single 
polypeptide. On the other hand, when a protein is 
assembled from separately translated polypeptides, they 
first must fold independently before they can recognize 
each other and join together. Each subunit begins as 
and remains as a discrete entity and appears as such 
when a crystallographic molecular model of the protein 
is viewed. Yet when the polypeptides are resolved on a 
polyacrylamide gel, these two very different situations 
cannot be distinguished, and as with the confusion sur- 
rounding the connotations of the term domain, the urge 
arises to confer the structural integrities of separate sub- 
units to a protein that may contain two or more 
polypeptides only by virtue of posttranslational modifi- 
cation. 

It is possible that the number of proteins assem- 
bled from two or more different subunits containing 
polypeptides with different sequences and different 
lengths has been overestimated. Even if this is not so, 
such heterooligomers represent only a minority of the 
oligomeric proteins. Most oligomeric proteins are con- 
structed from only one polypeptide that is present in 
two or more identical copies in the complete protein. 
The length of that polypeptide is estimated by elec- 
trophoresis in the presence of dodecyl sulfate. The 
number of copies present in the intact protein is 
counted. 
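
The estimate of length from electrophoresis in dodecyl sulfate is
usually an interpolation on a calibration built from standards of
known length. The sketch below assumes that the logarithm of the
number of amino acids is a linear function of relative mobility over
the range of the standards. The function names are hypothetical, and
the standards are invented for illustration; they are not data from
the text.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares line through (xs, ys): slope, intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_length(standards, mobility):
    """Interpolate the number of amino acids at a relative mobility.

    standards: list of (number_of_amino_acids, relative_mobility).
    """
    mobilities = [m for _, m in standards]
    log_lengths = [math.log10(n) for n, _ in standards]
    slope, intercept = fit_line(mobilities, log_lengths)
    return 10 ** (slope * mobility + intercept)


# Invented standards lying exactly on a line in log10(length).
standards = [(100, 0.90), (200, 0.60), (400, 0.30)]
print(round(estimate_length(standards, 0.45)))
```

Dividing the length of the native protein, estimated by an independent
method, by the length obtained in this way gives the count of copies
referred to above, provided the unknown falls within the range spanned
by the standards.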
Problem 8-13: Glucose-6-phosphate isomerase catalyzes
the conversion


D-glucose 6-phosphate ⇌ D-fructose 6-phosphate
(8-51)


The enzyme has been purified from muscle of O. cuniculus.
When it was dissolved in a solution of sodium dodecyl
sulfate and submitted to electrophoresis, the
following results were obtained:


protein                           n_aa    R_f

ovalbumin                         385     0.42
catalase                          506     0.32
serum albumin                     583     0.26
glucose-6-phosphate isomerase             0.30


The enzyme was unfolded in 6 M guanidinium chloride 
and modified with potassium cyanate, the reagents were 
removed by dialysis, and the protein was digested with 
trypsin. The following two maps were made of this tryp- 
tic digest. The only difference in these two maps is the pH 
at which the electrophoresis was performed. 

The amino acid composition of the protein has 
been determined: 


amino    mol (100,000 g     amino    mol (100,000 g
acid     of protein)⁻¹      acid     of protein)⁻¹

K        62                 I        53
H        36                 L        87
R        32                 Y        19
D+N      88                 F        46
T        62                 G        67
S        58                 A        64
E+Q      96                 V        52
P        38                 M        22


(A) What is the length of the polypeptides composing 
glucose-6-phosphate isomerase? 


(B) How many different polypeptides does the pro- 
tein contain? 


(C) Which amino acid side chain are you certain the 
cyanate modified? Why? 


(D) What conclusions did you draw from the peptide 
maps? 


Problem 8-14: The molar mass of 2-dehydro-3-deoxy- 
phosphogluconate aldolase (DDPG aldolase) from 
Pseudomonas putida has been estimated as 73,000 g
mol⁻¹ by sedimentation equilibrium. The enzyme was
dissolved in a solution of sodium dodecyl sulfate and 
submitted to electrophoresis. Its mobility on the gel as 
well as the mobilities of several standards are tabu- 
lated: 


protein                  n_aa    R_f

carbonate dehydratase    260     0.62
chymotrypsinogen         245     0.74
trypsin                  223     0.76
myoglobin                153     1.00
DDPG aldolase                    0.75


The protein was reduced, carboxymethylated, and 
digested with trypsin. There were 24 spots observed on a 
two-dimensional map of this tryptic digest; three were 
positive for tyrosine, as detected by Pauli stain, and 13 
were positive for arginine. 

The amino acid composition of the protein was deter- 
mined: 




amino    mol (100     amino    mol (100     amino    mol (100
acid     mol)⁻¹       acid     mol)⁻¹       acid     mol)⁻¹

Cᵃ       1.7          G        9.2          Y        1.3
D+N      7.4          A        13.6         F        3.2
Tᵇ       4.5          V        6.5          K        3.1
Sᵇ       3.6          M        3.0          H        0.5
E+Q      8.9          I        8.2          R        6.6
P        6.9          L        9.5          Wᶜ       1.6


ᵃAs (carboxymethyl)cysteine at 24 h. ᵇExtrapolated to zero hour.
ᶜDetermined spectroscopically.


Tryptic peptide maps of carbamylated glucose-6-phosphate isomerase from muscle of O. cuniculus (Problem 8-13). Electrophoresis: right map,
pyridine/acetic acid/water (520:1.4:1000 by volume), pH 6.2; left map, pyridine/acetic acid/water (7:66:1927 by volume), pH 3.5.
Chromatography: descending chromatography with butanol/acetic acid/pyridine/water (15:10:3:12 by volume). Maps were performed on
sheets (46 cm × 57 cm) of chromatographic paper, and peptides were located by ninhydrin.


The protein was dissolved in 8 M urea, 2-mercaptoethanol
was added, and the mixture was incubated for
4 h. The reduced protein was then alkylated with
[¹⁴C]iodoacetate, dialyzed, and digested with trypsin. The
tryptic digest was run on an ion-exchange column, and
four peaks of radioactivity were observed. Each radioactive
peak was further purified to homogeneity. The four
were shown by the composition of their amino acids to
be unique and each contained one (carboxymethyl)cysteine.
Their compositions were C,D;S1EzAglL;,
C,D,E,A)1.K1, C,D2T,GoA;VoF;R), and 
C,D,T,E,G,A;V,L,R). 


(A) What are the lengths of the polypeptides compos- 
ing this enzyme? 


(B) How many polypeptides are there in the protein? 


(C) How many different types of polypeptides are 
there in the protein? 


(D) What conclusions can you draw from the tryptic 
peptide map? 


(E) What conclusions can you draw from the tryptic
peptides containing ([¹⁴C]carboxymethyl)cysteine?


Problem 8-15: A protein has been purified to homogeneity.
It has the following properties.

When the protein is reduced with 2-mercap- 
toethanol and run on a sodium dodecyl sulfate gel in the 
presence of standards, the following results are obtained: 


protein            n_aa    mobility

β-lactoglobulin    162     0.70
myoglobin          153     0.73
lysozyme           129     0.81
ribonuclease       124     0.82
cytochrome c       104     0.87
protein X                  0.77


The amino acid composition of the protein is as follows: 


amino acid    mol (100 mol)⁻¹     amino acid    mol (100 mol)⁻¹

G             6.9                 Y             2.0
A             12.5                W             0.9
S             5.4                 C             1.0
T             5.4                 M             1.0
P             5.0                 D             8.6
V             10.7                E             5.9
I             0.0                 R             2.9
L             12.6                H             6.5
F             5.2                 K             7.6


The protein was digested with trypsin, and a pep- 
tide map was prepared. It contained 26 well-defined pep- 
tides. 

The peptide map was stained for various amino 
acid side chains: five peptides were positive for arginine, 
13 peptides were positive for histidine, four peptides 
were positive for methionine, three peptides were posi- 
tive for tryptophan, and seven peptides were positive for 
tyrosine. 

The protein was carboxymethylated with
[¹⁴C]iodoacetic acid and digested with trypsin. Three
tryptic peptides containing radioactive
(carboxymethyl)cysteine were isolated by ion-exchange
chromatography, and each of these was shown to have a
unique composition of amino acids and to contain 1 mol
of Cys (mol of peptide)⁻¹.


(A) What is the length of a polypeptide composing 
this protein? 


(B) How many different types of polypeptides com- 
pose the protein? 


(C) Explain the peptide maps. 


Cross-Linking 


There arose a disagreement over the number of subunits 
contained in fructose-bisphosphate aldolase. The physi- 
cal methods for estimating the molar mass of the native 
protein and the molar mass of its constituent polypep- 
tides were unable to decide between three and four. It 
should be noted that everyone had an equal chance of 
being correct, so the point is not who turned out to be 
right but that the question could not be resolved simply 
by arguing over the numbers. What was needed instead 
was a different kind of experiment, and it was provided. 
When the fructose-bisphosphate aldolase in a
homogenate from brain of O. cuniculus was submitted to
electrophoresis in its native state, five evenly spaced
components displaying enzymatic activity were observed
(Figure 8-18).⁷⁶ Penhoet, Kochman, Valentine, and
Rutter⁷⁶ decided that this must be due to the fact that, in
the brain, two isoenzymatic polypeptides designated α
and γ are translated from two different messenger RNAs
continuously and coincidentally. These polypeptides fold
separately to form monomeric subunits that then combine
at random with subunits of their own kind or of the
other isoenzymatic type to produce hybrids of the
stoichiometries α₄, α₃γ, α₂γ₂, αγ₃, and γ₄, designated A, I, II, III,
and C in Figure 8-18.* The two different subunits, α and
γ, differ in the sequences of their polypeptides and hence
in their charge. Each hybrid in turn has a different
electrophoretic mobility because each has a different mean
charge number Z. The hybrids are capable of forming in
the first place because the two different polypeptides are
homologous in their sequences, have superposable tertiary
structures in their folded state, share a common
ancestor, and have not diverged sufficiently from that
common ancestor to have lost the ability to combine with
each other in the same way that they are required to do
with subunits identical to themselves. If this explanation
is correct, the number of subunits in any molecule of
aldolase can be determined by simply counting the
components on the electrophoretic separation. There must be
four. A similar hybridization was used to verify that
chloramphenicol O-acetyltransferase is a trimer.


* The proteins with quaternary structures α₄ and γ₄ have been
designated the A isoform and the C isoform, respectively, of
fructose-bisphosphate aldolase.
Figure 8-18: Dissociation and reassociation of oligomeric hybrids
of fructose-bisphosphate aldolase.⁷⁶ A clarified homogenate from
brain of O. cuniculus was submitted to electrophoresis on cellulose 
acetate at pH 8.6, and the strip was then stained for the activity of 
fructose-bisphosphate aldolase. Five evenly spaced bands of enzy- 
matic activity were observed (top pattern). On the basis of their cal- 
culated points of zero net charge, the most anionic component, 
labelled C in the figure, could be identified as isoform C of fructose- 
bisphosphate aldolase; and the most cationic, labelled A in the 
figure, as isoform A of fructose-bisphosphate aldolase. The
homogenate was then submitted to substrate elution from cellu- 
lose phosphate followed by anion-exchange chromatography on 
(diethylaminoethyl)cellulose (Figure 1-2). The five components 
that differed in electrophoretic mobility could be identified on 
these chromatograms by their enzymatic activity and could be 
cleanly separated from each other in this way. Each was submitted 
to electrophoresis separately at pH 8.6 and only one respective 
component with enzymatic activity was found in each (the five pat- 
terns on the lower left). Each of these single components was then 
exposed to 0.33 M H₃PO₄ at pH 2.0 for 30 min at 0 °C and then
brought back to pH 7.5. The low pH served to dissociate the sub- 
units of each hybrid, and the neutralization reassociated them but 
at random. Each of the dissociated and reassociated mixtures of 
hybrids was then resubmitted to electrophoresis, and the strips of 
cellulose acetate were stained for enzymatic activity (the five 
patterns on the right). Reprinted with permission from ref 76. 
Copyright 1967 American Chemical Society. 


In the experiments with fructose-bisphosphate 
aldolase, the investigators proved that the components 
observed upon electrophoresis were the hybrids that 
they proposed by isolating each of them, dissociating 
each to subunits, reassociating the subunits, and demon- 
strating that each of these reassociated samples con- 
tained a new set of hybrids consistent with the random 
rearrangement of the stoichiometry assigned to the 
parent (Figure 8-18). For example, α₄ gave back only α₄
and γ₄ gave back only γ₄, but α₂γ₂ gave all five hybrids in
approximately binomial ratios. It was also shown that an
equal mixture of α₄ and γ₄ when dissociated and reassociated
reproduced all five hybrids.
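
The binomial ratios expected from random reassociation can be written
out explicitly. The sketch below assumes that the two kinds of subunit
combine with no preference for their own kind, so that the fraction of
each hybrid depends only on the fraction of α subunits in the
dissociated pool. The function name is hypothetical.

```python
from math import comb


def hybrid_fractions(n_subunits: int, fraction_alpha: float):
    """Fractions of each hybrid after random reassociation.

    Returned in the order alpha_n, alpha_(n-1)gamma, ..., gamma_n.
    An oligomer of n subunits gives n + 1 distinguishable hybrids.
    """
    if not 0.0 <= fraction_alpha <= 1.0:
        raise ValueError("fraction_alpha must lie between 0 and 1")
    p, q = fraction_alpha, 1.0 - fraction_alpha
    return [comb(n_subunits, k) * p**k * q**(n_subunits - k)
            for k in range(n_subunits, -1, -1)]


# A tetramer reassembled from equal numbers of the two subunits,
# as from the 2:2 hybrid or from an equal mixture of the parents.
print(hybrid_fractions(4, 0.5))   # ratios 1:4:6:4:1
# A pool containing only alpha subunits can give back only alpha_4.
print(hybrid_fractions(4, 1.0))
```

The length of the returned list, one more than the number of subunits,
is the count of bands on which the argument for a tetramer rests.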

It is possible to increase the difference in charge
between two isoenzymatic polypeptides, and hence the
spacing between the bands on electrophoresis, or to
produce a charge variant of a single polypeptide by mutating
lysines to glutamates. It is also possible to make
electrophoretic variants by simply succinylating any native
protein of interest, rather than relying on isoforms
coincidentally provided by nature. The pattern of
functional heterogeneity, rather than the pattern of
electrophoretic heterogeneity, of a hybrid mixture formed
from subunits with differences in function, rather than
differences in charge, can also be used to count the
subunits in an oligomer. Nevertheless, the number of
subunits in a native protein is rarely counted by
hybridization. The point, however, is not that hybridization
solves the problem of determining the stoichiometries
of subunits but that an experiment can be designed
so that the number of subunits can be counted directly 
instead of calculated from physical measurements. This 
new way of looking at the problem stimulated the devel- 
opment of a technique that could be used to count the 
number of subunits in most soluble proteins. This tech- 
nique relies upon cross-linking. 

A reagent capable of covalently cross-linking two
polypeptides is a chemical compound that contains at
least two electrophilic functional groups attached to
each other by the remainder of the molecule. Aromatic
diisocyanates, such as m-xylylene diisocyanate,
were among the first such reagents to be used for this
purpose, but they suffer from problems of low solubility.
A widely employed cross-linking agent, and the one
originally used to count subunits, is dimethyl suberimidate


CH₃O–C(=NH)–(CH₂)₆–C(=NH)–OCH₃

8-2


By far the most prevalent and accessible nucleophiles 
on the surface of a molecule of protein are the primary 
amines of its lysines, and the majority of the reagents, 
such as the diisocyanates and the diimidoesters, are 
directed to lysines. 


A large array of reagents has been synthesized
over the years for cross-linking proteins. They have
been designed to react with a broader array of amino
acids than just lysine. They have also been designed to
form cross-links that can be reversed. A reagent that
illustrates all of the various elements in the design of
such cross-linking reagents is sulfosuccinimidyl
2-(7-azido-4-methylcoumarin-3-acetamido)ethyl-1,3′-dithiopropionate.

The ester of N-hydroxysulfosuccinimide is an electrophile
that reacts with lysines. The N-hydroxysulfosuccinimide
is used as a leaving group, rather than
N-hydroxysuccinimide, to increase solubility. The azide 
is photoactivated to the nitrene that has a broader spec- 
trum of reactivity than a more common electrophile. The 
two electrophiles react respectively with two nucle- 
ophiles on the protein to cross-link them. The disulfide 
can be reductively cleaved to reverse the cross-linking. 
The coumarin makes the reagent and hence the products 
of the cross-linking fluorescent. At the other end of the 
scale of complexity is the cross-linking of two proteins by 
the oxidative formation of covalent dimers between a 
tyrosine on one of the proteins and a tyrosine on the
other.

Cross-linking reagents that contain an activated 
disulfide that reacts efficiently with cysteines on the 
surface of a protein are also widely used. A mixed 
disulfide with 2-thiopyridine, an excellent leaving 
group, can be used to attach an electrophile to a cysteine 
on the first protein. For example, N-succinimidyl
3-(2-pyridyldithio)propionate
reacts by disulfide interchange (Figure 3-20) with the
thiol of a cysteine to form a mixed disulfide between that
cysteine and the N-succinimidyl 3-thiopropionate. The
reagent can be directed to a particular location on the
first protein by inserting a cysteine at a particular
position in its sequence by site-directed mutation. The
ester of N-hydroxysuccinimide can then react with a
lysine on a second protein, cross-linking it to the first
protein. The cross-link is reversible because the mixed
disulfide can be reduced by disulfide interchange with a
mercaptan such as dithiothreitol (Equation 3-18). This
reaction generates a thiol at the location on the second
protein that participated in the cross-link. This
unmasked thiol can then be used to isolate a peptide from
the second protein containing the amino acid that was
cross-linked. Because the position in the amino acid
sequence of the first protein at which the reagent was 
attached was determined by the site-directed mutation 
and the position in the second protein with which the 
electrophile reacted can be identified by isolating a pep- 
tide containing the modified amino acid, the cross-link- 
ing reaction can be used to identify adjacent amino acids 
in the complex that forms between the two proteins. 

The most commonly used cross-linking reagents
are bisimidates and bisaldehydes. An imidoester such as
the ones in 8-2 reacts with a primary amine to form an
amidine:

R–C(=NH)–OCH₃ + R′–NH₂ → R–C(=NH)–NH–R′ + CH₃OH

When a molecule of dimethyl suberimidate happens to
react at one of its ends with a lysine on one subunit and 
at its other end with a lysine from another subunit, those 
two subunits become covalently cross-linked and their 
polypeptides migrate together upon electrophoresis in 
the presence of dodecyl sulfate with the mobility of a 
polypeptide the length of which is equal to the sum of the 
lengths of the two polypeptides so joined. 

The intramolecular cross-linking of an oligomeric
protein provides a count of the number of subunits it
contains. At concentrations of protein below 10 µM, no
significant intermolecular cross-linking between separate
molecules of protein occurs when a solution of protein
is mixed with a cross-linking agent such as dimethyl
suberimidate. Instead, what is observed are the products
that result from intramolecular cross-linking among the
fixed number of subunits of which the protein is composed.
When the products of such intramolecular
cross-linking are separated on a polyacrylamide gel, a
ladder of bands, vaguely reminiscent of the ladders seen
with randomly cleaved nucleic acids, is observed (Figure
8-19).¹⁶⁵ These ladders, however, end abruptly because
no more polypeptides can be cross-linked than are present
in the complete protein. A count of the rungs in the
ladder provides the number of subunits in the protein. In
the case of glycerol kinase, the protein used for the analyses
presented in Figure 8-19, it could be concluded that
it is composed of four and only four subunits.¹⁶⁵ Peptide
maps showed that all four of the polypeptides composing
the subunits are identical to each other.

Figure 8-19: Definition of the number of subunits in glycerol
kinase.¹⁶⁵ Three samples of a solution of glycerol kinase were mixed
with final concentrations of 10 mg mL⁻¹ dimethyl suberimidate
(inset, gel C), 0.25 mg mL⁻¹ dimethyl suberimidate (inset, gel B), or
no additions (inset, gel A). After 4 h at pH 8.5, the samples were
dissolved in a solution of sodium dodecyl sulfate and subjected to
electrophoresis on polyacrylamide gels cast in 0.1% sodium dodecyl
sulfate. The gels were stained for protein. The irregular line at
the bottom of each gel is a line of India ink used to mark the
position of the marker dye at the end of the electrophoresis. The band
of stain just above this mark on gels A and B is the un-cross-linked
polypeptide of glycerol kinase. The mobilities of the four
components on gel B relative to the marker dye were calculated from the
photograph, and the negative natural logarithm of each of these
mobilities (−ln R) is plotted against a scale of successive integers.
Adapted with permission from ref 165. Copyright 1971 Journal of
Biological Chemistry.


There are three reassurances that should be provided.
First, the relative mobility of each of the bands in
the ladder should be shown to be a regular function of its
number (graph in Figure 8-19). This is a reassurance
that one of the members of the set is not missing from the
pattern. Second, the cross-linking reaction should be
forced to completion so that only the highest oligomer is
seen (gel C, inset to Figure 8-19). This result provides the
reassurance that this oligomer does represent a true limit
to the reaction and that the solution contains only one
unique multimer of the subunits rather than a mixture of
multimers each containing a different number of subunits.
Third, the possibility that intermolecular cross-linking
is occurring should be ruled out by varying the
concentration of protein. The amounts of the components
arising from intramolecular cross-linking will not
vary with the concentration of protein, but those that
arise from intermolecular cross-linking will. Random
intermolecular cross-linking also yields a distribution
that gradually declines in its amplitude with band
number rather than displaying an abrupt limit at a certain
unique polymer size. It is this discontinuous behavior
that is the logical basis for believing the results.
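
The first reassurance can be checked numerically rather than by eye.
The sketch below assumes, as the graph in Figure 8-19 suggests, that
the negative natural logarithm of relative mobility is close to a
linear function of the number of the band; it fits a line and reports
the largest deviation from it. The function name and the mobilities
are hypothetical and are not measurements from the figure.

```python
import math


def ladder_fit(mobilities):
    """Fit -ln(R) against band number 1..n by least squares.

    Returns (slope, intercept, largest absolute residual). A large
    residual suggests that a rung of the ladder is missing.
    """
    ys = [-math.log(r) for r in mobilities]
    xs = list(range(1, len(ys) + 1))
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    worst = max(abs(y - (slope * x + intercept)) for x, y in zip(xs, ys))
    return slope, intercept, worst


# Invented ladder that is exactly regular: -ln(R) = 0.2 + 0.4 k.
complete = [math.exp(-(0.2 + 0.4 * k)) for k in (1, 2, 3, 4)]
# The same ladder with the third band absent but numbered as if
# the remaining bands were consecutive.
missing_third = [math.exp(-(0.2 + 0.4 * k)) for k in (1, 2, 4)]

print(ladder_fit(complete)[2])
print(ladder_fit(missing_third)[2])
```

With real gels the residuals are never zero, so the threshold for
calling a ladder irregular has to be set from the scatter of the
standards on the same gel.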

It is in fulfilling the requirement that the reaction be 
forced to completion that dimethyl suberimidate usually 
fails. The reaction of an imidoester with a primary amine in 
aqueous solution is a competition between formation of the 
desired amidine and hydrolysis of the imidoester to amide 
and ester: 
There is no way to avoid this competition by some
informed adjustment of the conditions; it can only be 
made worse. It complicates the reaction because when 
one end of the diimidoester has attached to a lysine, 
there is a significant probability that its other end has 
either already been hydrolyzed or will hydrolyze before it 
can react with another lysine. A high percentage of the 
lysines end up with defunct reagent after all of the lysines 
have become amidines and no further reaction can occur 


regardless of how much reagent is added. Because of this, 
it is often the case that not all of the subunits can be 
cross-linked among themselves. The logical chemical 
solution to this problem would be to use an electrophile 
at the two ends of the cross-linking reagent that is more 
selective for a primary amine relative to a hydroxide 
anion, but this has not been explored systematically. 
Rather, it has been inadvertently accomplished. 
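
The consequence of the competition can be put in numbers with a simple
partition. The sketch below treats each end of the reagent as
partitioning between two parallel pseudo-first-order reactions,
aminolysis and hydrolysis, and treats the two ends as independent.
Both are simplifying assumptions, and the function names and rate
constants are hypothetical.

```python
def fraction_aminolysis(k_amine: float, k_hydrolysis: float) -> float:
    """Fraction of imidoester ends that end up as amidines.

    k_amine and k_hydrolysis are pseudo-first-order rate constants
    in the same units for the two competing reactions.
    """
    if k_amine < 0 or k_hydrolysis < 0 or k_amine + k_hydrolysis == 0:
        raise ValueError("rate constants must be non-negative, not both 0")
    return k_amine / (k_amine + k_hydrolysis)


def fraction_bridging(k_amine_first: float, k_amine_second: float,
                      k_hydrolysis: float) -> float:
    """Fraction of reagent molecules with both ends attached to lysines."""
    return (fraction_aminolysis(k_amine_first, k_hydrolysis)
            * fraction_aminolysis(k_amine_second, k_hydrolysis))


# Hydrolysis three times faster than aminolysis at each end.
print(fraction_aminolysis(1.0, 3.0))     # 0.25
print(fraction_bridging(1.0, 1.0, 3.0))  # 0.0625
```

Because the yield of bridges depends on the product of the two
fractions, a reagent that favors the amine more strongly over
hydroxide at each end improves the yield of cross-links
disproportionately.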

The most versatile cross-linking agent is
glutaraldehyde, OHC(CH₂)₃CHO.
Unfortunately, the chemistry of its reactions with proteins
has never been elucidated. Presumably as an
aliphatic aldehyde it engages in all of the same chemical 
reactions with lysine that occur after aliphatic aldehydes 
are produced in collagen by protein-lysine 6-oxidase 
(Figure 3-19). It is part of the lore surrounding glu- 
taraldehyde that the freshly distilled reagent is far less 
active; this is meant to suggest that compounds derived 
from glutaraldehyde itself, such as its aldol and dehy- 
drated aldol, are important to the cross-linking. The 
dehydrated aldol as well as many of the products resulting
from the imine formed upon the reaction of glutaraldehyde
with a lysine are α,β-unsaturated aldehydes
or imines. It is the greater preference of these α,β-unsaturated
aldehydes and imines for reaction with the weak
base lysine rather than the strong base hydroxide that
produces a higher yield of cross-linked product when
glutaraldehyde is used. In situations when the cross-linking
reaction with dimethyl suberimidate cannot be forced to 
completion, cross-linking with glutaraldehyde will usu- 
ally produce the fully cross-linked protein in high yield. 
The efficiency of glutaraldehyde has permitted it to be 
used to provide a quantitative catalogue of the various 
oligomeric complexes present in a heterodisperse solu- 
tion of a single, pure protein. 

Quantitative cross-linking is cross-linking carried 
to an extent sufficient to connect covalently every sub- 
unit in a macromolecular complex to every other subunit 
in the same macromolecular complex, either directly or 
indirectly, but not to any subunit in another macromol- 
ecular complex. In order for glutaraldehyde to perform 
quantitative cross-linking, the formation of intermolecu- 
lar covalent connections between independent, unasso- 
ciated oligomeric complexes in the solution must be 
negligible because every covalent complex must repre- 
sent only the product of intramolecular cross-linking, 
and every multimeric complex in the solution must be 
completely cross-linked within itself so that every one of 
its constituent subunits is covalently attached to all of the 
others. Cross-linking with appropriately high concentrations
of glutaraldehyde often fulfills these requirements.

This fulfillment can be illustrated with two experiments.
In the first experiment, a monodisperse solution
containing a homogeneous population of the oligomeric 
protein L-lactate dehydrogenase, a protein known to be a 
tetramer composed of four identical subunits, was mixed 
with glutaraldehyde.’ The reaction was permitted to pro- 
ceed a short period of time, and the products were exam- 
ined by electrophoresis on polyacrylamide gels in the 
presence of dodecyl sulfate. At high enough concentra- 
tions of glutaraldehyde, the only component that was 
observed contained the number of polypeptides, four, 
known to be present in the protein, all covalently con- 
nected to each other (Figure 8-20).' No larger products, 
which would have resulted from intermolecular cross- 
linking, and no smaller products, which would have 
resulted from incomplete intramolecular cross-linking, 
were observed. In the second experiment, the receptor for 
epidermal growth factor was cross-linked. This protein 
forms an α₂ dimer from two α monomers upon the addition
of epidermal growth factor to the solution. Before the
epidermal growth factor was added, only the monomer of 
the receptor was seen on the polyacrylamide gel after the 
protein had been cross-linked with glutaraldehyde and 
then dissolved in a solution of dodecyl sulfate. When epi- 
dermal growth factor was added and samples were 
removed at different times and then cross-linked, 
unfolded in a solution of dodecyl sulfate, and submitted 
to electrophoresis, the un-cross-linked monomer was 
seen to be gradually but completely replaced by the cova- 
lent cross-linked dimer.'”’ If any intermolecular cross- 
linking had been occurring after the glutaraldehyde was 
added, covalent dimer should have been observed in the 
absence of epidermal growth factor, but it was not. If 
incomplete intramolecular cross-linking were occurring 
after the glutaraldehyde had been added, some un-cross- 
linked monomer should have remained after completion 
of the dimerization, but it did not. 

There are two possible problems that can affect the 
outcome of quantitative cross-linking. First, if the protein 
is cross-linked at a concentration that is significantly lower 
than its physiological concentration, a naturally occurring 
oligomer may dissociate; or if it is cross-linked at a con- 
centration significantly higher than its physiological con- 
centration, artifactual aggregation may occur. Second, if 
the electrophilic cross-linking reagent reacts with nucle- 
ophilic side chains on the protein that are involved in its 
function and if its intact function is required for maintenance
of its proper oligomerization, the protein could dissociate
or artifactually associate because of its chemical
inactivation before it becomes cross-linked. For example, 
glutaraldehyde inactivates the ATPase of the chaperone 
protein GroEL within 2 s of its application, and the ATPase
activity is required to maintain the correct oligomeriza- 
tion of the protein. Controls were devised to demonstrate 
that the oligomeric state of the protein nevertheless did 
not change in the time required for quantitative cross-link- 
ing to be accomplished.’ 

Figure 8-20: Cross-linking of L-lactate dehydrogenase. Samples
of L-lactate dehydrogenase were submitted to cross-linking for 
2 min with 0.4 M glutaraldehyde at 20 °C and the reaction was ter- 
minated by adding sodium dodecyl sulfate to 20 uM. The samples 
were then submitted to electrophoresis on polyacrylamide gels, the 
gels were stained for protein, and the absorbance at 560 nm as a 
function of the distance migrated is presented. (A) Cross-linking in 
a solution of native tetramers (20 nM). Equivalent samples were 
either cross-linked (solid line) or not cross-linked (dashed line). In 
the former, only fully cross-linked tetramers (T) are observed; in 
the latter, only un-cross-linked monomers (M). (B) Cross-linking of 
reassembling L-lactate dehydrogenase after it had been dissociated 
into subunits. L-Lactate dehydrogenase was dissociated into subunits
at pH 2.3 and then diluted into a solution at pH 7.6 (final
concentration of subunits 340 nM). After 210 s (dashed line) and 1 h
(solid line), samples were removed and complexes (M, monomer; 
D, dimer; T, tetramer) were catalogued by cross-linking them for 
2 min with 0.4 M glutaraldehyde. Reprinted with permission from 
ref 167. Copyright 1979 Nature. 


If all of these requirements have been satisfied, 
quantitative cross-linking can be used to define the qua- 
ternary structure of a protein. For example, it has been 
used to show that 1-aminocyclopropane-1-carboxylate 
synthase is a dimer of two identical subunits. The
(α₇)₂ tetradecamer of GroEL, the (α₇)₂·β₇ heterooligomer
of GroEL and GroES, and the β₇·(α₇)₂·β₇ heterooligomer of
GroEL and GroES are each also quantitatively cross-linked
by glutaraldehyde.

While the preceding examples illustrate how glu- 
taraldehyde can be used to define the oligomer present 
in a monodisperse solution of a protein, it can also be 
used to provide a catalogue of the oligomers in a heterodisperse
solution. When Na⁺/K⁺-exchanging ATPase
is dissolved in a solution of nonionic detergent, a mixture
is formed of various oligomers, (αβ)ₙ, which are combinations
of the two different subunits, α and β, from
which this protein is composed. The fraction of the total
protein engaged in each of these oligomers, respectively,
could be rapidly and accurately determined by quantitative
cross-linking (Figure 8-21). From the pattern
displayed in the figure, it could be calculated that before
the glutaraldehyde had been added to the solution, the
fractions of the protein present as the (αβ)₄ tetramer, the
(αβ)₃ trimer, the (αβ)₂ dimer, and the αβ monomer were
0.25, 0.19, 0.28, and 0.31, respectively.

As was done with the receptor for epidermal growth 
factor, cross-linking with glutaraldehyde can also be 

Figure 8-21: Cross-linking of the oligomers in a heterodisperse
solution of Na⁺/K⁺-exchanging ATPase. A suspension of biological
membranes containing only Na⁺/K⁺-exchanging ATPase was
dissolved in a 5 mM solution of the nonionic detergent octa(ethylene
glycol) dodecyl ether. After the solution was clarified by
centrifugation, glutaraldehyde was added to 8 mM. After 45 min at
22 °C, sodium dodecyl sulfate was added, and the sample was submitted
to electrophoresis on a gel cast from 3.6% acrylamide and
0.1% dodecyl sulfate. After the gel was stained, it was scanned for
absorbance at 550 nm as a function of the distance migrated. By
their mobilities the components were identified as covalently
cross-linked αβ monomer (M), (αβ)₂ dimer (D), (αβ)₃ trimer (T),
and (αβ)₄ tetramer (T) of the enzyme, which in its native form is a
monomer (αβ) of one α subunit and one β subunit in noncovalent
association. Because the enzyme is a complex of an α subunit
and a β subunit, almost no un-cross-linked α polypeptide (α) or
β polypeptide (β) remained following the cross-linking. Reprinted
with permission from ref 169. Copyright 1982 American Chemical
Society.


used to follow the kinetics of the assembly of a protein. 
When L-lactate dehydrogenase is transferred to a solu- 
tion at pH 2.3, it dissociates into monomers consisting of 
single subunits, and when it is transferred back to pH 7.6, 
it reassociates over an hour to its normal tetrameric state. 
The concentration of monomer, dimer, and tetramer can 
be ascertained at any given minute by removing a sample 
and cross-linking it with glutaraldehyde (Figure 
8-20B).'*’ An extensive study of the kinetics of this step- 
wise association could be performed in this way.'® The 
rate and mechanism of the association of the subunits of 
the catalytic trimer of aspartate carbamoyltransferase 
were also monitored by quantitative cross-linking,’ as 
well as the conversion of the (α₇)₂ tetradecamer of GroEL
into the (α₇)₂·β₇ and β₇·(α₇)₂·β₇ heterooligomers of GroEL
and GroES caused by MgATP.

Cross-linking has also been used to determine the 
stoichiometric ratio between two dissimilar subunits. 
For example, the fact that the α and β polypeptides of
succinate-CoA ligase (ADP-forming) from E. coli disappear
in concert almost entirely during the formation of a
covalent αβ heterodimer and covalent (αβ)₂ heterotetramer
(Figure 8-22) means that they must be present
in equimolar ratio in the protein. A similar result was
observed with the two polypeptides, α and β, composing
Na⁺/K⁺-exchanging ATPase. In either of these cases, if
the polypeptides had not been present in equimolar
ratio, either one polypeptide would have disappeared
from its position on the polyacrylamide gel more rapidly
than the other during the formation of a covalent αβ heterodimer
or significant amounts of covalent products of
the form α₂β or αβ₂ would have appeared in addition to
the covalent αβ heterodimer.

The patterns seen in the ladders from partially 
cross-linked samples of a protein (Figure 8-19) can pro- 
vide information about the arrangement of the subunits 
in the oligomer. Succinate-CoA ligase (ADP-forming) 
from E. coli is a protein composed of two different sub- 
units, 388 and 288 amino acids in length, each present in 
two copies. These subunits could have been arranged in 
at least two ways to produce stoichiometries of either
(αβ)₂ or α₂β₂. The former designation implies that the
association between an α subunit and a β subunit is more
intimate than that between either two α subunits or two
β subunits; the latter implies the reverse. Examples illustrating
this distinction would be either a dimer of two
identical subunits, each of which was posttranslationally
cleaved to produce a pair of subunits, each an entwined
α polypeptide and β polypeptide, or a heterotetramer
assembled from a fully folded α₂ dimer and a fully folded
β₂ dimer, respectively. Succinate-CoA ligase (ADP-forming)
was reacted with enough dimethyl suberimidate to
cross-link completely the α and β polypeptides among
themselves but not enough to produce a high yield of the
completely cross-linked product. The covalent αβ heterodimer
was by far the major product of this incomplete
reaction. A reasonable yield of the covalent heterotetramer,
(αβ)₂, was also produced during the reaction, but
little or no covalent trimer, covalent αα homodimer, or
covalent ββ homodimer was seen (Figure 8-22).

This result demonstrates that the formation of a 
cross-link between one folded α polypeptide and one
folded β polypeptide in native succinate-CoA ligase
(ADP-forming) is a far more likely event than the formation
of a cross-link between either two α polypeptides or
two β polypeptides, and it suggests that α and β polypeptides
are more intimately associated with each other than
are α polypeptides with α polypeptides or β polypeptides
with β polypeptides. This conclusion has been validated
by the crystallographic molecular model. Therefore,
the proper designation for the arrangement of the subunits
in the native, un-cross-linked protein is (αβ)₂.

The yield of heterotrimer on the gel in Figure 8-22 
is significantly lower than the yield of heterotetramer. A 
similar disparity among the products of a partial cross- 
linking reaction was seen when several tetrameric pro- 
teins, each composed of four identical subunits, were 
examined.’ L-Lactate dehydrogenase, pyruvate kinase, 
fructose-bisphosphate aldolase, fumarate hydratase, 
and catalase were each partially cross-linked with vari- 
ous dimethyl bisimidates. In each case, the yield of the 
covalent trimer was 2-6-fold lower than that of either the 
covalent dimer or the covalent tetramer. This result is 
reminiscent of the one seen with succinate-CoA ligase 
(ADP-forming), and its explanation is the same. Within 

Figure 8-22: Cross-linking of succinate-CoA ligase (ADP-forming)
from E. coli. A solution of succinate-CoA ligase (1.0 mg mL⁻¹) was
cross-linked with dimethyl suberimidate (2.0 mg mL⁻¹) for 30 min.
The resulting covalent complexes were dissolved in a solution of
sodium dodecyl sulfate and submitted to electrophoresis. (A)
Un-cross-linked control showing the α and β polypeptides from which
the enzyme is composed. (B) Cross-linked product. The components
observed were assigned as the covalent αβ heterodimer, the
covalent heterotrimer, and the covalent (αβ)₂ heterotetramer on
the basis of their apparent lengths (numbers to the left of each gel)
determined by their mobilities on electrophoresis relative to a set
of polypeptides of known length. Reprinted with permission from
ref 175. Copyright 1975 Journal of Biological Chemistry.

each of these molecules of protein there must be associations
between some pairs of α subunits that are more
intimate than the associations between other pairs of
α subunits. The results suggest that the proper designation
for the arrangement of the subunits in each of these
proteins is (α₂)₂. To understand why this is so, the rules
governing the evolution of oligomeric proteins must be 
understood. These rules are based on rotational axes of 
symmetry. 

Problem 8-16: Assume that each pure hybrid of fructose-bisphosphate
aldolase shown in Figure 8-18 was
dissociated completely and reassociated at random
during the experiment described in the figure. Refer to
the two types of subunits as α and γ.


(A) What is the respective subunit structure of each of 
the five purified hybrids? 


(B) Assume the subunits reassemble at random by
the binomial formula a⁴ + 4a³c + 6a²c² + 4ac³ + c⁴,
where a is the fraction of the dissociated subunits
that are α and c is the fraction that are γ, and predict
the ratio of components expected from the
reassociation of each of the five dissociated
hybrids.
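
The terms of the binomial expansion in part (B) can be evaluated numerically. The following is a minimal sketch in Python and is not part of the original problem; the function name hybrid_fractions and the example value a = 0.5 are assumptions chosen only for illustration.

```python
from math import comb

def hybrid_fractions(a, n=4):
    """Return [(k, fraction)] for oligomers holding k alpha and n - k gamma subunits.

    Assumes random reassociation, so each term is C(n, k) * a**k * c**(n - k),
    where a is the fraction of alpha subunits and c = 1 - a.
    """
    c = 1.0 - a
    return [(k, comb(n, k) * a**k * c**(n - k)) for k in range(n, -1, -1)]

# Hypothetical example: an equimolar pool of the two subunits (a = 0.5).
for k, fraction in hybrid_fractions(0.5):
    print(f"alpha{k}gamma{4 - k}: {fraction:.4f}")
```

The five terms are returned in the same order as in the formula, from the all-α tetramer to the all-γ tetramer, and they always sum to 1.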


Problem 8-17: The following pictures are of polyacrylamide
gels cast in 0.1% sodium dodecyl sulfate on which
proteins unfolded with dodecyl sulfate were submitted to
electrophoresis.¹⁵⁹


In all three experiments, the native proteins (at concen- 
trations of 0.03-0.2 mM in subunit) were first treated 
with dimethyl suberimidate (at concentrations of 
3-7 mM) before they were unfolded with the dodecyl sulfate.
In each experiment, the upper arrow marks the top
of the gel, and the lower arrow marks the stained band
corresponding to the single polypeptide observed when 
treatment with dimethyl suberimidate was omitted. 


(A) By drawing a graph for each of the gels, A, B, and 
C, show that none of the products of the reaction 
between the respective protein and dimethyl 
suberimidate were overlooked. 


(B) What is the stoichiometry of the subunits of the 
protein run on gel A? 


(C) What is the stoichiometry of the subunits of the 
protein run on gel B? 


(D) What is the stoichiometry of the subunits of the 
protein run on gel C? 


Assume in making all of these assignments that proteins 
A, B, and C are homooligomers. 


(E) In making these assignments, you have ignored 
the minor bands of lower mobility seen in gels A 
and C. If you are correct in ignoring these bands, 
what should have happened to these bands if the 
proteins had been more dilute in concentration 
when they were treated with the same concentra- 
tions of dimethyl suberimidate? Why? 

Chapter 9
Symmetry 


The arrangement in space of the subunits in a multimeric 
protein is its quaternary structure. Many multimeric 
proteins are composed of multiple copies of only one 
particular subunit. These subunits, each necessarily 
identical to the others when free in solution, combine 
together to form the final molecule of protein. In such 
homomultimeric proteins, each of the subunits in the 
crystallographic molecular model can be formally desig- 
nated a protomer’ of the final overall structure. A pro- 
tomer of a multimeric protein is the smallest portion of 
that protein from copies of which its entire quaternary 
structure is created. Consequently, all of the protomers 
in the protein must be the same. 

Some multimeric proteins, like the isoenzymatic 
hybrids of fructose-bisphosphate aldolase (Figure 8-18), 
are composed of two or more distinct subunits that are, 
nevertheless, each the offspring of the same common 
ancestor. Although different in amino acid sequence and 
in the atomic details of their tertiary structure, these sub- 
units are still related closely enough to participate 
together to form the complete heteromultimeric protein, 
much as identical protomers would participate in a 
homomultimeric protein. In such instances, each of the 
individual subunits, although actually different, can be 
considered at low resolution to be one of the indistin- 
guishable protomers forming the overall structure. In 
contrast to the hybrids formed by the isoenzymatic sub- 
units of fructose-bisphosphate aldolase, other proteins 
built from such similar but distinct subunits usually 
incorporate those subunits in unvarying ratios. For 
example, hemoglobin is always an (αβ)₂ tetramer and
acetylcholine receptor is always an α₂βγδ pentamer, even
though the α and β subunits of hemoglobin or the α, β, γ,
and δ subunits of acetylcholine receptor are respectively
homologous to each other and necessarily superposable.
These exclusive stoichiometries are established by the 
distinct atomic interactions between the subunits that 
take place as the multimer assembles. 

Some multimeric proteins contain two or more dissimilar,
unrelated subunits. For example, aspartate carbamoyltransferase
contains α subunits and β subunits
that are unrelated to each other. Even though the sub- 
units composing such a protein are completely different, 
the final structure produced, when observed as a crystal- 
lographic molecular model, can often be divided for- 
mally into identical protomers, each containing one copy 
of each of the different subunits. A particular number of
these heterooligomeric protomers arranged in an array
unique to that protein makes up its quaternary structure. 
Because every protomer is identical to all the others, the 
arrangement of the protomers of such a heteromulti- 
meric protein is formally equivalent to the arrangement 
of the subunits of a homomultimeric protein. 

Each of the multimeric proteins discussed so far 
can be divided formally into a set of identical protomers 
or almost identical subunits. Aside from a few peculiar 
exceptions, the rules that govern the way these pro- 
tomers or these subunits are arranged in space to pro- 
duce the complete molecular structure of the entire 
protein seem to be the same. In an oligomeric protein 
formed from a fixed number of identical protomers, 
those protomers are arrayed around rotational axes of 
symmetry. In an oligomeric protein formed from a fixed 
number of homologous but nonidentical subunits, those 
subunits are arrayed around rotational axes of pseu- 
dosymmetry. In a polymeric protein with an indefinite 
number of identical protomers or homologous subunits, 
those protomers or those subunits are arrayed around a 
screw axis of symmetry or a screw axis of pseudosymme- 
try, respectively. 

There is also a set of multimeric proteins composed 
of nonidentical subunits that are assembled haphazardly 
with no regard, or sometimes a slight regard, for symme- 
try. In these proteins, the various subunits are associated 
with each other by interfaces that, other than their lack of 
symmetry, are almost indistinguishable in their details 
from those holding together symmetric proteins. Such 
asymmetric, heteromultimeric proteins are much more 
likely to assemble and disassemble under different situ- 
ations, and such alternations in quaternary structure are 
often involved intimately in their function. There are also 
members of this set of heteromultimers, however, that 
have stable quaternary structures. Unlike symmetric 
homomultimeric proteins and symmetric heteromulti- 
meric proteins, asymmetric heteromultimeric proteins 
seem to be cobbled together in the absence of any set of 
rules. 


Rotational and Screw Axes of Symmetry 


The fundamental symmetry operations that are avail- 
able to asymmetric objects such as the protomers of a 
protein when they assemble into a homomultimeric 
structure are those around rotational axes and screw
axes. If a protein is constructed with rotational symme- 
try, only a finite number of protomers produce the final 
structure. If a protein is assembled with screw symmetry, 
a potentially infinite number of protomers usually can 
combine to produce a polymer of indefinite length. In a 
few isolated instances observed so far, a protein, 
although it is assembled from subunits arranged with 
screw symmetry, is nevertheless forced to have only a 
finite number of protomers in the final structure. Such 
finite structures assembled with screw symmetry are 
thought to be very rare. 

Consider two proteins that illustrate the observa- 
tion that multimeric proteins constructed from identical 
protomers are assembled around rotational or screw 
axes of symmetry. The proteins are malate dehydroge- 
nase from Aquaspirillum arcticum (Figure 9-1A), a 
dimer built from two identical protomers that are each 
folded polypeptides 329 amino acids in length, and actin 
(Figure 9-1B,C),** a protein that forms helical polymeric 
fibers of indefinite length with each fiber built from many 
identical protomers, each of which is the same folded 
polypeptide 375 amino acids in length.* 

A rotational axis of symmetry is a line about which 
a structure can be rotated by 360°/n, where n is an inte- 
ger larger than 1, to superpose upon itself. An exact 2-fold 
rotational axis of symmetry runs through the center of 
the α₂ dimer in the crystallographic molecular model of
malate dehydrogenase. If the two subunits of malate 
dehydrogenase, still held together by the same interface, 
had been distorted intramolecularly by the structure of 
the protein itself into significantly different conforma- 
tions or if they had had different sequences but were nev- 
ertheless homologous to each other, a rotational axis of 
pseudosymmetry would superpose one of them upon 
the other. A rotational axis of pseudosymmetry is a line 
about which the structure of a protein can be rotated by 
360°/n to superpose upon each other subunits with 
identical sequence but significantly different conforma- 
tion, subunits of different sequence but the same 
common fold, or distinct but homologous domains. 

The value of n is the fold of the symmetry. The 
2-fold rotational axis of symmetry in the crystallographic 
molecular model of malate dehydrogenase is a line per- 
pendicular to the plane of the page in Figure 9-1A pass- 
ing through the center of the molecule of protein. If the 
image of the upper protomer in the figure is rotated 180° 
about this axis, it superposes exactly on the lower pro- 
tomer. Because of this rotational axis of symmetry, the 
two protomers in the protein are indistinguishable. 

Through the center of the indefinitely long helical 
polymer of actin, of which only a segment is drawn in 
Figure 9-1B, runs a screw axis of symmetry. A screw axis 
of symmetry is a line, passing through a structure, about 
which the structure can be rotated by an angle between 
-180° and 180° and along which the structure can be 
simultaneously translated to superpose upon itself. In
the molecular model of the actin filament, the screw axis
of symmetry is a vertical line parallel to the plane of the 
page. If the image of any protomer is transposed by being 
rotated -166° (left-handed) around this axis while it is 
lifted 2.8 nm in a direction parallel to it, it superposes on 
the next protomer in the helical polymer. Because this 
operation can be repeated indefinitely, all of the pro- 
tomers are indistinguishable from each other except by 
their place in line. 

A rotational axis of symmetry (Figure 9-1A) is nec- 
essarily 2-fold, 3-fold, 4-fold, and so forth. This require- 
ment arises from the fact that as one of the images of a 
protomer is being rotated to superpose it on its neighbor, 
all of the images of its neighbors are also simultaneously 
being rotated (Figure 9-1A). When the rotation is com- 
pleted, each of the images must superpose on its respec- 
tive partner. This can be accomplished only if the 
protomers are arrayed about the axis at angles to each 
other that are integral quotients of 360° (360°/n). The 
integer defines the number of times the rotation can be 
accomplished before returning to the beginning. The 
rotational axis of symmetry within malate dehydroge- 
nase is a 2-fold rotational axis of symmetry. After two 
superpositions, the original locations are regained. For 
the rotation to superpose the images of all protomers 
simultaneously each time, no translation along the axis 
can occur. 
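
The requirement that the angular steps be integral quotients of 360° can be checked numerically. The following is a minimal sketch in Python, assuming an axis of symmetry placed along z and a single arbitrary point standing in for a protomer; the function names and the coordinates are hypothetical.

```python
import math

def rotate_about_z(point, angle_deg):
    """Rotate a point (x, y, z) about the z axis with no translation along it."""
    x, y, z = point
    t = math.radians(angle_deg)
    return (x * math.cos(t) - y * math.sin(t),
            x * math.sin(t) + y * math.cos(t),
            z)

def images(point, n):
    """Positions generated from one point by an n-fold rotational axis along z."""
    return [rotate_about_z(point, 360.0 * k / n) for k in range(n)]

# Hypothetical point on one protomer of a dimer with a 2-fold axis.
p = (1.0, 2.0, 3.0)
print(images(p, 2))

# Two successive rotations of 360/2 degrees regain the original location.
q = rotate_about_z(rotate_about_z(p, 180.0), 180.0)
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(p, q)))
```

Because z is unchanged by each rotation, the set of images closes on itself after n steps, which is the property that distinguishes a rotational axis from a screw axis with a nonzero translation.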

Screw axes of symmetry are defined by a rotation 
through any angle and a translation of any distance, and 
they are designated as left-handed or right-handed by 
the same rules as apply to α helices. If the translation is
not zero, a screw axis of symmetry produces a helical 
array. By the principle that the majority rules, right- 
handed screws are given a positive sign and left-handed 
screws are given a negative sign. For example, one of the 
screw axes of symmetry in the actin polymer, the left- 
handed one, has a rotational angle of -166°. Trivially, it 
is also possible to generate an actin polymer with a right- 
handed screw axis of +194° but, less trivially, by two 
coaxial right-handed screw axes of +28°. A helical array 
always comes with such a set of different coaxial screw 
axes. A rotational axis of symmetry is simply a special 
case of a screw axis of symmetry where the translation is 
zero and the rotational angle is required to assume 
values that are integral quotients of 360°. 
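
The arithmetic relating the coaxial screw axes of actin can be sketched as follows. This is an illustrative Python fragment under the assumption that angles are taken modulo 360° and that the axis connecting every n-th protomer has n times the rotation and n times the translation of the basic operation; the function names are hypothetical.

```python
def wrap(angle_deg):
    """Map an angle in degrees into the interval (-180, 180]."""
    a = angle_deg % 360.0
    return a - 360.0 if a > 180.0 else a

def right_handed_equivalent(angle_deg):
    """Express the same rotation as a positive angle in [0, 360)."""
    return angle_deg % 360.0

def every_nth(angle_deg, rise_nm, n):
    """Rotation and translation of the screw axis joining every n-th protomer."""
    return wrap(n * angle_deg), n * rise_nm

# Values quoted in the text for actin: -166 degrees and 2.8 nm per protomer.
print(right_handed_equivalent(-166.0))   # the single right-handed description
print(every_nth(-166.0, 2.8, 2))         # each of the two coaxial strands
```

With the values from the text, the first call gives the +194° description and the second gives the +28° rotation of each of the two coaxial right-handed screw axes, with a translation of twice 2.8 nm between successive protomers on the same strand.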

A screw axis of symmetry, other than the special 
case of a rotational axis of symmetry, in which transla- 
tion cannot occur, does not require the angular steps to 
be integral quotients of 360°. As the image of one of the 
protomers is rotating and rising, the point of superposi- 
tion need not occur at any particular angular disposition 
along the helix produced by the screw axis. There is, how- 
ever, one requirement that limits the angles and transla- 
tions permitted a screw axis of symmetry. Its operation is 
most readily observed by comparing the crystallographic 
molecular models of the protomer of protocatechuate 
3,4-dioxygenase from Pseudomonas putida (Figure 


Figure 9-1: α-Carbon diagrams drawn from crystallographic molecular models of malate dehydrogenase from A. arcticum and actin from Oryctolagus cuniculus. (A) Malate dehydrogenase
is constructed from two subunits, one drawn with thicker line segments than the other. The view is down a crystallographic 2-fold rotational axis of symmetry in the
space group P2₁2₁2 of the array. This drawing was produced with MolScript. (B) A model of the actin filament was constructed by placing individual actin monomers, a crystallographic
molecular model for which is available, in positions and orientations indicated by the map of electron density calculated from an X-ray fiber diffraction pattern of a gel
of oriented actin filaments. The orientation and position of the actin monomer that were assigned in this way are in agreement with the orientation and position of the protein
encoded by the mreB gene of Escherichia coli, a homologue of actin, within its filament, which serendipitously happens to be present in a crystal of this protein. There are eight
actin monomers in the segment of the filament displayed in this figure. The circles within each monomer designate the location of a bound Ca²⁺ ion. This drawing was produced
with MolScript. The atomic coordinates on which the drawing is based were provided by Ken Holmes. (C) A low-resolution molecular model of the actin filament, calculated by
image reconstruction of electron micrographs of ordered actin filaments, is included, at the same scale, to illustrate more clearly the packing of the monomers along the helix.
Reprinted with permission from ref 5. Copyright 1983 Academic Press.
9-2A) and actin (Figure 9-1B). The protomer of protocatechuate
3,4-dioxygenase is composed of two polypeptides
of different sequence, but they are clearly
homologous to each other because they share a common
fold. Through the center of the αβ heterodimer of protocatechuate
3,4-dioxygenase (Figure 9-2A) there runs a
screw axis of pseudosymmetry. It is a horizontal line parallel
to the plane of the page. If the image of the upper,
β subunit is transposed by being rotated +169.2° around
this axis as it is simultaneously shifted to the right
0.674 nm in a direction parallel to the axis, it superposes
on the lower, α subunit.

A helical polymer of actin can be constructed by 
placing one protomer upon an origin, properly oriented 
with respect to the axis of symmetry; transposing its 
image around the axis by -166° and along the axis by 
2.8 nm; placing another protomer at this next location; 
and repeating this process indefinitely. Suppose this 
were attempted with the common subunit for protocatechuate
3,4-dioxygenase. Place an α subunit upon an
origin, properly oriented with respect to the axis. Move
the image of this subunit +169.2° around the axis and
0.674 nm along the axis. Place a β subunit at this next
location. Move the image of this β subunit +169.2°
around the axis and 0.674 nm along the axis and try to
place an α subunit at this third location. This third subunit
would have to overlap the first. The problem here is
that the translation of the screw axis in protocatechuate 
3,4-dioxygenase is insufficient to move the image of the 
subunit far enough along to clear the protomer next to it 
as the helical path completes the turn (Figure 9-2B).” 
Therefore, two subunits can be combined to form a het- 
erodimer, but three or more cannot be combined to form 
a polymer. The parameters of the helix generating actin 
do not cause such collisions. Actin forms a helical poly- 
mer of indefinite length, while protocatechuate 
3,4-dioxygenase is limited to a dimer. In an additional 
example that reinforces the concept, two identical sub- 
units of hexokinase are arranged in crystals of the space 
group P2₁2₁2₁ along a screw axis of pseudosymmetry the
translation of which, 1.38 nm, is so short and the rotation 
of which, 156°, is so large that the protein is also limited 
to a dimer.” 
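
The contrast between actin and protocatechuate 3,4-dioxygenase can be made concrete by comparing where the first and third positions along each screw axis fall. The following Python sketch uses the rotations and translations quoted in the text; the radius of 2.0 nm at which a reference point is placed is a hypothetical value chosen only for illustration, and the size and shape of the protomers are not modeled, so the output indicates relative spacing rather than proving or excluding a collision.

```python
import math

def position(k, angle_deg, rise_nm, radius_nm):
    """Coordinates of a reference point after k applications of the screw operation."""
    t = math.radians(k * angle_deg)
    return (radius_nm * math.cos(t), radius_nm * math.sin(t), k * rise_nm)

def separation(i, j, angle_deg, rise_nm, radius_nm):
    """Distance in nm between the reference points of positions i and j."""
    return math.dist(position(i, angle_deg, rise_nm, radius_nm),
                     position(j, angle_deg, rise_nm, radius_nm))

RADIUS_NM = 2.0  # hypothetical distance of the reference point from the axis

# (rotation in degrees, translation in nm) as quoted in the text
screw_axes = {
    "actin": (-166.0, 2.8),
    "protocatechuate 3,4-dioxygenase": (169.2, 0.674),
    "hexokinase": (156.0, 1.38),
}

for name, (angle, rise) in screw_axes.items():
    d = separation(0, 2, angle, rise, RADIUS_NM)
    print(f"{name}: first and third positions {d:.2f} nm apart")
```

In each case the third position returns to nearly the same side of the axis as the first, so the translation accumulated over two steps largely determines whether there is room for another protomer.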

In theory, the same requirement restricting proto- 
catechuate 3,4-dioxygenase and hexokinase to be dimers 
also restricts malate dehydrogenase to being a dimer. 
Because no translation occurs along the axis in malate 
dehydrogenase, the third protomer would completely 
intersect the first if the game just played with protocate- 
chuate 3,4-dioxygenase were repeated with malate dehy- 
drogenase. It is the inescapable and obvious rules of this 
game that cause protocatechuate 3,4-dioxygenase and 
malate dehydrogenase to be closed structures’ and actin 
to be an open structure. A closed structure is a structure 
that contains a finite number of protomers distributed by 
rotational axes of symmetry or a screw axis of symmetry 
and to which the addition of further protomers by the 
one or more of these symmetry operations is precluded.
An open structure is a structure built upon a screw axis 
of symmetry to which protomers can be infinitely added 
by that symmetry operation. An oligomeric protein is a 
multimeric protein with a closed structure of a fixed 
number of protomers, and a polymeric protein is a mul- 
timeric protein with an open structure of an indefinite 
number of protomers and of an indefinite length. 

Each multimeric protein composed of identical 
protomers, even a helical polymer of indefinite length, 
can be considered to be the manifestation of a set of 
interfaces between those protomers. Each of these inter- 
faces includes all of the points of contact that lie between 
two protomers in the structure, and each is formed by the 
association of two complementary faces, one from each 
of the two protomers. These faces are particular regions 
on the respective surfaces of the two associating pro- 
tomers. Because the structures of the protomers are 
identical to each other, each necessarily possesses on its 
own surface all of the unique faces forming the interfaces 
found in the complete molecule. The interface between 
the two subunits of malate dehydrogenase (Figure 9-1A) 
is formed from two identical faces, one on the surface of 
each of its subunits. Note the particularly intimate con- 
tact across the interface as the secondary structures, 
which mimic each other across the axis of symmetry, 
interdigitate. 

Following its biosynthesis, a polypeptide folds to 
form a structure capable of recognizing and being recog- 
nized by other folded polypeptides. If it is to combine 
with its twins to form a multimeric protein constructed 
from identical subunits, it must do so in a series of indi- 
vidual steps, and each step must involve the formation of 
an interface from two complementary faces. The atomic 
contacts within each of these consecutively formed inter- 
faces are as specific as the atomic contacts throughout 
the protein. For the same reasons that a folded polypep- 
tide assumes a precise and unique atomic structure, the 
interface between two subunits has a precise and unique 
atomic structure. 

If, as the result of evolution, a face appears any- 
where on the surface of a folded polypeptide and a face 
complementary to the first appears anywhere else on the 
surface of the same folded polypeptide, the face on one 
copy of that folded polypeptide will associate with its 
complement on another copy of the same folded 
polypeptide. Any such random association between any 
two identical asymmetric objects always defines a 
unique screw axis, an angle of rotation about that screw 
axis, and a translation along that screw axis that will 
superpose the image of one of the asymmetric objects 
upon the other. Either these three parameters are con- 
sistent with an open structure or they are consistent with 
a closed structure. If they are consistent with an open 
structure, the very fact that the one interface can form 
means that many others will subsequently form. A series 
of such interfaces is a helical polymer. 
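
The statement that any association of two identical asymmetric objects defines a screw axis, an angle of rotation, and a translation can be made concrete. The sketch below assumes that the relation between the two protomers is already available as a rotation matrix R and a translation vector t (hypothetical inputs, such as those produced by superposing one protomer on its neighbor). It recovers the direction of the axis, the angle, and the translation along the axis. It does not locate the axis in space, and it assumes an angle strictly between 0° and 180°.

```python
import math

def screw_parameters(R, t):
    """Return (axis, angle_deg, rise) for the rigid motion x -> R x + t.

    R is a 3x3 proper rotation matrix given as a list of rows, t a 3-vector.
    """
    trace = R[0][0] + R[1][1] + R[2][2]
    angle = math.acos(max(-1.0, min(1.0, (trace - 1.0) / 2.0)))
    s = 2.0 * math.sin(angle)  # zero at 0 and 180 degrees, hence the restriction
    axis = ((R[2][1] - R[1][2]) / s,
            (R[0][2] - R[2][0]) / s,
            (R[1][0] - R[0][1]) / s)
    rise = sum(a * b for a, b in zip(axis, t))  # component of t along the axis
    return axis, math.degrees(angle), rise

# Example: a rotation of 156 degrees about z combined with an arbitrary shift.
a = math.radians(156.0)
R = [[math.cos(a), -math.sin(a), 0.0],
     [math.sin(a),  math.cos(a), 0.0],
     [0.0,          0.0,         1.0]]
axis, angle_deg, rise = screw_parameters(R, (0.3, -0.2, 1.38))
```

In this example the components of t perpendicular to the axis change only where the axis lies, not the rotation or the translation along it, which is why the three parameters named in the text are sufficient to decide between an open and a closed structure.
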

The interfaces that are repeated throughout a mul- 
timer composed of identical protomers are the origin of 
the geometry of the final structure. Because they are cre- 
ated by evolution, their appearance is determined by a 
completely random process. As time passes, variation in 
the identity of the amino acids on the surface of a given 
monomeric protein occurs. At some point, in some 
organism, in some species, a constellation of amino acids 
appears on the surface that permits a stable interface to 
form between two of these identical monomers. 

Natural selection operates at this point. It takes 
little imagination to realize that if the vast majority of 
multimeric proteins were not closed structures, the cell 
would rapidly fill with helical polymers and become a 
solid, inflexible object incapable of the pliability essential 
to life. The difficulties encountered with helical polymers 
of hemoglobin S in sickled erythrocytes dramatically 
illustrate this problem.” An example of a polymerization 
that is undesirable for a different reason is that of a tRNA- 
intron endonuclease from Archaeoglobus fulgidus, which 
can form a helical polymer that is enzymatically inactive 
because the active site is sterically blocked by neighbor- 
ing monomers in the polymer." If a monomeric protein 
were to sustain a series of mutations dictating that it 
combine with its twins in such a way that a polymeric 
fiber necessarily results, this set of mutations would 
probably be eliminated by natural selection. Mutations 
leading to closed structures, however, aside from lower- 
ing the osmotic pressure of the cytoplasm, may be neu- 
tral initially, but oligomeric proteins have potentials 
denied to monomeric proteins, and the appearance of an 
oligomeric protein during evolution is the first step in the 
eventual exploitation of these potentials. Nevertheless, if 
the interface is compatible with a closed structure, it can 
initially be fixed by genetic drift as a neutral variation. 
With its fixation within that species, the protein has 
become an oligomer of identical protomers. 

There remains one perplexing fact. As in the case of 
malate dehydrogenase (Figure 9-1A), the vast majority of 
homomultimeric proteins that have been examined con- 
clusively are built around rotational axes of symmetry. 
Often, as in the case of malate dehydrogenase, these 
rotational axes of symmetry can be proven to be exact; 
they always seem to be exact. In fact, protocatechuate 
3,4-dioxygenase (Figure 9-2) is one of the few exceptions 
to this rule, and it is not even a homodimer. If rotational 
axes of symmetry are no more than severely restricted 
cases of screw axes of symmetry, and if a screw axis of 
symmetry is compatible with a closed structure, why 
are multimeric proteins almost always rotationally 
symmetric? 

The main difference between a dimer like malate 
dehydrogenase and a dimer like protocatechuate 
3,4-dioxygenase lies in the respective interfaces defining 
these structures. In a rotationally symmetric dimer such 
as malate dehydrogenase, individual interactions 
between the two protomers come in sets of identical 
pairs. The best way to see this is to consider the α helices 
containing Isoleucines 59 on either side of the rotational 
axis of symmetry (Figure 9-1A). These α helices and their 
carboxy-terminal loops of random meander insert into 
pockets in the opposite protomers. Consider a specific 
position such as Isoleucine 59 in the segment of the 
sequence of the lower protomer forming this insertion. 
The amino acid at this location is making several con- 
tacts with amino acids in the pocket in the upper pro- 
tomer. Suppose a mutation occurred at position 59 that 
strengthened these interactions by a certain increment. 
Because the upper protomer was read from the same 
gene, at sequence position 59 in its α helix the same 
favorable change would occur automatically so the 
increment for the increase in stability for the whole inter- 
face would be twice that of each individual increment. 
The same argument could be made for each location in 
the interface. 

In a protein built on a screw axis of symmetry, how- 
ever, the amino acids at the same sequence positions in 
the two protomers never interact with the same amino 
acids from the other protomer across the screw axis of 
symmetry (Figure 9-2B). A mutation, occurring any- 
where in the interface, that adds an increment of stabil- 
ity to the dimer is not duplicated automatically. It follows 
that as variation proceeds during evolution, the incre- 
mental changes that occur within an interface built 
around a 2-fold rotational axis of symmetry are amplified 
2-fold relative to those that occur within each interface 
around a screw axis. This conclusion is valid whether one 
of these interfaces has already appeared during evolu- 
tion or is merely incipient. 

The formation of an interface between two identical 
monomeric proteins, which is the evolutionary event that 
precedes the appearance of a multimeric protein, is not an 
all or none phenomenon. The chemical reaction in ques- 
tion is 


$$2\alpha \rightleftharpoons \alpha_2 \qquad (9\text{-}1)$$


Associated with this reaction is a change in standard free 
energy, and it is this change in standard free energy that 
determines the extent of the reaction. The numerical 
value of this change in standard free energy is dictated by 
the particular interactions that occur among the amino 
acids within the interface. The particular interactions 
that occur are the product of evolution by natural selec- 
tion. Each explicit variation in one of these interactions 
adds or subtracts an increment of free energy to the over- 
all change. If the increments are automatically doubled, 
overall decreases in the standard free energy change for 
the reaction proceed more rapidly over evolutionary 
time. 
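
The consequence of doubling an increment can be put in numbers with the standard relation between the change in standard free energy and the equilibrium constant, K = exp(-ΔG°/RT). The sketch below is an illustration only: the increment of -2 kJ mol⁻¹ and the temperature of 298.15 K are assumed values, and the function name is hypothetical.

```python
import math

R_GAS = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1

def fold_change_in_K(increment_kj, temperature=298.15):
    """Factor by which the association constant of the dimerization in
    Equation 9-1 changes when the standard free energy change shifts by
    increment_kj (negative values favor the dimer)."""
    return math.exp(-increment_kj / (R_GAS * temperature))

increment = -2.0                             # assumed effect of one mutation, kJ/mol
single = fold_change_in_K(increment)         # increment counted once (screw axis)
doubled = fold_change_in_K(2 * increment)    # increment counted twice (2-fold axis)
print(round(single, 2), round(doubled, 2))
```

Because the free energy enters through an exponential, doubling the increment squares the factor by which the association constant changes: under these assumptions, a little more than 2-fold for the interface on a screw axis and about 5-fold for the interface on a 2-fold rotational axis.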

Incremental increases in the standard free energy 
change, however, are also doubled. Although rotationally 
symmetric dimers should appear more frequently, they 
should also disappear more frequently, unless they rep- 
resent advantageous variations. Improvements in a cer- 
tain protein are retained by natural selection, and their 
retention is unaffected by the frequency with which ret- 
rograde changes arise. Mutations turning the dimer back 
into a monomer, such as those that can be performed 
experimentally,” would be eliminated by natural 
selection if they were disadvantageous. It is possible that 
oligomerization of a protein has an immediate advanta- 
geous effect. If it did, the fact that most oligomeric pro- 
teins are built around 2-fold rotational axes of symmetry 
would be a reflection of the fact that such oligomers arise 
with a high frequency and of the fact that they are fixed 
by natural selection because oligomers are advanta- 
geous. 

Because events were discussed in the opposite 
order of the normal progression, a summary of the his- 
torical and logical sequence seems appropriate. As a 
result of genetic variation among the individuals in a 
given species, a constellation of amino acids appears on 
the surface of a monomeric protein within one of those 
individuals. The constellation causes molecules of that 
previously monomeric protein to associate with each 
other. This association necessarily creates an interface. 
This interface necessarily forces the two protomers it 
brings together to be related to each other by a unique 
screw axis of symmetry. The association between the two 
or more protomers created by this screw axis of symme- 
try is tested by natural selection. Occasionally, a helical 
polymer, which necessarily results from a screw axis that 
is not closed, is advantageous and is retained. Most of the 
time, however, the survivors of natural selection are 
closed oligomeric proteins the interfaces of which dictate 
rotational axes of symmetry. 


Suggested Reading 


Monod, J., Wyman, J., & Changeux, J.P. (1965) On the nature of 
allosteric transitions: a plausible model, J. Mol. Biol. 12, 88-118. 


Problem 9-1: Using as your three examples malate dehy- 
drogenase, protocatechuate 3,4-dioxygenase, and fila- 
mentous actin, discuss the topics of rotational axes of 
symmetry, screw axes, open structures, closed struc- 
tures, interfaces, and helical polymers. 


Space Groups 


Screw and rotational operations around axes of symme- 
try occur within crystals of proteins in addition to the 
translational operations relating the unit cells. These 
axes of symmetry are the fundamental operations that 
define the space groups. A space group of identical enan- 
tiomeric objects is a potentially infinite array of those 
objects, the positions and orientations of which are 
related to each other by screw axes of symmetry, rota- 
tional axes of symmetry, and translational operations. 


For reasons mainly associated with the phenomenon of 
diffraction, a unit cell is defined solely in terms of trans- 
lation. A unit cell is the smallest unit from exact copies of 
which, distributed only by simple translational move- 
ments, the entire crystal is created. In contrast, the crys- 
tallographic asymmetric unit is the smallest unit from 
exact copies of which, distributed both by translation 
and by rotation around axes of symmetry, the entire crys- 
tal is created. Crystallographic asymmetric units are usu- 
ally delineated to include one or more intact subunits of 
a protein or one or more intact molecules of a protein. 
If crystallographic asymmetric units containing one 
or more subunits or one or more molecules of a protein 
were always distributed in a crystal so that each of those 
asymmetric units had exactly the same rotational orien- 
tation, all unit cells would be of the same type, P1, and 
each unit cell would contain only one crystallographic 
asymmetric unit. Packing the same asymmetric units in 
different rotational orientations to form a crystal, how- 
ever, is not forbidden, and strangely shaped enan- 
tiomeric objects, such as asymmetric units containing 
protein, usually are packed with greater efficiency when 
they can assume different rotational orientations. If 
these rotational orientations are to be compatible with 
the infinite regular array that is a crystal, they must be 
related by particular symmetry operations. Dismissing 
mirror symmetry, which is irrelevant to enantiomeric 
objects such as proteins, one is left with axial symmetry. 

Figure 9-3: Packing of identical asymmetric objects, which repre- 
sent the crystallographic asymmetric units from which the array is 
formed, in the space group P2, in which they alternately assume 
two different rotational orientations. The symbol @ indicates a 
2-fold rotational axis of symmetry perpendicular to the page. 

In a diagram of the simple space group P2 (Figure 
9-3), the array of unit cells portrayed represents one of 
the layers in a three-dimensional crystal. The array of the 
enantiomeric objects is created by distributing them 
about rotational axes of symmetry and by translational 
operations. Within each unit cell, the two identical enan- 
tiomeric objects are related by a central 2-fold rotational 
axis of symmetry perpendicular to the page. Because of 
the two rotational dispositions, which arise only because 
of the increased efficiency of packing, the unit cell ends 
up containing two of the enantiomeric objects rather 
than one. Each of the 2-fold rotational axes in the center 
of each of these unit cells is itself a rotational axis of sym- 
metry for the entire array if it is assumed that the array is 
infinitely propagated in three dimensions. There are also 
three distinct sets of 2-fold rotational axes of symmetry 
between the unit cells. It is a feature of axes of symmetry 
in space groups that more than one distinct set appears 
at a time, and together these sets of axes of symmetry 
define the space group. 
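
The appearance of several distinct sets of 2-fold axes can be verified with a few lines of arithmetic. The sketch below works in fractional coordinates of the unit cell and, as an assumption made only for convenience, places one 2-fold axis on the origin rather than in the center of the cell as drawn in Figure 9-3. Combining that rotation with each translation of the lattice produces another 2-fold rotation, and the axis of the new rotation is found as the point that the combined operation leaves unmoved. The function names are hypothetical.

```python
from fractions import Fraction

def twofold_about_z(p):
    """2-fold rotation about the z axis through the origin."""
    x, y, z = p
    return (-x, -y, z)

def combined(p, shift):
    """Rotate about the origin, then translate by a lattice vector in the xy plane."""
    x, y, z = twofold_about_z(p)
    return (x + shift[0], y + shift[1], z)

def axis_position(shift):
    """Position in the xy plane of the 2-fold axis of the combined operation."""
    fixed = (Fraction(shift[0], 2), Fraction(shift[1], 2), Fraction(0))
    assert combined(fixed, shift) == fixed  # unmoved, so it lies on the axis
    return fixed[:2]

# Lattice translations of 0 or 1 unit cell along each of the two in-plane edges.
axes = sorted({axis_position((i, j)) for i in (0, 1) for j in (0, 1)})
print(axes)
```

The four positions that result, (0, 0), (0, 1/2), (1/2, 0), and (1/2, 1/2), correspond to the four distinct sets of parallel 2-fold rotational axes in the array: the one chosen at the outset and three others that arise necessarily from the translations.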

In crystals, as opposed to individual oligomers and 
polymers, screw axes of symmetry are required to have 
rotational angles that are integral quotients of 360°. This 
arises from the fact that these screw axes of symmetry 
operate on the entire array in the crystal. As the image of 
one of the crystallographic asymmetric units from which 
the crystal is composed is rotating and rising around the 
screw axis, the images of every other asymmetric unit in 
the array are rotating and rising around the same axis. At 
the completion of the operation, all of the images in the 
array must superpose on identical partners. This can 
occur only if the rotations of both the screw axes and the 
rotational axes of symmetry in a space group are 2-fold, 
3-fold, 4-fold, or 6-fold. No other rotational multiplicities 
are compatible with an infinite array of asymmetric 
objects. Space groups never have 5-fold rotational or 
screw axes of symmetry because an infinite repetitive 
array of pentagons cannot be formed. Any translational 
distance, as long as it is compatible with an unclosed 
screw axis of symmetry, is compatible with an infinite 
array. Technically no crystal is infinite, but it always has 
the potential to be so, and this potential is all that 
matters. 
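
The restriction to 2-fold, 3-fold, 4-fold, and 6-fold axes can also be checked by a standard argument that differs from the picture of pentagons used above. If a rotation maps a lattice onto itself, then in a basis of lattice vectors it is a matrix of integers, so its trace, which in three dimensions is 1 + 2 cos(360°/n), must be an integer. The sketch below, with a hypothetical function name, simply tests that condition for a range of multiplicities.

```python
import math

def compatible_with_lattice(n, tol=1e-9):
    """True if 2*cos(360/n degrees) is an integer, the condition for an
    n-fold rotation to map an infinite lattice onto itself."""
    value = 2.0 * math.cos(2.0 * math.pi / n)
    return abs(value - round(value)) < tol

allowed = [n for n in range(1, 13) if compatible_with_lattice(n)]
print(allowed)  # [1, 2, 3, 4, 6]
```

The value 1 corresponds to the absence of rotational symmetry. No multiplicity above 6 passes the test, because 2 cos(360°/n) then lies strictly between 1 and 2.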

The arrangement of asymmetric units in a space 
group can be displayed by distributing drawings of an 
enantiomeric object representing a crystallographic 
asymmetric unit of protein around appropriately posi- 
tioned axes of symmetry. It is convenient to use an enan- 
tiomeric object familiar to all chemists, as well as one 
that is easy to draw, namely, a small enantiomeric mole- 
cule. Lactic acid is one of the smallest enantiomeric mol- 
ecules. Drawings of distributions of lactic acid in the 
space groups C2, P2₁2₁2₁, and P3₁21 (Figures 9-4, 9-5, 
and 9-6, respectively)¹⁴⁻¹⁸ illustrate certain properties of 
space groups, their rotational axes of symmetry, and the 
unit cells they create. 

Although there are four distinct sets of 2-fold rota- 
tional axes in the space group P2 of Figure 9-3, it is con- 
sidered to be a simple 2-fold array, designated by the 
single integer 2, because all of its axes are parallel to each 
other and no set can exist without all of the others. The 
presence of a rotational axis of symmetry in a space 
group is indicated by an unadorned integer: 2, 3, 4, or 6. 
The presence of a screw axis of symmetry is indicated by 

Figure 9-4: Space group C2. (A) Molecules of lactic acid arranged in the space group C2. 
The lattice of space group C2 is monoclinic, with two of its crystallographic angles equal 
to 90° and one of its crystallographic angles not equal to 90°. The space group is charac- 
terized by rows of parallel axes of symmetry. There are four rotational axes and four screw 
axes for every unit cell. Each molecule of lactic acid represents the crystallographic asym- 
metric unit. The crystallographic angle that is not equal to 90° is in the top and bottom 
faces of the unit cell, and the axes of symmetry are all vertical. (B) Diagram of the dispo- 
sition of the axes of symmetry in panel A. The diagram is of the top side of the top face of 
the unit cell, and the symbols are for the 2-fold rotational axes of symmetry or the 2-fold 
screw axes of symmetry viewed end-on. The planes containing the alternating screw and 
rotational axes of symmetry are parallel to the side faces of the unit cell and pass through 
the front face at intervals of one-quarter and three-quarters of the unit cell. The screw 
axes are in the front face, the back face, and halfway between the front and the back face. 
The rotational axes of symmetry are at one-quarter and three-quarters of the distance 
between the front and back face. (C) Bovine pancreatic deoxyribonuclease I packed in its 
crystal in the space group C2.¹⁴ The crystallographic asymmetric unit contains one mol- 
ecule of the protein. All of the axes of symmetry run horizontally, rather than vertically, 
and the unit cell has been shifted to its traditional” position so that the planes contain- 
ing the alternating axes of symmetry, which are slanted and normal to the page in panel 
B, are now parallel to the plane of the page and lie in the front face, the back face, and in 
the center, halfway between the front and the back face of the unit cell. The half arrows 
indicate two of the 2-fold screw axes, and the full arrow indicates one of the 2-fold rota- 
tional axes. The three marked axes are in the plane passing through the center of the unit 
cell. The molecules shown are related only by this set of axes of symmetry but are related 
to their neighbors in the unit cells in front and behind by the other two sets of axes of 
symmetry in the front and back faces, respectively. The crystallographic angle in the side 
faces is 91.4°, and the other two crystallographic angles are exactly 90°. Reprinted with 
permission from ref 14. Copyright 1986 Academic Press. 


an integer, 2, 3, 4, or 6, followed by another integer in 
subscript, for example, the 3₁ screw axes of symmetry in 
the space group P3₁21 (Figure 9-6). The main integer, n, 
is the integer by which 360° is divided to obtain the rota- 
tional angle of the steps. The integer in subscript, m, 
determines the fraction, m/n, of the unit cell over which 
the translation occurs with each rotational step. The 
translation is always right-handed to the rotation. 
Consequently, by this convention a 3₁ screw and a 
4₁ screw are right-handed, and a 3₂ screw and a 4₃ screw 
are left-handed. 
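
The notation for a screw axis, written as n with a subscript m, can be decoded mechanically. The sketch below returns the rotational angle, the translation as a fraction of the unit cell, and the handedness. The rule used for handedness is the customary reading of the convention rather than something stated explicitly above: a translation of less than half of the unit cell per step is taken as right-handed, one of more than half as left-handed, and one of exactly half as neither. The function name is hypothetical.

```python
from fractions import Fraction

def describe_screw(n, m):
    """Return (rotation in degrees, translation as a fraction of the unit
    cell, handedness) for the screw axis written n with subscript m."""
    if n not in (2, 3, 4, 6) or not 0 < m < n:
        raise ValueError("expected n in {2, 3, 4, 6} and 0 < m < n")
    rotation = Fraction(360, n)
    translation = Fraction(m, n)
    if 2 * m < n:
        hand = "right-handed"
    elif 2 * m > n:
        hand = "left-handed"
    else:
        hand = "neither"  # a translation of exactly half the cell per step
    return rotation, translation, hand

for n, m in [(3, 1), (4, 1), (3, 2), (4, 3), (2, 1)]:
    print(n, m, describe_screw(n, m))
```

A screw axis with n = 3 and m = 2 is read as left-handed because a right-handed rotation of 120° with a translation of two-thirds of the unit cell produces the same infinite array as a left-handed rotation of 120° with a translation of one-third.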

The designation of the space group in a crystalline 
array takes the form of a capital letter* followed by one or 
more numbers. An example would be P2₁2₁2₁, which 
would mean a primitive lattice made of rectangular 
parallelepipeds intersected by three orthogonal sets of 
2-fold screw axes of symmetry (Figure 9-5). The arrange- 
ments of the axes in a particular space group can be 
learned only by consulting a diagram of that space 
group.” 

The crystallographic asymmetric unit has a volume 
that is always an integral quotient of the unit cell and 
thus its volume is equal to or smaller than that of the unit 
cell. An integral number of asymmetric units, but not 
necessarily of intact asymmetric units, composes a unit 
cell. For example, one whole, two halves, four fourths, 
and eight eighths gathered from 15 different asymmetric 
units, each represented by one intact molecule of lactic 
acid, can together create a unit cell containing a total of 
four asymmetric units in the space group P2₁2₁2₁ (Figure 
9-5). The crystallographic asymmetric unit in Figure 
9-4C is one molecule of deoxyribonuclease I, and the 
unit cell contains a total of four asymmetric units but not 
four intact molecules of the protein. 
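
The bookkeeping in the example of lactic acid is easily confirmed. The short sketch below does nothing more than add the pieces named in the text, using exact fractions; the variable names are hypothetical.

```python
from fractions import Fraction

# (number of molecules, fraction of each that lies inside the unit cell)
pieces = [(1, Fraction(1)),
          (2, Fraction(1, 2)),
          (4, Fraction(1, 4)),
          (8, Fraction(1, 8))]

molecules_touched = sum(count for count, _ in pieces)
asymmetric_units_per_cell = sum(count * part for count, part in pieces)
print(molecules_touched, asymmetric_units_per_cell)  # 15 4
```

Fifteen different molecules contribute to the unit cell, but the portions that fall within its boundaries add up to exactly four asymmetric units.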

The space group imposes certain constraints on 
the structure of the unit cell. For example, in the space 
group P2 that produces the array of Figure 9-3, the 2-fold 
rotational axes of symmetry must be normal to the plane 
of Figure 9-3 or the superposition cannot occur. 
Therefore, each unpictured asymmetric unit above and 
below the plane of the page in the lattice must be per- 
pendicularly aligned with one of the asymmetric units in 
the plane of the page. This requires that the two angles of 
the fundamental unit cell aligning the axis normal to the 
page be precisely 90°. A lattice where two of the angles 
must be 90° is monoclinic (caption to Figure 4-2). In the 
space group P2₁2₁2₁ displayed in Figure 9-5, the three 
necessarily orthogonal sets of screw axes force the unit 
cell to be a rectangular parallelepiped and the lattice to 
be orthorhombic. Each space group other than the most 


* The capital letter refers to the particular relationship between the 
underlying lattice and the unit cell for the space group of interest. 
These relationships are primitive (P), C-face centered (C), A-face 
centered (A), B-face centered (B), all-face centered (F), body cen- 
tered (I), rhombohedrally centered (R), or hexagonally centered (H). 

primitive, P1, which lacks axial symmetry entirely, 
enforces one or more of the angles of the unit cell to be 
90° or 120° or enforces one or more of the axes of the 
unit cell to be the same length or enforces requirements 
on both angles and lengths. These are not coincidental 
identities but required identities. They are dictated by 
the symmetry operations and are thus exact quantities. If 
the angle in the monoclinic crystal of Figure 9-3 were not 
exactly 90°, the crystal would be filled with fractures and 
would not be a crystal. 

Reality seems to take place in Cartesian space, and 
there are only 71 space groups into which an infinite 
array of identical crystallographic asymmetric units, 
each containing one or more subunits of protein, can be 
arranged in Cartesian space to produce a crystal. Every 
crystal of protein has its crystallographic asymmetric 
units arrayed in one of these space groups. The space 
group is established as the crystal nucleates and grows in 
the dish; it cannot be dictated by the investigator. She 
can only try to change the conditions of crystallization in 
the hope that another space group will be generated by 
the process. This is often attempted because the identity 
of the space group determines how difficult it will be to 
calculate a map of electron density. 

The 71 space groups available to a crystallizing pro- 
tein are distinguished one from the other by the arrange- 
ment in space of their respective screw axes and 
rotational axes of symmetry. In turn, the space group of 
the particular crystal of protein that is formed is identi- 
fied by the investigator from a characteristic pattern cre- 
ated in the data set by its particular arrangement of axes 
of symmetry. These are patterns in which identities 
occur in the amplitudes of the reflections. For example, 
in the oscillation photograph in Figure 4-1B, the fact that 
the patterns of the intensities of the reflections above 
and below the equator are mirror images of each other is 
consistent with the existence of a rotational axis of sym- 
metry in the crystal parallel to the axis of oscillation. The 
particular pattern of identities within the entire data set 
identifies the axes of symmetry in the crystal and their 
arrangement in space, and hence the space group of the 
crystal. 

The packing of deoxyribonuclease I in the space 
group C2 (Figure 9-4C),¹⁴ that of the lectin from Pisum 
sativum in the space group P2₁2₁2₁ (Figure 9-5E),¹⁶,¹⁷ that 
of telokin from Meleagris gallopavo in the space group 
P3₁21 (Figure 9-6C),¹⁸ that of porin from Rhodobacter 
capsulatus in the space group R3 (Figure 9-7),¹⁹ and that 
of ferredoxin from Aphanothece sacrum in the space 
group P4₁ (Figure 9-8)²⁰ illustrate the accommodations 
of the molecules of proteins to the axes of symmetry 
defining these five space groups. 

There are no rotational axes of symmetry, only 
screw axes of symmetry, relating the asymmetric units in 
the space groups P2₁2₁2₁ and P4₁, but in the space groups 
C2, P3₁21, and R3, pairs of asymmetric units are disposed 
around 2-fold rotational axes or triplets of asymmetric 

Figure 9-5: Space group P2₁2₁2₁. (A) Molecules of lactic acid arranged in the space group P2₁2₁2₁. The lattice of space group P2₁2₁2₁ is 
orthorhombic, with all three crystallographic angles equal to 90°, and the unit cell is rectangular. This results from the fact that there are 
three sets of 2-fold screw axes of symmetry, each set is necessarily orthogonal to the others, and in each set the screw axes are parallel to one 
of the crystallographic axes. There are four screw axes for each unit cell in each of the three sets. The molecule of lactic acid represents a crys- 
tallographic asymmetric unit, and from any one molecule of lactic acid the entire array can be created by performing the operations of 2-fold 
screw symmetry in the three orthogonal directions. (B) The four parallel horizontal screw axes of symmetry passing through the left face of 
the unit cell in panel A from front left to back right. These axes are in the top face, bottom face, and halfway between the top face and bottom 
face, one-quarter and three-quarters of the way across the unit cell. (C) Vertical 2-fold screw axes of symmetry passing through the top face 
of the unit cell in panel A. Half of these vertical axes coincide with each vertical column of lactic acids, of which there are two for each unit 
cell. The other half of these vertical screw axes of symmetry coincide with vertical lines in the center of each vertical face. When rotated 180° 
about any one of these vertical axes while simultaneously rising half of a unit cell, the lattice superposes on itself. (D) The four parallel hori- 
zontal screw axes of symmetry passing through the front face of the unit cell in panel A from front right to back left. These axes are at one- 
quarter and three-quarters of the distance between top and bottom face and one-quarter and three-quarters of the distance between the side 
faces. The symbols in panels B-D are the 2-fold screw axes of symmetry seen end-on. (E) The lectin from P. sativum packed in its crystal in 
the space group P2₁2₁2₁.¹⁶,¹⁷ The asymmetric unit contains one complete molecule of the protein, which is a homodimer of identical subunits 
related by a local noncrystallographic 2-fold rotational axis of symmetry. The unit cell is positioned in the traditional” location with respect 
to the three orthogonal sets of 2-fold screw axes of symmetry so that the top is shifted one quarter of its width forward and the front is shifted 
one quarter of its width downward relative to their positions in panels C and D, respectively. The central, complete molecule of protein is 
represented in thicker lines. It is related to the two molecules drawn with thinner lines above and below it by one of the vertical screw axes 
of symmetry halfway across the unit cell and one quarter of the way forward from the back face. It is related to the two molecules to its upper 
right and upper left, respectively, by one of the horizontal screw axes of symmetry parallel to the plane of the page, one quarter of the way 
down the unit cell, and halfway between the front and back faces. And it is related to the two molecules to its lower right and lower left, respec- 
tively, by the two screw axes of symmetry normal to the plane of the page half of the way up from the bottom face and each one quarter of 


the way in from one of the sides. Reprinted with permission from ref 17. Copyright 1990 Elsevier B.V. 


units are disposed around 3-fold rotational axes of sym- 
metry that are inherent in the space groups. In the space 
groups C2 and P3₁21, each and every asymmetric unit is 
related to at least one of its adjacent twins by a particular 
2-fold rotational axis of symmetry. This unique relation- 
ship establishes a particular pair of twins. This pair is 
exceptional because the whole lattice can be divided into 
an array of these pairs, and in each of these pairs the 
orientation of the two twins to each other is the same. In 
the space group C2 pictured in Figure 9-4A, such a pair 
of twins includes any two lactic acid molecules that have 
their carboxylic acid functional groups opposite the 
hydroxyls of their neighbors. Every lactic acid molecule 
in the crystal participates in one and only one such 
symmetric relationship. 

The particular 2-fold rotational axes of symmetry 
connecting the noted pairs of rotationally symmetric 
twins are crystallographic axes. A crystallographic axis 
of symmetry is any one of the axes of symmetry that 
defines the space group of the crystal. It exists only when 
the protein is in a crystal. The 2-fold rotational axis of 
symmetry running through the center of malate dehy- 
drogenase (Figure 9-1) and connecting its twin subunits 
is a molecular axis of symmetry. A molecular axis of sym- 
metry is an axis of symmetry that exists in the molecule 
of a protein regardless of whether or not that molecule is 
in solution or in a crystal. Crystallographic axes and 
molecular axes arise under different circumstances. The 
molecular axes of symmetry are created as the oligomeric 
protein assembles in the cell, and the crystallographic 
axes of symmetry are created as the crystal grows in a 
dish. These two types of axes of symmetry are independ- 
ent properties. They can coincide, but they are never 
required to do so. If a molecular rotational axis of 
symmetry coincides with a crystallographic rotational 
axis of symmetry, it can be stated that the molecular axis 
when it is within the crystal is an exact rotational axis of 
symmetry because a crystallographic axis of symmetry is 
necessarily an exact rotational axis of symmetry. If a crys- 
tallographic axis of symmetry were not exact, crystal 
growth could not continue because the small equal devi- 
ations between the actual rotational operation and an 
exact rotational operation would add up across the crys- 
tal and eventually produce an interruption in the lattice. 
An exact rotational axis of symmetry in a protein is a 
rotational axis which, in a crystallographic molecular 
model of that protein, coincides with a crystallographic 
axis of symmetry. 

The molecular 2-fold rotational axis of symmetry in
the middle of each dimer of malate dehydrogenase from
A. arcticum (Figure 9-1) coincides with one of the
crystallographic 2-fold rotational axes of symmetry in the
space group P2₁2₁2 in which it crystallizes. Consequently,
the molecular axis is exact. The molecular 3-fold
rotational axis of symmetry in the middle of each trimer
of porin from R. capsulatus coincides with one of the
crystallographic 3-fold rotational axes of symmetry in the
space group R3 in which it crystallizes (Figure 9-7) and
is exact.

There is one crystal that contains an educational 
exception to this rule that a molecular axis of symmetry 
coinciding with a crystallographic axis of symmetry is 
exact. Each αβ protomer of the (αβ)₃ trimeric portion of
rat mitochondrial H⁺-transporting two-sector ATPase is
found in its own asymmetric unit when the protein
crystallizes in the space group R32. Each (αβ)₃ trimer,
however, has one and only one γ subunit associated with it.
Consequently, only one of the asymmetric units in each
triplet of asymmetric units containing the entire
(αβ)₃ trimer can contain a particular segment of a γ subunit.
In fact, the γ subunits are distributed at random
among the three so the map of electron density contains
Figure 9-6: Space group P3₁21. (A) Molecules of lactic acid arranged in the space group P3₁21. The
lattice of space group P3₁21 is trigonal, with two crystallographic angles of 90° and one crystallographic
angle of 60°. The angle of 60° is in the bottom faces of the three unit cells that are drawn. The
space group P3₁21 has a set of right-handed 3-fold screw axes of symmetry. In the drawing, two of
the 3-fold screw axes of symmetry of each unit cell coincide with the vertical columns of lactic acid 
molecules. The third 3-fold screw axis of symmetry of each unit cell coincides with the vertical edge 
of each unit cell. Because the other crystallographic angle in the bottom face is 120°, counterclock- 
wise rotation of 120° about this latter axis while simultaneously rising one-third of a unit cell causes 
the three upward columns of lactic acid around this axis to superpose on themselves and the three 
downward columns of lactic acid around this axis to superpose on themselves. Normal to each of the 
3-fold screw axes of symmetry running along each vertical edge of the unit cells and intersecting per- 
pendicularly with these latter vertical axes are 2-fold rotational axes of symmetry arrayed at 60° 
angles to each other. In the bottom face of the unit cell, the first 2-fold rotational axis of symmetry is 
the diagonal bisecting the 120° angle. Each successive 2-fold rotational axis of symmetry is 60° coun- 
terclockwise to the one below it and one-sixth of the distance up the axis of the unit cell. 
(B) Diagram of the arrangement of all of the axes of symmetry in space group P3₁21. The view is
looking down onto the bottom face of the unit cell containing the 60° angle. The flared triangles 
denote 3-fold screw axes normal to the page. The full arrows are 2-fold rotational axes of symmetry, 
and the half arrows are 2-fold screw axes of symmetry parallel to the plane of the page. The fractions 
indicate how far up the vertical edge or vertical face of the unit cell the axes parallel to the plane of 
the page are found. Reprinted with permission from ref 15. Copyright 1983 D. Reidel. (C) Telokin 
from the gizzard of M. gallopavo packed in its crystal in the closely related space group P3₂21. The
protein is a monomer of a single folded polypeptide, and the asymmetric unit contains only one of 
these monomers. The view is from the top so the 3-fold screw axes of symmetry in this view are 
normal to the plane of the page rather than vertical. They are in the same locations in the unit cell as 
those in the space group P3₁21 (panel B) but are left-handed rather than right-handed screw axes
(hence the 3₂ instead of 3₁). This difference causes the 2-fold screw axes of symmetry and 2-fold rotational
axes of symmetry parallel to the plane of the page (panel B) to be encountered in a clockwise
succession rather than a counterclockwise succession but at the same locations, angles, and depths. 
Reprinted with permission from ref 18. Copyright 1992 Elsevier B.V. 
three overlapping, symmetrically displayed copies of the
γ subunit, each copy having one-third the expected electron
density. The actual individual asymmetric units in
each molecule of the protein in the crystal are not symmetric
within themselves, but the crystal is 3-fold symmetric.
The individual molecular rotational axis of
pseudosymmetry relating the three αβ protomers is precisely
3-fold, but the symmetry of each molecule is perturbed
by the presence of the necessarily asymmetric
γ subunit.

The results from several crystallographic experi- 
ments serve to illustrate the distinction between crystal 
symmetry and molecular symmetry and the conse- 
quences of their coincidence. 

Triose-phosphate isomerase, a dimer composed of
two identical subunits, crystallizes in the space group
P2₁2₁2₁. The crystallographic asymmetric unit is the
α₂ dimer, and the 2-fold molecular rotational axis of symmetry
within the dimer cannot coincide with a crystallographic
rotational axis of symmetry because there is
none. Glyceraldehyde-3-phosphate dehydrogenase
(phosphorylating), an (α₂)₂ tetramer composed of four
identical subunits, also crystallizes in the space group
P2₁2₁2₁, and the asymmetric unit is necessarily the entire
tetramer. Glutathione peroxidase, however, also an
(α₂)₂ tetramer composed of four identical subunits, crystallizes
in the space group C2, and the asymmetric unit is
the α₂ dimer. Consequently, in this instance one of the
molecular axes of symmetry coincides with a crystallographic
axis of symmetry, and the two α₂ dimers composing
the (α₂)₂ tetramer must be related to each other by
an exact 2-fold molecular rotational axis of symmetry.
The other two molecular 2-fold rotational axes of symmetry
orthogonal to the one that coincides cannot also
coincide with orthogonal crystallographic 2-fold rotational
axes of symmetry because there are none.
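
Whether a space group offers any 2-fold rotational axis with which a molecular axis could coincide can be read from its general equivalent positions. The following is a minimal sketch in Python added for illustration. It assumes the standard settings of P2₁2₁2₁ and C2 (unique axis b), and the name classify_twofold is a hypothetical one. An operation that inverts two coordinates is a pure rotation only if it carries no fractional translation along the coordinate that it leaves unchanged; otherwise it is a screw operation.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

# General equivalent positions written as (signs, translation):
# each new coordinate is signs[i] * x[i] + translation[i].
P212121 = [
    ((1, 1, 1), (0, 0, 0)),            # x, y, z
    ((-1, -1, 1), (HALF, 0, HALF)),    # 1/2 - x, -y, 1/2 + z
    ((-1, 1, -1), (0, HALF, HALF)),    # -x, 1/2 + y, 1/2 - z
    ((1, -1, -1), (HALF, HALF, 0)),    # 1/2 + x, 1/2 - y, -z
]
C2 = [
    ((1, 1, 1), (0, 0, 0)),            # x, y, z
    ((-1, 1, -1), (0, 0, 0)),          # -x, y, -z
    ((1, 1, 1), (HALF, HALF, 0)),      # C-centering translation
    ((-1, 1, -1), (HALF, HALF, 0)),    # 1/2 - x, 1/2 + y, -z
]


def classify_twofold(operation):
    """Label an operation as a 2-fold rotation or a 2-fold screw."""
    signs, translation = operation
    if signs == (1, 1, 1):
        return "identity or lattice translation"
    axis = signs.index(1)  # the one coordinate that is not inverted
    if translation[axis] % 1 == 0:
        return "rotation"
    return "screw"


if __name__ == "__main__":
    for name, group in (("P212121", P212121), ("C2", C2)):
        print(name, [classify_twofold(op) for op in group])
```

In this sketch every nontrivial operation of P2₁2₁2₁ is classified as a screw operation, whereas C2 contains one pure 2-fold rotation, which is the axis available to the tetramer of glutathione peroxidase.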

Phosphorylase b, a dimer composed of two identical
subunits, crystallizes in the space group P4₃2₁2, the
asymmetric unit is one subunit, and the dimer must be
constructed upon an exact 2-fold molecular rotational
axis of symmetry. Alcohol dehydrogenase, a dimer composed
of two identical subunits, crystallizes in the space
group C222₁, and the molecular 2-fold rotational axis of
symmetry coincides with a crystallographic 2-fold rotational
axis of symmetry and must be exact. The
α₃ trimer of chloramphenicol O-acetyltransferase from
E. coli crystallizes in the space group R32, and the
α₃ trimer of bovine purine-nucleoside phosphorylase
crystallizes in the space group P2₁3, and in both crystals
molecular and crystallographic 3-fold rotational axes of
symmetry coincide, and the molecular axes must be
exact. L-Lactate dehydrogenase, a tetramer composed of
four identical subunits, crystallizes in the space group
F422, and a single subunit is the asymmetric unit.
Consequently, the tetramer is constructed around three
exact orthogonal 2-fold molecular rotational axes of symmetry
that coincide with three precisely orthogonal
crystallographic 2-fold rotational axes of symmetry.
(S)-2-Hydroxy-acid oxidase crystallizes in the space group
I422. As a result, its molecular 4-fold rotational axis of
symmetry and its four molecular 2-fold axes of symmetry
coincide with crystallographic rotational axes of symmetry
and must be exact. Dihydrolipoyllysine-residue
acetyltransferase from Azotobacter vinelandii and
dihydrolipoyllysine-residue succinyltransferase from
E. coli, related oligomers each composed of 24 identical
subunits, both crystallize in the space group F432 in
which the asymmetric unit is a single subunit, and the
three molecular 4-fold rotational axes of symmetry, the

Figure 9-7: Molecules of porin from R. capsulatus
packed in the space group R3 of its crystal.
The lattice of space group R3 is trigonal
with two crystallographic angles of 90°
and one of 60°. Three-fold rotational axes of
symmetry and 3-fold screw axes of symmetry
normal to the front face pass along its
edges and through its interior. The protein is
a trimer of three identical folded polypeptides,
and the asymmetric unit contains only
one single folded polypeptide of this trimer.
Each molecule of the protein has one of the
crystallographic rotational axes of symmetry
running through its center so that each of its
subunits is exactly the same as the others
and related to them by an exact 3-fold rotational
axis of symmetry. The 3-fold screw
axes of symmetry running between the
trimers all superpose trimers on other
trimers. Reprinted with permission from ref
19. Copyright 1992 Elsevier B.V.
four molecular 3-fold rotational axes of symmetry, and
the six molecular 2-fold rotational axes of symmetry all
coincide with crystallographic axes of symmetry and
must be exact. Because both the 2-fold molecular axis of
symmetry of the decamer of DNA with the sequence
ATGACGTCAT and the 2-fold molecular axis of symmetry
running through the coiled coil of α helices in the
α₂ dimer of the crystallographic molecular model of a
portion of the general control protein GCN4 coincide
with the same crystallographic axis of symmetry, the
molecular axis of the DNA must be exact and the molecular
axis of the coiled coil must be exact and precisely
perpendicular to the central helical axis of the DNA.
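
The rotational axes available in point group 432, the point group of the oligomers of 24 subunits, can be counted by enumerating its 24 rotations. The sketch below, in Python, is an illustration added here and not a calculation taken from the cited studies. It assumes that the three 4-fold axes lie along the coordinate axes, so that every rotation is a signed permutation matrix with a determinant of +1, and all of the function names are hypothetical.

```python
from itertools import permutations, product

IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]


def determinant(m):
    """Determinant of a 3 x 3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def multiply(a, b):
    """Product of two 3 x 3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def octahedral_rotations():
    """All signed permutation matrices that are proper rotations."""
    rotations = []
    for perm in permutations(range(3)):
        for signs in product((1, -1), repeat=3):
            m = [[signs[r] if perm[r] == c else 0 for c in range(3)]
                 for r in range(3)]
            if determinant(m) == 1:
                rotations.append(m)
    return rotations


def order(m):
    """Smallest n for which m applied n times is the identity."""
    power, n = m, 1
    while power != IDENTITY:
        power = multiply(power, m)
        n += 1
    return n


def count_axes():
    """Number of 4-fold, 3-fold, and 2-fold axes in point group 432."""
    counts = {}
    for m in octahedral_rotations():
        n = order(m)
        counts[n] = counts.get(n, 0) + 1
    four_fold = counts[4] // 2          # rotations of 90 and 270 degrees per axis
    three_fold = counts[3] // 2         # rotations of 120 and 240 degrees per axis
    two_fold = counts[2] - four_fold    # exclude half-turns about the 4-fold axes
    return {4: four_fold, 3: three_fold, 2: two_fold}


if __name__ == "__main__":
    print(len(octahedral_rotations()), count_axes())
```

Each of the 24 rotations places one subunit, which is why an oligomer with the complete symmetry of this point group has 24 protomers.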

Each of the last eight oligomeric proteins has all of 
its subunits arranged with a perfect rotational symmetry 
that can be conclusively proven. The coincidences of 
their molecular axes of symmetry and the crystallo- 
graphic axes of symmetry that permitted these proofs, 
however, were by chance, and it seems reasonable to 
assume that, in solution, the first three proteins, which just
happened to crystallize in space groups incompatible
with one or more of their molecular rotational axes of
symmetry, are no less symmetric than the last eight.
Consequently, while it is sometimes possible to deduce 
the whole symmetry of the protein from the crystallo- 
graphic symmetry, the absence of the appropriate coin- 
cidences permitting this deduction does not mean that 
the molecule of protein lacks the missing symmetry. 

Often it is assumed that molecular axes of symmetry
within the asymmetric unit of a crystal are exact so that
the electron density of the protomer can be enhanced by
averaging around those molecular axes. Cystathionine
γ-synthase from Nicotiana tabacum, an α₄ tetramer of
folded polypeptides 445 aa in length, crystallizes in the
space group P2₁2₁2₁ with two α₄ tetramers in each of the
four asymmetric units. Data could only be gathered to a
Bragg spacing of 0.29 nm, but when the electron densities
of the eight protomers in the asymmetric unit were
superposed about the respective molecular axes of symmetry
and averaged, the map of electron density that
resulted was a remarkable improvement over any one of
the unaveraged maps.

Once averaging around noncrystallographic rota- 
tional axes of symmetry has been used to obtain a map 
of electron density accurate enough to build a convinc- 
ing molecular model of the protomer, copies of the indi- 
vidual protomers are placed in their locations in the 
asymmetric unit, and the refinement of the molecular 
model is performed without rotational averaging. Only 
by refining the individual structures without rotational 
averaging do the differences among the protomers in the 
asymmetric unit, often dramatic ones, become apparent.
Often these differences, even though significant, can
be dismissed as being due to distortions of an otherwise 
symmetric structure resulting from the asymmetric 
demands of packing it into the crystal. 

When molecular axes of symmetry fail to coincide
with crystallographic axes of symmetry, other criteria
are used to evaluate the precision of the molecular rotational
symmetry. For example, in the case of the
(α₂)₂ tetramer of glyceraldehyde-3-phosphate dehydrogenase
(phosphorylating) in the space group P2₁2₁2₁,
the spacing of the heavy metal atoms in the isomorphous
replacement and the final map of electron density
were both consistent with the tetramer being
constructed around three orthogonal apparently exact
2-fold molecular rotational axes of symmetry. The
molecular 2-fold rotational axis of symmetry of the
dimeric lectin from P. sativum does not coincide with a
crystallographic rotational axis of symmetry because
there is none in its space group P2₁2₁2₁ (Figure 9-5), but
when the one folded polypeptide in the crystallographic
molecular model is rotated around the molecular axis of
symmetry, its α carbons superpose on those of its twin
with a root mean square deviation of 0.06 nm. The
rotational angle around the molecular axis of symmetry
producing the smallest root mean square deviation
(0.019 nm) of the superposed α carbons of the two subunits
in the crystallographic molecular model of formate
dehydrogenase from Pseudomonas was 179.9°.
In similar superpositions, the α carbons of the two subunits
of triose-phosphate isomerase from yeast coincided
with a root mean square deviation of less than
0.04 nm, and those of the two subunits of transketolase
from yeast coincided with one of 0.024 nm. The error in the
coordinates for the crystallographic molecular model of
inorganic diphosphatase from yeast was estimated to be
0.037 nm; upon superposition of one of its subunits
on the other by rotation around the molecular axis of
symmetry, the root mean square deviation of all of the
atoms in the two polyamide backbones was only
0.038 nm. Although there were functional indications
that the subunits of ribulose-bisphosphate carboxylase
were in different environments in solution, when the
different folded polypeptides in the crystallographic
molecular model were superposed by rotation around
the molecular axes of symmetry, their α carbons coincided
with a root mean square deviation of less than
0.02 nm, well within the accuracy of the coordinates
themselves, and it was concluded that there was no
structural evidence for asymmetry.
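
The superpositions just described can be sketched in a few lines of Python. The sketch is an illustration under stated assumptions and is not the procedure used in any of the cited studies: the position and direction of the molecular axis are taken as already known, the corresponding α carbons of the two subunits are supplied in the same order, and the names rotate_about_axis, rmsd, and best_rotation_angle are hypothetical.

```python
import math


def rotate_about_axis(point, axis, angle_deg, origin=(0.0, 0.0, 0.0)):
    """Rotate a point about an axis through origin (Rodrigues formula)."""
    length = math.sqrt(sum(c * c for c in axis))
    ux, uy, uz = (c / length for c in axis)
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    x, y, z = (point[i] - origin[i] for i in range(3))
    dot = ux * x + uy * y + uz * z
    cross = (uy * z - uz * y, uz * x - ux * z, ux * y - uy * x)
    rotated = (
        x * cos_t + cross[0] * sin_t + ux * dot * (1.0 - cos_t),
        y * cos_t + cross[1] * sin_t + uy * dot * (1.0 - cos_t),
        z * cos_t + cross[2] * sin_t + uz * dot * (1.0 - cos_t),
    )
    return tuple(rotated[i] + origin[i] for i in range(3))


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two matched lists of atoms."""
    total = sum((p[i] - q[i]) ** 2
                for p, q in zip(coords_a, coords_b) for i in range(3))
    return math.sqrt(total / len(coords_a))


def best_rotation_angle(sub_a, sub_b, axis, origin=(0.0, 0.0, 0.0),
                        start=170.0, stop=190.0, step=0.1):
    """Scan rotation angles and return (angle, rmsd) at the minimum."""
    best = None
    steps = int(round((stop - start) / step))
    for k in range(steps + 1):
        angle = start + k * step
        moved = [rotate_about_axis(p, axis, angle, origin) for p in sub_a]
        deviation = rmsd(moved, sub_b)
        if best is None or deviation < best[1]:
            best = (angle, deviation)
    return best
```

A dimer with an exact axis would give a minimum at 180° and a deviation limited only by the error in the coordinates. A minimum a few degrees away from 180°, or a deviation much larger than the coordinate error, is the kind of result that must then be attributed either to crystal packing or to a real asymmetry.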

When such superpositions are performed about
molecular rotational axes of symmetry that do not coincide
with crystallographic rotational axes of symmetry, it
is often found that flexible regions of the crystallographic
molecular model do not coincide as well as more rigid
regions because they respond readily to variations in
their surroundings resulting from differences in crystal
packing. For example, most of the α carbons of the five
identical B subunits of heat-labile enterotoxin from
E. coli superpose to within 0.04 nm upon successive rotations
about the molecular 5-fold rotational axis of symmetry,
but the positions of the α carbons in the flexible
loop between Glycine 54 and Serine 60 deviate by
0.1-0.2 nm from each other. Unlike malate dehydrogenase
from A. arcticum, cytoplasmic malate dehydrogenase
from Sus scrofa crystallizes in the space group
P2₁2₁2, with its α₂ dimer in each asymmetric unit. The
rotational angle around the molecular axis of symmetry
producing the superposition with the minimum root
mean square deviation is 174° instead of 180°. It was,
however, concluded that this observation was misleading
because it had been shown previously that crystallization
of the protein causes several of its enzymatic
properties, which are uncomplicated in solution, to
become asymmetric. Consequently, it was concluded
that crystal packing forces caused an otherwise symmetric
protein to become remarkably asymmetric.

There are, however, a few observations suggesting 
that there may be subtle asymmetries in some 
homooligomers. For example, when the subunits of the
crystallographic molecular model of the (α₂)₂ tetramer of
L-2-hydroxyisocaproate dehydrogenase were superposed
intramolecularly, it was found that two superposed
well on each other (root mean square deviation of
0.02 nm) and the other two did also (root mean square
deviation of 0.03 nm) but that neither member of one of
these pairs superposed well on either member of the
other pair (root mean square deviation of 0.13 nm).
This asymmetry seemed too large to be due to crystal 
packing, and it is possible that this protein may be asym- 
metric in solution. 

It is also possible to perform a self-rotation func- 
tion on the asymmetric unit in the space group of a 
crystal to detect molecular symmetry before phases are 
available. The 11-fold molecular rotational axis of sym- 
metry in the trp RNA-binding attenuation protein from 
Bacillus subtilis, which cannot coincide with any of the 
permissible crystallographic rotational axes of symme- 
try, was readily detected within the asymmetric unit of 
its space group C2 by performing a self-rotation func- 
tion. 


Suggested Reading 


Hahn, T., Ed. (1983) International Tables for Crystallography, Vol. A: 
Space-Group Symmetry, D. Reidel, Dordrecht, The Netherlands. 


Problem 9-2: In the space group P3₁21 portrayed in
Figure 9-6, every lactic acid molecule is related to one of 
its neighbors by the same 2-fold rotational axis of sym- 
metry. Draw two lactic acid molecules arranged around 
that specific rotational axis of symmetry. 


Problem 9-3: The stereodiagram on the next page is 
based on the crystallographic molecular model of the 
four individual molecules of the protein HPr from the 
monosaccharide transport system of Streptococcus faecalis
arranged in the tetragonal unit cell. Reprinted
with permission from ref 47. Copyright 1994 Elsevier
B.V.


On three rectangles representing the front, the side, and 
the bottom, respectively, of the unit cell in the orienta- 
tion of the figure, indicate the positions of any screw axes 
of symmetry or rotational axes of symmetry passing 
through the unit cell. Use symbols for rotational or screw 
axes of symmetry like those in the diagrams in Figures 
9-3 to 9-8.


Problem 9-3 


Oligomeric Proteins 


A protomer is the smallest portion of a protein from 
copies of which its entire quaternary structure is created. 
The protomers of a homooligomeric protein are 
arranged around rotational axes of symmetry, and the 
number of protomers and the way in which they are 
arranged designate the point group to which the
oligomer belongs. A point group is the distribution in
which a particular number of protomers are arranged
about one or more particular rotational axes of symmetry
oriented at particular angles to one another and intersecting
at a common origin, in which all of the centers of
mass of the protomers are equidistant from this origin,
and in which all of the symmetrically related positions
are occupied. The finite and specific number of the protomers,
their equidistance from the origin, and the
common intersection of all of the rotational axes of symmetry
distinguish the point groups from the space
groups as well as from the linear groups, which designate 
open linear multimers such as helical polymers. 
Oligomeric proteins have exploited all of the available 
point groups lacking mirror planes. 

A cyclic point group arranges protomers in a circle
about only one rotational axis of symmetry, and the different
cyclic point groups are distinguished by the fold of
the axis. In the simplest of these cyclic point groups,
point group 2(C₂),* two protomers are arranged around
a 2-fold rotational axis of symmetry to form a dimer. Half
of all homooligomeric proteins are dimers (Table 9-1)
the protomers of which, with few exceptions (Figure 9-2),
are arranged with the symmetry of point group 2(C₂).
Malate dehydrogenase (Figure 9-1A) and κ-bungarotoxin
from Bungarus multicinctus (Figure 9-9) are examples
of oligomers of point group 2(C₂).
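
The defining properties of a cyclic point group, that the protomers are equidistant from the origin and that a rotation of 360°/n about the single axis superposes the set of protomers on itself, can be checked numerically. The following is a minimal sketch in Python added for illustration; it represents each protomer only by its center of mass, places the axis along z, and uses the hypothetical names cyclic_positions, rotate_about_z, and superposes.

```python
import math


def cyclic_positions(fold, radius=1.0, first_angle_deg=0.0):
    """Centers of mass of the protomers around an n-fold axis along z."""
    step = 360.0 / fold
    positions = []
    for k in range(fold):
        theta = math.radians(first_angle_deg + k * step)
        positions.append((radius * math.cos(theta),
                          radius * math.sin(theta),
                          0.0))
    return positions


def rotate_about_z(point, angle_deg):
    """Rotate a point about the z axis by the given angle."""
    theta = math.radians(angle_deg)
    x, y, z = point
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta),
            z)


def superposes(points, moved, tolerance=1e-9):
    """True if every moved point lands on one of the original points."""
    return all(any(math.dist(p, q) < tolerance for q in points)
               for p in moved)


if __name__ == "__main__":
    dimer = cyclic_positions(2)
    print(superposes(dimer, [rotate_about_z(p, 180.0) for p in dimer]))
```

The same functions describe a dimer of point group 2(C₂) with fold set to 2 and a rotation of 180°, or a trimer with fold set to 3 and a rotation of 120°.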

κ-Bungarotoxin crystallizes in the space group P6
with an α₂ dimer in the crystallographic asymmetric unit.
In the crystal, one protomer superposes on the other
upon a 178.5° rotation about the molecular axis of symmetry.
With the exception of the flexible loops between
Cysteine 27 and Proline 36 and between Proline 15 and
Glutamine 18, which adjust malleably to the constraints
of crystal packing, the α carbons superpose to a root
mean square deviation of 0.05 nm around the molecular 
rotational axis of symmetry, which probably differs from 
180° also because of crystal packing. The interdigitations 
of the side chains forming the interface mimic each other 
across the axis of symmetry, for example, the hydropho- 
bic cluster containing Isoleucine 20, Cystine 46/58, and 
Valine 60 from one protomer and Phenylalanine 49 from 
the other. The axis of symmetry itself runs through the 
hydrogen bond between the two Glutamines 48. The 
structure is unquestionably closed; every amino acid in 
the common sequence of the two identical polypeptides 
that is enclosed within the interface from one of the pro- 
tomers is also enclosed from the other. 

In larger proteins with more than one domain, it is 
often the case that the interface forming the dimer con- 
nects only one of the domains to its twin in the other pro- 


Table 9-1: Frequency of Homooligomers 


number of subunits    percent observedᵃ


50 
5 
35 
10 
3 
1 
2 


DOWN 


1 
1 


ᵃ The table of oligomeric stoichiometries published by Darnell and Klotz was
used to calculate these frequencies. Because this is a selected and incomplete list,
some numbers were rounded to the nearest 5%.


* There are two notations currently in use to identify the individual
point groups. Crystallographers use the Hermann-Mauguin notation,
2, 3, 4, ..., 222, 322, 422, ..., 23, 432, and 532, which will be the
notation used here out of parentheses. Spectroscopists use the
Schönflies notation, C₂, C₃, C₄, ..., D₂, D₃, D₄, ..., T, O, and I, which
will be the notation used here within parentheses. Chemists other
than crystallographers and spectroscopists will use one or the other 
of these notations depending on who taught them point groups or 
what book they happened to open. 


tomer, but this limited interface nevertheless dictates the 
symmetry of point group 2(C₂) for the whole dimer.
The domain (117 aa of the 450 aa in the intact protein)
forming the entire interface holding together the dimer
of glutathione-disulfide reductase has been detached
genetically and shown to form by itself a dimer. In the
α₂ dimer of human hexokinase, each of the two subunits
contains two internally duplicated domains, each superposable
on a complete subunit of hexokinase from yeast,
and they are connected by an α-helical segment of six
turns. The amino-terminal domain of one subunit forms
an interface with the carboxy-terminal domain of the
other subunit and vice versa to form a dimer with two
well-separated but identical interfaces on either side of
the molecular 2-fold rotational axis of symmetry.

Proteins that associate with double-helical DNA 
are often dimeric, and such a dimer uses its own 2-fold 
rotational axis of symmetry to recognize a local 2-fold 
rotational axis of pseudosymmetry in the double helix of 
the DNA. Regardless of its sequence, between the two 
bases in any one of the pairs of bases in a molecule of 
DNA and running perpendicular to the hydrogen bonds 
between them is a local 2-fold axis of pseudosymmetry 
(Figure 3-9). A local rotational axis is an axis of rotation 
around which superpose upon one another only struc- 
tural units immediately adjacent to that axis. Because a 
real molecule of DNA is rarely straight, if the whole mol- 
ecule of DNA is rotated around one of these local axes by 
180°, it roughly superposes on itself in the immediate 
vicinity of the axis but does not superpose beyond the 
immediate vicinity because of its curvature. Because the 
two bases in the central base pair are never the same, the 
base pairs on either side of the central base pair are usu- 
ally not the same, and the DNA is usually curved, this 
local axis is always pseudosymmetric. Halfway between 
any consecutive two of these 2-fold rotational axes of 
pseudosymmetry and at an angle halfway between their 
two angles, there is also a local 2-fold rotational axis of 
pseudosymmetry. 

The existence of this second set of local 2-fold rota- 
tional axes of pseudosymmetry means that palindromic 
sequences* such as 


5'-TAGACGTCTAGACGTCTA-3'
3'-ATCTGCAGATCTGCAGAT-5'                    (9-1)


in which the two individual strands have the same 
sequence and the sequence of the duplex inverts at its 
center, have local 2-fold rotational axes of symmetry 
(indicated by @). Rotation around any one of these axes 
by 180° superposes identical bases within the segment.
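
The symmetry of such a sequence can be tested directly, because a 2-fold rotational axis lying between two base pairs superposes each strand on the other, so that the sequence of one strand read 5' to 3' is the same as that of its complement read 5' to 3'. The following is a minimal sketch in Python added for illustration; the function names are hypothetical, and only axes lying between adjacent base pairs are located.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand):
    """Sequence of the complementary strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def is_palindromic(strand):
    """True if the duplex superposes on itself upon a 180 degree rotation."""
    return strand == reverse_complement(strand)


def local_twofold_axes(strand, min_arm=3):
    """Locate local 2-fold axes lying between adjacent base pairs.

    Returns a dictionary mapping the number of bases to the left of
    each axis to the number of base pairs on either side of it that
    superpose upon rotation by 180 degrees.
    """
    axes = {}
    for gap in range(1, len(strand)):
        arm = 0
        while (gap - 1 - arm >= 0
               and gap + arm < len(strand)
               and strand[gap + arm] == COMPLEMENT[strand[gap - 1 - arm]]):
            arm += 1
        if arm >= min_arm:
            axes[gap] = arm
    return axes


if __name__ == "__main__":
    upper_strand = "TAGACGTCTAGACGTCTA"
    print(is_palindromic(upper_strand))
    print(local_twofold_axes(upper_strand))
```

Applied to the upper strand of the duplex above, the search finds one axis at the center of the whole segment and one at the center of each GACGTC, the arms about the latter being limited by the ends of the segment. The decamer ATGACGTCAT mentioned earlier is palindromic by the same test.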


* A palindromic sequence, for example, “able was i ere i saw elba”, 
is a sequence that superposes upon itself when rotated around an 
axis running through its center. 
The existence of the local 2-fold rotational axes of
pseudosymmetry running through the base pairs means 
that for the sequence 


5'-ATAGTAGAGTGCTTCTATCAT-3'
3'-TATCATCTCACGAAGATAGTA-5'                 (9-2)


which contains a split palindrome, rotation about the 
local 2-fold axis of pseudosymmetry through the central 


Figure 9-9: Two-fold rotational symmetry
of point group 2(C₂). The crystallographic
molecular model of the α₂ dimer of κ-bungarotoxin
from B. multicinctus is formed
from two identical folded polypeptides 66 aa
in length arranged around a 2-fold rotational
axis of symmetry running through the center
of the molecule normal to the page. This
drawing was produced with MolScript.
G-C pair superposes the palindromic sequences in the
boxes.

Many of the sequences of DNA that particular pro- 
teins are required to recognize are palindromic. A pro- 
tein built from two copies of a folded polypeptide in 


which the two copies are associated with each other 
around a molecular 2-fold rotational axis of symmetry 
can present, one on each copy, two identical faces com- 
plementary to faces on the surface of the palindromic 
segments of DNA. One of these surfaces can bind to one 
half of the palindrome and the other to the other half of 
the palindrome simultaneously if those surfaces are 
positioned relative to their own 2-fold rotational axis of 
symmetry as the two palindromes are to theirs. Evolution 
by natural selection has discovered that, by this simple 
strategy, the standard free energy of association between 
the DNA and the protein is automatically doubled and 
the specificity of the interaction is more than squared. 
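
The arithmetic behind this statement follows from the relation between the standard free energy of association and the association constant, K = exp(-ΔG°/RT): if ΔG° is doubled, K is squared, and so is the ratio of the constants for a specific and a nonspecific sequence. The following is a minimal sketch in Python added for illustration. The values of -30 and -20 kJ mol⁻¹ are arbitrary and are not measurements, the function names are hypothetical, and the sketch shows only the squaring; it does not attempt to account for the additional factor implied by "more than squared".

```python
import math

GAS_CONSTANT = 8.314e-3  # kJ mol^-1 K^-1


def association_constant(delta_g, temperature=298.15):
    """Association constant for a standard free energy change in kJ/mol."""
    return math.exp(-delta_g / (GAS_CONSTANT * temperature))


def specificity(delta_g_specific, delta_g_nonspecific, temperature=298.15):
    """Ratio of association constants for specific and nonspecific DNA."""
    return (association_constant(delta_g_specific, temperature)
            / association_constant(delta_g_nonspecific, temperature))


if __name__ == "__main__":
    one_face = specificity(-30.0, -20.0)
    two_faces = specificity(2 * -30.0, 2 * -20.0)
    print(one_face, two_faces, one_face ** 2)
```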

The met repressor of E. coli is a dimeric protein
built from two copies of a folded polypeptide 104 aa in
length that are associated with each other around a
molecular 2-fold rotational axis of symmetry such that
each copy presents a face complementary to a face on
one half of the palindrome 9-1. When the repressor is
bound to the DNA, the molecular 2-fold rotational axis of
symmetry of the protein coincides with the local palindromic
2-fold rotational axis of symmetry of the DNA
(Figure 9-10).

The arc repressor of E. coli is a similarly symmetric 
protein that binds to the split palindrome in 9-2. When
the palindrome is split, the protein can recognize both its
symmetry and the length of the separation between its
two halves in the segment of DNA. In the crystallographic
molecular model of the dimeric E2 DNA-binding
domain of bovine papillomavirus-1, each of the identical
subunits binds to one half of a palindrome, the two 
halves of which are split by four base pairs rather than 
five. In this complex, the molecular 2-fold rotational axis 
of symmetry of the protein and the molecular 2-fold rota- 
tional axis of symmetry of the DNA are coaxial and both 
coincide with a crystallographic rotational axis of sym- 
metry. 

In the cyclic point group 3(C₃), the three protomers
of a homooligomeric protein are arranged around a
3-fold rotational axis of symmetry. Chloramphenicol
O-acetyltransferase from E. coli is a trimer with symmetry
of point group 3(C₃) (Figure 9-11). It crystallizes in
the space group R32 with one of its three identical folded
polypeptides as the crystallographic asymmetric unit.
Consequently, the molecular 3-fold rotational axis of
symmetry is exact. The individual subunits are compact
globular structures, and the three interfaces connecting
them together are formed by the association of two relatively
flat faces. The interfaces that produce the trimer
are all identical to each other, and rotations of 120°
about the axis of symmetry superpose them upon each
other. Each subunit has on its surface one copy of each of
the two distinct but complementary faces that form the
interface. Three identical β strands, one from each subunit,
direct their side chains into the center of the trimer
at the axis of symmetry.

The frequency with which trimers with symmetry of
point group 3(C₃) arise during evolution by natural selection
is about 10 times lower than the frequency with
which dimers with symmetry of point group 2(C₂) arise
(Table 9-1). In fact, at one time it was thought that such
trimeric proteins did not exist, but there are now crystallographic
molecular models for a number of them.
It is by considering the problem of assembling a trimer 
relative to that of assembling a dimer that the reason for 
the scarcity of trimers becomes apparent. 

The interface is the feature that evolves and not the 
oligomer. A dimer built around a 2-fold rotational axis of 
symmetry is held together by one more or less continuous 
interface centered on the rotational axis of symmetry. 
The axis divides the complete interface into two identical 
halves (Figures 9-1A and 9-9). Each half is the formal 
equivalent to one of the three identical interfaces distrib- 
uted around the 3-fold rotational axis of symmetry in a 
trimer (Figure 9-11). The incremental decreases of free 
energy associated with favorable mutations are not auto- 
matically doubled during the evolution of an interface in 
a trimer as they are in the evolution of an interface in a 
dimer. Because termolecular collisions rarely occur, the 
assembly of an oligomeric protein must proceed through 
a series of bimolecular steps. A bimolecular collision pro- 
ducing a dimer automatically involves the simultaneous 
formation of the two halves of its interface and incorpo- 
rates the free energies of formation of both halves into the 
immediate product. The first step in the assembly of a 
trimer, however, is the collision of two monomers to form 
only one of its three interfaces. This first interface, stand- 
ing alone, must exist long enough or form often enough 
for the third protomer to complete the ring, yet it is the 
evolution of this initial interface that does not benefit from 
symmetry as does the evolution of the initial and final 
interface of the dimer. Therefore, trimers should appear 
less frequently than dimers during evolution. In favor of 
the symmetric trimer, however, is the fact that, as with the 
two halves of the interface in a dimer, its three identical 
interfaces can evolve simultaneously, a fact that magnifies 
the incremental decrease in free energy change in the over- 
all formation of the complete oligomer for each favorable 
mutation. 

If the 3-fold axis of symmetry of chloramphenicol 
O-acetyltransferase were not an exact rotational axis of 
symmetry but a closed screw axis, one of the interfaces 
could not be equivalent to the other two because the ring 
could not be completed. It is most likely that this pecu- 
liar one of the three interfaces would not fit together 
properly because it would be formed from the same two 
faces now required to associate in a different way from 
the way they associated at the other two interfaces. Such 
a structure would be significantly weaker than a rota- 
tionally symmetric trimer because of the one misaligned 
interface. 

α₄ Tetramers with cyclic symmetry of point group 4(C₄),
such as L-lactate dehydrogenase (cytochrome) from
Saccharomyces cerevisiae (Figure 9-12), L-fuculose-phosphate
aldolase from E. coli, L-ribulose-phosphate 4-epimerase
from E. coli, and mammalian and bacterial IMP
dehydrogenase; α₅ pentamers with cyclic symmetry of point
group 5(C₅), such as acetylcholine-binding protein from
Lymnaea stagnalis, the B subunit of heat-labile enterotoxin
from E. coli, and human serum amyloid P component;
α₆ hexamers with cyclic symmetry of point group 6(C₆),
such as transitional endoplasmic reticulum ATPase from
Mus musculus and the replicative DNA helicase encoded by
the bacterial plasmid RSF1010; and α₇ heptamers with cyclic
symmetry of point group 7(C₇), such as transcriptional
activator NTRC1 from Aquifex aeolicus and the small
nuclear ribonucleoprotein from Pyrobaculum aerophilum,
are even rarer than trimers with cyclic symmetry of point
group 3(C₃). Amazingly, however, there is a protein that is
an α₁₁ undecamer with symmetry of cyclic point group
11(C₁₁).

Figure 9-11: Three-fold rotational symmetry of point
group 3(C₃). An α-carbon diagram of chloramphenicol
O-acetyltransferase from E. coli is drawn from the
crystallographic molecular model of this trimeric protein.⁵⁸
The three identical subunits are distinguished by the
different widths of the line segments. This drawing was
produced with MolScript.

Tetramers are the second most common type of
oligomeric protein (Table 9-1). Almost all tetrameric
proteins have their protomers arranged in the symmetry of
dihedral point group 222(D₂) instead of the symmetry of
cyclic point group 4(C₄). A dihedral point group is a point
group in which the protomers are arranged on two circles
of equal radius around a central n-fold rotational axis of
symmetry and n 2-fold rotational axes of symmetry
perpendicular to that central axis. In the dihedral point group
222(D₂), the central axis is a 2-fold rotational axis of
symmetry, there are two 2-fold rotational axes of symmetry
orthogonal to it and to each other, and all of the axes
intersect at the same point. If a sphere is placed at each of the
four vertices of an equilateral tetrahedron, and if each of
the four spheres has a diameter equal to the length of a
side of the tetrahedron, the spheres will contact each other
at six points (Figure 9-13A). If the spheres are asymmetric
objects, for example, the subunits of a tetramer, then the
six points of contact, for example, the six possible
interfaces between pairs of subunits, are three pairs of identical
twins, designated 1, 2, and 3 in the diagram, each pair
different from the other two. Each of the six interfaces contains
within itself one of the 2-fold rotational axes of symmetry.
Each interface is superposed on its twin when rotation occurs
about either of the two axes of symmetry orthogonal to the
one passing through it, but no rotational axis of symmetry
can superpose an interface 1 on an interface 2 or an
interface 3 or superpose an interface 2 on an interface 3.
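
The pairing of the six contacts into three sets of twins can be checked with a short calculation. The following is a minimal sketch, not taken from the text: it assumes idealized point-like subunits at the vertices of a regular tetrahedron whose three 2-fold axes lie along x, y, and z, and the function names are hypothetical.

```python
from itertools import combinations

# Vertices of a regular tetrahedron centered on the origin (one subunit per vertex).
# Assumption: the three 2-fold axes of point group 222 (D2) lie along x, y, and z.
VERTICES = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]

# The identity and the three 180-degree rotations that make up point group 222 (D2).
ROTATIONS = {
    "identity": lambda p: (p[0], p[1], p[2]),
    "x": lambda p: (p[0], -p[1], -p[2]),
    "y": lambda p: (-p[0], p[1], -p[2]),
    "z": lambda p: (-p[0], -p[1], p[2]),
}

def image_of_edge(edge, rotate):
    """Return the contact (pair of vertex indices) onto which `edge` is rotated."""
    return frozenset(VERTICES.index(rotate(VERTICES[i])) for i in edge)

def interface_classes():
    """Group the six contacts into sets that the rotations superpose on each other."""
    edges = [frozenset(e) for e in combinations(range(4), 2)]
    classes = []
    for edge in edges:
        orbit = frozenset(image_of_edge(edge, r) for r in ROTATIONS.values())
        if orbit not in classes:
            classes.append(orbit)
    return classes

def axis_through(edge):
    """Name the 2-fold axis that passes through a contact and swaps its two subunits."""
    i, j = tuple(edge)
    for name, rotate in ROTATIONS.items():
        if name != "identity" and VERTICES.index(rotate(VERTICES[i])) == j:
            return name
    return None

if __name__ == "__main__":
    for members in interface_classes():
        for edge in members:
            print(sorted(edge), "lies on axis", axis_through(edge))
```

In this idealized model the six contacts fall into three classes of two, and the two members of each class lie on the same 2-fold axis, which is the arrangement described for interfaces 1, 2, and 3.
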
Ideally, a tetrameric protein with dihedral symmetry
of point group 222(D₂) should have three different
pairs of interfaces, but usually one pair does not form or
is almost nonexistent because the tetrahedron is
squashed in one dimension. For example, if the tetrahedral
arrangement of spheres in Figure 9-13A were
squashed from above the plane of the page, the
interfaces 3 would pull apart. The tetramer of symmetry of
point group 222(D₂) that is the crystallographic molecular
model of 2,2-dialkylglycine decarboxylase (pyruvate)
from Burkholderia cepacia is such a squashed tetrahedron
(Figure 9-13B). The two identical vertical interfaces,
normal to the plane of the page through which
runs the exact vertical molecular 2-fold rotational axis of
symmetry in the plane of the page (the interfaces 1 in
panel A), are the most extensive (75 nm² interface⁻¹)* and
were designated by the crystallographers as the
interfaces connecting the monomers that form the two
dimers of the structure. In these two roughly flat
interfaces, there are loops of polypeptide that interpenetrate
the two subunits. The two identical horizontal interfaces,


* The size of an interface will be presented as the total accessible 
surface area from its two participants that is buried upon its for- 
mation. Consequently, each interface is formally defined as the 
adhesive interaction between any two subunits that holds only 
those two subunits together in the complex. This definition ignores 
any cooperativity that arises in interfaces in which n subunits are 
held together by a total of n interfaces around an n-fold rotational 
axis of symmetry when n is greater than 2, such as the stability 
gained when the third subunit is added to complete a cyclic trimer 
or the stability realized in the entwined cone of four carboxy ter- 
mini (Figure 9-12) in the four formal interfaces around the 4-fold 
rotational axis of symmetry in L-lactate dehydrogenase 
(cytochrome). 


Figure 9-13: A dimer of dimers with dihedral symmetry
of point group 222(D₂). (A) Point group 222(D₂). Four
spheres are placed at the vertices of an equilateral
tetrahedron. The spheres are made asymmetric by placing
the letter G inside of each. The letters G are superposed on
each other by rotation around the three axes of symmetry,
one vertical and one horizontal in the plane of the page and
the third normal to the page. The three different pairs of
interfaces between the spheres, labeled 1, 2, and 3, are
each distinct: interfaces 1 connecting the open sides of the
letters G, interfaces 2 connecting the bottoms of the
letters G, and interfaces 3 connecting the T segments of the
letters G. (B) α-Carbon diagram of the (α₂)₂ tetramer of
2,2-dialkylglycine decarboxylase (pyruvate) from
B. cepacia drawn from its crystallographic molecular
model. Two of the subunits have been drawn with thicker
line segments than the other two. The protein crystallizes
in the space group P6,22 with a single subunit in the
crystallographic asymmetric unit, and all three molecular
2-fold rotational axes of symmetry coincide with
crystallographic rotational axes of symmetry. This drawing was
produced with MolScript.


which are tilted somewhat from being normal to the
plane of the page and through which runs the exact
horizontal molecular 2-fold rotational axis in the plane of the
page (the interfaces 2 in panel A), are less extensive
(24 nm² interface⁻¹) and were designated as the
interfaces holding together the two dimers to form the
tetramer. The interfaces along the exact 2-fold rotational
axis of symmetry normal to the plane of the page
(interfaces 3 in panel A) are almost nonexistent because of the
squashing of the tetrahedron.

Almost all tetramers of dihedral symmetry of point
group 222(D₂) are constructed along these lines, with an
almost nonexistent or a completely nonexistent pair of
interfaces and with one of the remaining two pairs of
interfaces being more extensive than the other.
Consequently, tetramers of dihedral symmetry, which
include almost all tetrameric proteins, are dimers of
dimers, and the complete structure of such a dimer of
dimers can be designated as that of an (α₂)₂ tetramer to
distinguish it from an α₄ tetramer of cyclic symmetry of
point group 4(C₄). The same points noted in the
description of the evolution of a rotationally symmetric dimer
from an ancestral monomeric protein are of equal
validity in describing the evolution of a rotationally
symmetric dimer of dimers from an ancestral dimeric protein.
Also, for the same reasons, a dimer of dimers is a closed
structure.

Because each of the three pairs of interfaces in a
tetramer with dihedral symmetry is completely different,
they each have different strengths, reflected in their free
energies of association. One of the two or three pairs of
interfaces must be the strongest, and if there is any
dissociation of an (α₂)₂ tetramer into its constituent
α₂ dimers, it is usually this strongest pair of interfaces
that will be retained, one in each of the two dimers. For
example, the uncomplicated, reversible dissociation of
(α₂)₂ homotetrameric, enzymatically active bacterial
6-phosphofructokinase into enzymatically inactive
α₂ dimers results from the sundering of the less extensive
of the two distinct types of interfaces that produce the
structure of dihedral symmetry.

Hemoglobin is an honorary homotetramer. It is
built from two different polypeptides, α and β, each
present in two copies to provide the four protomers. The two
polypeptides are homologous in sequence and their
tertiary structures are superposable. In the
heterotetramer, the four protomers of hemoglobin occupy the
same arrangement as the four protomers of
2,2-dialkylglycine decarboxylase (pyruvate), but two of the rotational
axes are rotational axes of pseudosymmetry. The
αα interface and the ββ interface, through which runs the
molecular 2-fold axis of symmetry, are the rudimentary
pair, so the structure can be represented as (αβ)₂.

The oxygenated form of hemoglobin in solution
participates in the dissociation

$$(\alpha\beta)_2 \overset{K_{\mathrm{d}}}{\rightleftharpoons} 2\,\alpha\beta \qquad \text{(9-2)}$$

and the value of the dissociation constant

$$K_{\mathrm{d}} = \frac{[\alpha\beta]^2}{[(\alpha\beta)_2]}$$

is 1 µM. That the same interface always remains in the
dimer follows from the fact that no hybrid dimers of
the type α′β or αβ′ were formed under conditions where
the reaction of Equation 9-2 was rapidly interconverting
tetramers and dimers in a mixture of two hemoglobins,
(αβ)₂ and (α′β′)₂, from two species, dog and human,
respectively, even though hybrids of the type (αβ)(α′β′)
did form readily. In this experiment, hemoglobins from
the two species were used to permit isoelectrophoretic
separation of the hybrids (as in Figure 8-18). As a control,
it was shown that the hybrid dimers α′β and αβ′ could be
artificially formed and easily separated
isoelectrophoretically from each other and from the parent dimers αβ
and α′β′. The α′β and αβ′ hybrids also could not be
shuffled by the reaction of Equation 9-2. Therefore one of the
two different pairs of interfaces between α and β
subunits in the hemoglobin tetramer must be much
stronger than the other.
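
To give a feel for what a dissociation constant of 1 µM implies, the equilibrium of Equation 9-2 can be solved for the concentration of free dimer at a given total concentration of protein. The following is a minimal sketch, not from the text: it assumes ideal behavior, expresses total protein as αβ equivalents, and uses hypothetical function names.

```python
import math

def dimer_concentration(total, k_d):
    """Free concentration of alpha-beta dimer at equilibrium.

    `total` is the total protein expressed as alpha-beta equivalents, so that
    total = [dimer] + 2 [tetramer], and k_d = [dimer]**2 / [tetramer].
    Both arguments must be in the same units (for example, micromolar).
    """
    # Substituting [tetramer] = [dimer]**2 / k_d gives the quadratic
    # (2 / k_d) [dimer]**2 + [dimer] - total = 0; this is its positive root.
    return k_d * (math.sqrt(1.0 + 8.0 * total / k_d) - 1.0) / 4.0

def fraction_as_dimer(total, k_d):
    """Fraction of all alpha-beta units that are present as free dimers."""
    return dimer_concentration(total, k_d) / total

if __name__ == "__main__":
    K_D = 1.0  # micromolar, the value quoted in the text
    for total in (0.01, 1.0, 100.0):  # micromolar; illustrative values only
        print(total, round(fraction_as_dimer(total, K_D), 3))
```

Under these assumptions, half of the αβ units are free dimers when the total concentration equals the dissociation constant, and the fraction falls as the total concentration rises.
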

A number of other oligomeric proteins also partici- 
pate in dissociations the equilibrium constants for which 
are large enough to be measured. In fact, the dimer of 
interleukin-8 that is observed crystallographically has a 
dissociation constant so large that the protein is actually 
a monomer at physiological concentrations.” Most 
oligomeric proteins, however, have interfaces so strong 
that their dissociation does not occur within normal 
ranges of concentration. 

A molecular asymmetric unit is the smallest unit of
the structure of an oligomeric molecule that, when
submitted to the appropriate symmetry operations, creates
the entire structure. The individual subunits of the cyclic
oligomers in Figures 9-9 to 9-12 are the molecular
asymmetric units of their respective molecules. In the
(α₂)₂ tetramer of 2,2-dialkylglycine decarboxylase
(pyruvate) (Figure 9-13B), the molecular asymmetric unit is,
by inspection, the single folded polypeptide. If the one
polypeptide is positioned in space, its image is rotated
180° around any one of the 2-fold rotational axes of
symmetry, and another identical folded polypeptide is
placed where the image of the first has thus been
positioned, one of the three possible dimers in the tetramer
is created. If the image of this dimer is then rotated about
another of the 2-fold rotational axes of symmetry and
another identical dimer is placed where the image of the
first has been positioned, then the entire tetramer is
created.

If the tetrahedron of Figure 9-13A is squashed flat,
the structure becomes a ring of four protomers that
vaguely resembles a ring with cyclic symmetry of point
group 4(C₄), often with a large hole in the middle.
Nevertheless, it is easy to distinguish the former from the
latter because its ring has dihedral symmetry of point
group 222(D₂). The orientation of the protomers in an
oligomer with dihedral symmetry alternates
up-down-up-down around the ring, and the two orthogonal 2-fold
rotational axes of symmetry in the plane of the ring
between every pair of subunits, which do not exist in a
structure with cyclic symmetry, remain.

As with protocatechuate 3,4-dioxygenase (Figure 
9-2), which is a rare example of a dimer the subunits of 
which are arranged around a screw axis of symmetry, 
lac repressor from E. coli and the lectin from Arachis
hypogaea are tetramers in each of which a pair of
rotationally symmetric dimers of identical subunits is
arranged around a screw axis of symmetry. 

Ribulose-phosphate 3-epimerase from chloroplasts
of Solanum tuberosum (Figure 9-14A) is an (α₂)₃ hexamer
with the symmetry of dihedral point group 322(D₃)
(Figure 9-14B), phosphoribulokinase from Rhodobacter
sphaeroides (Figure 9-15A) is an (α₂)₄ octamer with the
symmetry of dihedral point group 422(D₄) (Figure
9-15B), and peroxiredoxin from Crithidia fasciculata
(Figure 9-16) is an (α₂)₅ decamer with the symmetry of
dihedral point group 522(D₅). The crystallographic
molecular model of the (α₂)₃ hexamer of
ribulose-phosphate 3-epimerase has a central molecular 3-fold
rotational axis of symmetry and three molecular 2-fold
rotational axes of symmetry at angles of 60° to each
other, each of them orthogonal to the central axis and
one of them exact. The crystallographic molecular model
of the (α₂)₄ octamer of phosphoribulokinase has a
central, exact 4-fold rotational axis of symmetry and four
exact 2-fold rotational axes of symmetry at angles of 45°
to each other and all of them orthogonal to the central
axis. The crystallographic molecular model of the
(α₂)₅ decamer of peroxiredoxin has five molecular 2-fold
rotational axes of symmetry at angles of 36° to each
other, all of them orthogonal to the central molecular
5-fold rotational axis of symmetry.
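
The angles of 60°, 45°, and 36° quoted above are 180°/n for n = 3, 4, and 5, and the 2n rotations of point group n22 place 2n copies of an asymmetric unit. The following minimal sketch illustrates both statements; it is not from the text, it assumes the central n-fold axis lies along z with one 2-fold axis along x, and the function names are hypothetical.

```python
import math

def adjacent_axis_angle(n):
    """Angle, in degrees, between neighboring 2-fold axes in point group n22 (Dn)."""
    return 180.0 / n

def dihedral_images(point, n):
    """Apply all 2n rotations of point group n22 (Dn) to `point`.

    Assumption: the central n-fold axis is z and one 2-fold axis is x, so the
    n 2-fold axes lie in the xy plane at angles of k * 180 / n degrees from x.
    """
    x, y, z = point
    images = []
    for k in range(n):
        # Rotation by k * 360 / n degrees about the central axis.
        a = 2.0 * math.pi * k / n
        images.append((x * math.cos(a) - y * math.sin(a),
                       x * math.sin(a) + y * math.cos(a),
                       z))
        # Rotation by 180 degrees about the in-plane axis at angle b from x.
        b = math.pi * k / n
        c, s = math.cos(2.0 * b), math.sin(2.0 * b)
        images.append((x * c + y * s, x * s - y * c, -z))
    return images

if __name__ == "__main__":
    for n in (3, 4, 5):
        copies = dihedral_images((1.0, 0.3, 0.5), n)
        print(n, adjacent_axis_angle(n), len(copies))
```

A point in a general position, standing in for one subunit, is carried to n positions above the plane of the 2-fold axes and n positions below it, which is the arrangement on two circles of equal radius described for dihedral point groups.
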

In the dihedral point groups of odd fold (3-fold,
5-fold, and 7-fold), the interfaces found at the two ends
of each 2-fold rotational axis of symmetry, although
rotationally symmetric about the axis, are different from
each other (Figure 9-14B); in the dihedral point groups of
even fold (4-fold and 6-fold), the interfaces found at the
two ends of each 2-fold rotational axis of symmetry are
the same, but there are two different kinds of 2-fold
rotational axes of symmetry that alternate around the central
axis (Figure 9-15B). Nevertheless, in both instances, odd
and even, there are only two types of interfaces
associated with the 2-fold rotational axes of symmetry
regardless of how large the fold (hence the notation n22). A
third type of n-fold symmetric interface can occur across
the central n-fold axis.

In oligomeric proteins with dihedral symmetry
there is usually a set of n strong, identical interfaces
distributed around the central axis that connect pairs of
subunits into dimers. Examples are the three interfaces
approximately normal to the plane of the page each con- 
necting an upper subunit with a lower subunit in Figure 
9-14A and the five interfaces at about 2 o’clock, at about 
5 o'clock, at 7 o’clock, at about 10 o’clock and at about 12 
o’clock in Figure 9-16. As is the case with those in isolated 
dimers with cyclic symmetry, each of these interfaces has 
a 2-fold rotational axis of symmetry running through its 
center. In ribulose-phosphate 3-epimerase, phospho- 
ribulokinase, and peroxiredoxin, these interfaces forming 
the dimers are more extensive than the interfaces joining 
the dimers into the hexamer, the octamer, or the decamer, 
respectively, so the proteins are a trimer of dimers, a 
tetramer of dimers, and a pentamer of dimers, respec- 
tively, all with dihedral symmetry. 

There are three configurations in which such
dimers can be assembled into rings with dihedral
symmetry: eclipsed, staggered, and splayed (Figure 9-17).
The dimers in ribulose-phosphate 3-epimerase (Figure
9-14A) are eclipsed; those in phosphoribulokinase
(Figure 9-15A) are staggered; and those in peroxiredoxin
(Figure 9-16) are splayed. In ribulose-phosphate
3-epimerase the only interfaces holding the three dimers
together are the six identical ones, three between the
subunits in the upper ring and three between the subunits in
the lower ring, and in peroxiredoxin the only interfaces
holding the five dimers together are the five identical ones,
each between an upper subunit and a lower subunit in
neighboring dimers. In phosphoribulokinase, however, the
six identical interfaces connecting the three upper subunits
to each other and the three lower subunits to each other are
as extensive as the five identical interfaces between a
lower subunit of one dimer and an upper subunit of a
neighboring dimer, but the interfaces holding the dimers
together are three times more extensive than either of
these types. In the staggered hexamer of sulfate
adenylyltransferase from Penicillium chrysogenum, which also
has the symmetry of point group 322(D₃), the interfaces
connecting each subunit in one trimer with one subunit
in the other trimer and with another subunit in the other
trimer, respectively, are formed by two different domains
in that subunit [...] minal segment [...] across the central
plane of the structure to connect the upper subunit of one
dimer to the lower subunit of a neighboring dimer and
vice versa.

Figure 9-14: A trimer of eclipsed dimers with dihedral
symmetry of point group 322(D₃). (A) α-Carbon diagram of
the homohexamer of ribulose-phosphate 3-epimerase from
chloroplasts of S. tuberosum, drawn from the
crystallographic molecular model (space group P3221) of the
enzyme. The view is along one of the dihedral 2-fold
rotational axes of symmetry, which coincides with a
crystallographic rotational axis of symmetry. The central 3-fold
noncrystallographic molecular rotational axis of symmetry
is vertical in the plane of the page. The three eclipsed
dimers are drawn with lines of different widths and
different shading. There are no interactions between an upper
subunit and any lower subunit other than the one with
which it forms a dimer. The accessible surface area buried
in the interface within each dimer is 21 nm² interface⁻¹, and
that within an interface connecting the dimers around the
central 3-fold rotational axis of symmetry is
15 nm² interface⁻¹. (B) Six molecules of lactic acid arranged
with dihedral symmetry of point group 322(D₃). The
molecules of lactic acid at 1, 5, and 9 o'clock are above the
plane of the page, and those at 3, 7, and 11 o'clock are below
the plane of the page. The three 2-fold rotational axes of the
dihedral point group (arrows with full heads) are in the
plane of the page, and the solid triangle denotes a 3-fold
rotational axis of symmetry normal to the plane of the page.
This drawing was produced with MolScript.

Figure 9-15: A tetramer of staggered dimers with dihedral
molecular symmetry of point group 422(D₄). (A) α-Carbon
diagram of phosphoribulokinase from R. sphaeroides, drawn
from the crystallographic molecular model (space group
P432) of the enzyme. The view is along the central 4-fold
rotational axis of symmetry. All molecular axes of symmetry
coincide with crystallographic axes of symmetry. The four
subunits to the front are drawn with thin lines; the four to
the back, with thick lines. The accessible surface area
buried in the interface within each of the four fundamental
dimers centered on the vertical and horizontal 2-fold
rotational axes of symmetry is 35 nm² interface⁻¹; that within
an interface connecting a subunit to the front or a subunit
to the back within one of those dimers to a subunit to the
front or a subunit to the back, respectively, in a neighboring
dimer is 6.4 nm² interface⁻¹; and that within an interface
connecting a subunit to the front or a subunit to the back
within one of those dimers to a subunit to the back or a
subunit to the front, respectively, in a neighboring dimer is
9.0 nm² interface⁻¹. This drawing was produced with
MolScript. (B) A drawing of eight molecules of lactic acid
arrayed with dihedral symmetry of point group 422(D₄). The
molecules at 2, 5, 8, and 11 o'clock are above the plane of
the page, and the other four are below the plane of the page.
The four 2-fold rotational axes of symmetry of the dihedral
point group are in the plane of the page, and the solid square
denotes a 4-fold rotational axis of symmetry normal to the
plane of the page. The atomic coordinates on which this
drawing is based were provided by David H. T. Harrison.

[...] the most common fundamental u[...] dimers, staggered
tetramers of dimers, and splayed pentamers of dimers [...]
of dimers, a splayed tetramer of dimers, an eclipsed
pentamer of dimers, and an eclipsed hexamer of dimers. In all
of these proteins with dihedral symmetry, the fact that the
most extensive interfaces in the structure are those holding
the dimers together suggests that evolution assembled these
oligomers from preexisting symmetric dimers. This
conclusion would make sense because dimeric proteins are far
more common than either trimeric proteins with cyclic
symmetry, tetrameric proteins with cyclic symmetry, [...]
proteins with cyclic symmetry. [...] oligomeric proteins
because there are examples of proteins that have different
dihedral quaternary structures even though the α₂ dimers
from which they are constructed are obviously related and
are themselves held together by homologous interfaces.
Human nucleoside-diphosphate kinase is a hexamer with
dihedral symmetry of point group 322(D₃), but
nucleoside-diphosphate kinase from Myxococcus xanthus is a
tetramer with dihedral symmetry of point group 222(D₂), even
though the α₂ dimer from the human protein is readily
superposable on the α₂ dimer of the protein from
M. xanthus. Within the family of superoxide dismutases, a
fundamental α₂ dimer, which is superposable among the
proteins from different species, either stands alone or is
arranged in several different ways to form distinct
(α₂)₂ tetramers. The cytoplasmic ribulose-phosphate
3-epimerases from animals and plants are α₂ dimers rather
than (α₂)₃ hexamers, as are those from chloroplasts (Figure
9-14A). In fact, the dimeric ribulose-phosphate
3-epimerases from the cytoplasms of fungi and animals are
more closely related to the dimeric ribulose-phosphate
3-epimerase from a given plant than is the hexameric
ribulose-phosphate 3-epimerase from its own chloroplasts.
The dihydrodipicolinate synthases from Nicotiana sylvestris
and E. coli are tetramers with dihedral symmetry assembled
from superposable α₂ dimers, but the dimers face each other
in opposite directions in the two proteins. Consequently,
it seems that most oligomers with dihedral symmetry
are (α₂)₂ dimers of dimers, (α₂)₃ trimers of dimers,
(α₂)₄ tetramers of dimers, (α₂)₅ pentamers of dimers,
and (α₂)₆ hexamers of dimers.

There are of course exceptions to the observation 
that most proteins with dihedral symmetry are assem- 
bled from symmetric dimers. Histidine decarboxylase 
from Lactobacillus is a dimer of trimers," 
glutamate-ammonia ligase from Salmonella 
typhimurium is a dimer of hexamers,''” and human 
serum amyloid P component is a dimer of pentamers.'™* 
There are also two proteins, a chaperonin’” and an intra- 
cellular, multifunctional endopeptidase"? that are 
[(@B)7]2 dimers of heptamers. The fact that the octamer of 
IMP dehydrogenase from Tritrichomonas foetus with 
dihedral symmetry of point group 422(D₄) dissociates
into two tetramers with a dissociation constant of 1 µM
demonstrates that it is a dimer of two tetramers, each of 
cyclic symmetry." 

In proteins with cyclic symmetry such as 
dimers'!*'”° or trimers, tetramers, and pentamers'” as 
well as proteins with dihedral symmetry,” related super- 
posable monomers have been assembled by evolution 
around different interfaces or into different quaternary 
structures. The malleability of the arrangements of pro- 
tomers with both dihedral and cyclic symmetry within 
sets of oligomers of the same family of proteins or even 
the same species of proteins'”° leads to the conclusion 
that the quaternary structure of a protein provides little 
information about its evolutionary relationships. 

If, however, the quaternary structure is retained in 
the same protein from widely different species of organ- 
isms, the evolutionary constraints within the interfaces 
producing that quaternary structure are as stringent as 
those in the interior of the protein. For example, in the 
heterodimer of isoform 1 and isoform 2 of the R2 protein 
of ribonucleoside-diphosphate reductase from S. cere- 
visiae, the interface closely resembles those in the 
homodimers of the same protein from E. coli and
M. musculus even though the folded polypeptides of
isoforms 1 and 2 from S. cerevisiae differ significantly from
their folded homologues from E. coli and M. musculus at
their peripheries.

When monomers are transformed by evolution into
a homooligomeric ring with cyclic symmetry (Figures 9-9,
9-11, and 9-12) or when homodimers are transformed by
evolution into a homooligomeric ring with dihedral
symmetry (Figures 9-14A, 9-15A, and 9-16), a face and the
complement to that face must be created on the surface
of the monomer or the monomer in the dimer,
respectively. The face and its complement must be positioned,
respectively, on the surface of the monomer or the
monomer in the dimer so that they are directed relative
to each other at angles of 60°, 90°, 108°, or 120°,
consistent with the angles relating the orientations of the
interfaces in each monomer in a triangular, square,
pentagonal, or hexagonal ring, respectively. Only in this
way can every interface lock fully into place in the
complete ring. Because it is only when all of the interfaces are
fully locked that the full standard free energy of
formation is realized, most of the existing rings are continuous
and symmetric. In HNRNP arginine methyltransferase
from S. cerevisiae, however, the face and its complement
on each dimer that produces the trimer of dimers have
evolved so they are directed outward at a little more than
60° to each other. As a result, one of the three interfaces
cannot quite lock into place, and there is a gap in the
ring.

Figure 9-16: A pentamer of splayed dimers with dihedral
symmetry of point group 522(D₅). An α-carbon diagram of
peroxiredoxin from C. fasciculata is viewed down the
5-fold rotational axis of symmetry. Because the space group
of the crystal is P24, none of the 2-fold molecular axes of
symmetry coincides with a crystallographic axis of
symmetry. The subunits are drawn with lines of different
thickness. Of the total accessible surface area of each
subunit, 13% is buried in one interface and 8.5% in the other
interface. This was [...]

Figure 9-17: Interfaces between dimers in rings of dihedral
symmetry. When dimers stand vertically around the ring,
they are eclipsed when viewed along the central axis, and the
interfaces connecting those dimers to each other are all
identical and symmetrically displayed around dihedral 2-fold
rotational axes of symmetry (Figure 9-14A). When dimers
tilt sufficiently so that both the pair of interfaces
symmetrically displayed around each of their central 2-fold
rotational axes of symmetry and an interface along the
adjacent 2-fold rotational axis of symmetry between an upper
subunit and a lower subunit in a neighboring dimer are
present simultaneously, the dimers are staggered when viewed
along the central axis (Figure 9-15A). When the dimers tilt
so much that only the interface between an upper subunit
and a lower subunit in a neighboring dimer can form along
alternate 2-fold rotational axes of symmetry, the dimers are
splayed when viewed along the central axis (Figure 9-16).
The transformation from staggered to splayed is formally
equivalent to squashing the tetrahedron of spheres and
losing interfaces 3 in Figure 9-13.
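
The angles of 60°, 90°, 108°, and 120° are the interior angles of regular polygons with three, four, five, and six sides. The following minimal sketch reproduces them and estimates how far a ring fails to close when the face and its complement are set at a slightly wrong angle. It is not from the text: it assumes rigid protomers joined in a planar ring, the function names are hypothetical, and the value of 62° is an arbitrary illustration rather than a measured angle.

```python
def interior_angle(n):
    """Interior angle, in degrees, at each vertex of a regular ring of n protomers."""
    return 180.0 * (n - 2) / n

def ring_gap(n, face_angle):
    """Angular gap, in degrees, left when n rigid protomers are joined in a planar ring.

    Each protomer turns the chain by (180 - face_angle) degrees, and a closed
    ring needs a total turn of 360 degrees. Zero means the ring closes exactly;
    a positive value means the chain falls short of closing.
    """
    return 360.0 - n * (180.0 - face_angle)

if __name__ == "__main__":
    for n in (3, 4, 5, 6):
        print(n, interior_angle(n), ring_gap(n, interior_angle(n)))
    # Hypothetical trimer whose faces are directed at 62 rather than 60 degrees.
    print(ring_gap(3, 62.0))
```

In this planar idealization, an angle larger than the ideal one leaves a positive gap, in the same sense as the open ring described for the trimer of dimers of arginine methyltransferase.
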


The homooctamer of the repressor from
bacteriophage λ, which has dihedral symmetry, dissociates
neither into four cyclic dimers nor into two cyclic tetramers
as do most homooctamers of dihedral symmetry. Instead,
it splits apart along a cleavage plane parallel to the 4-fold
rotational axis of symmetry and one of the 2-fold
rotational axes of symmetry, equivalent to a horizontal plane
normal to the plane of the page in Figure 9-15B, to
produce two tetramers that each retain an exact 2-fold
rotational axis of symmetry, equivalent to the vertical 2-fold
rotational axis of symmetry in Figure 9-15B. The
other two now local 2-fold rotational axes of symmetry,
however, in each of these tetramers end up at an 80° angle
to each other instead of the 90° angle they were forced to
assume in the octamer. If two of these relaxed tetramers
were to form an octamer, there would necessarily be a gap
of 20° at one of the 2-fold rotational axes of symmetry.
Unlike what happens in arginine methyltransferase, when
the octamer of the repressor forms, the tetramers distort
to close the gap and produce a structure with complete
dihedral symmetry of point group 422(D₄). Because the
strain of this distortion is relieved on dissociation, the
octamer dissociates along an unexpected set of interfaces.

In many proteins, the folded polypeptides of which 
contain internal duplications, the two superposable, 
duplicated domains are related by a 2-fold rotational axis 
of pseudosymmetry. Thiosulfate sulfurtransferase is an 
example of such an arrangement (Figure 9-18). The 
amino acid sequences of its two domains are not signifi- 
cantly related to each other (11% identity with no gaps 
upon structural alignment). Nevertheless, one domain 
superposes upon the other with a root mean square devi- 
ation of 0.2 nm (117 out of 146 a carbons) upon a 179° 
rotation about the axis of pseudosymmetry normal to the 
page in Figure 9-18 and a translation along that axis of 
less than 0.1 nm. As with other domains of this level of 
kinship, the central portions coincide well, but loops 
connecting elements of secondary structure differ, often 
dramatically. The two internally duplicated domains of 
methionyl aminopeptidase from E. coli (Figure 7-7B) 
superpose on each other upon a rotation of 174° and a
translation of 0.06 nm along a screw axis of pseudosym- 
metry between them!” and those of porcine pepsin 
superpose upon a rotation of 173°.'?? Other examples of 
proteins formed by internal duplication in which the two 
domains are related by an approximate 2-fold rotational 
axis of symmetry are arabinose binding protein,” chy- 
motrypsinogen,'?' and sulfite reductase.'” 
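The rotation angles and axial translations quoted above are the parameters of the rigid-body operation that superposes one domain on the other. The following is a minimal sketch, in Python, of how such parameters can be read from a rotation matrix and a translation vector. The function names and the numerical operation are hypothetical and chosen only for illustration; they are not the coordinates or the procedure used in the studies cited.

```python
import math

def rotation_about_z(angle_deg):
    """Rotation matrix for a rotation of angle_deg about the z axis."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def screw_parameters(rotation, translation):
    """Return (angle in degrees, unit axis, shift along the axis) for the
    operation x -> rotation * x + translation.
    The axis formula is not valid for angles of exactly 0 or 180 degrees."""
    trace = rotation[0][0] + rotation[1][1] + rotation[2][2]
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    angle = math.degrees(math.acos(cos_angle))
    # The antisymmetric part of the matrix points along the rotation axis.
    raw = [rotation[2][1] - rotation[1][2],
           rotation[0][2] - rotation[2][0],
           rotation[1][0] - rotation[0][1]]
    norm = math.sqrt(sum(v * v for v in raw))
    axis = [v / norm for v in raw]
    # The component of the translation parallel to the axis is the screw shift.
    shift = sum(t * u for t, u in zip(translation, axis))
    return angle, axis, shift

# Hypothetical operation: 179 degree rotation about z, translation in nm.
angle, axis, shift = screw_parameters(rotation_about_z(179.0), [0.30, -0.20, 0.08])
print(round(angle, 1), [round(u, 3) for u in axis], round(shift, 2))
```

An angle near 180° with a shift near zero is what is expected of an approximate 2-fold rotational axis of pseudosymmetry; a significant shift indicates a screw axis.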

Presumably, these 2-fold rotational axes of pseu- 
dosymmetry are the remains of the 2-fold rotational axes 
of symmetry in the dimers of two identical protomers 
that were the ancestors of each of these proteins before 
the gene duplication occurred. The duplicated polypep- 
tide incorporated the original rotational axis of symme- 
try, but following the duplication the two halves began to 
evolve separately and diverge. Because the sequences 
have diverged, the superposition is not between structures
with the same amino acid sequence, and the axis
has become a 2-fold rotational axis of pseudosymmetry. 
There is at least one example of a protein in which a 
single folded polypeptide has an internal 3-fold rota- 
tional axis of pseudosymmetry™*’** and one with an 
internal 5-fold rotational axis of pseudosymmetry,'”° 
presumably the remains of an ancestral trimer and an 
ancestral pentamer, respectively. The first and the third 
of the three domains in the single folded polypeptide 
composing pyruvate oxidase from Lactobacillus plan- 
tarum superpose on each other with a root mean square 
deviation of 0.19 nm upon rotation of 190° around a
rotational axis of pseudosymmetry between them.'*° 
These two halves of the duplicated gene, when unduplicated,
encoded the two identical subunits of an α₂ dimer.
There is, however, another unrelated domain of about 
the same size inserted into the polypeptide between the 
two that nevertheless remain symmetrically arrayed. 

There are also examples of proteins in which an 
early gene duplication, which produced two domains 
now related by a 2-fold axis of pseudosymmetry bearing 
witness to that duplication, was then followed by a later 
gene duplication. This later duplication produced 
another 2-fold axis of pseudosymmetry relating two 
copies of the product of the early duplication." 
Because the three 2-fold axes of pseudosymmetry in each 
of these proteins, the two early ones and the one later 
one, are almost parallel to each other, the original protein
must have been an α₂ dimer and after its two identical
subunits were fused, the product of the fusion then
evolved to become an α₂ dimer, the identical subunits of
which were then fused. One of these proteins is now
again an α₂ dimer of two subunits, each with four internally
duplicated, symmetrically arrayed domains,” perhaps
on its way to yet another duplication.

Not all folded polypeptides with internally repeat- 
ing domains display such rotational axes of pseudosym- 
metry, and there is no direct correspondence between 
internally repeating domains and rotational axes of sym- 
metry. Immunoglobulin G (Figure 7-13) contains two 
different polypeptides, one long and one short, that are 
both composed of internally repeating domains. Within 
neither of the polypeptides are any of the adjacent inter- 
nally repeating domains related by a rotational axis of 
pseudosymmetry. Only proteins that were symmetric 
oligomers before the replication occurred can retain ves- 
tiges of their former rotational axes of symmetry. 

Phaseolin from Phaseolus vulgaris contains an
α₃ trimer with cyclic symmetry.” The subunit of phaseolin
is the product of a gene duplication, and its two
superposable domains are related by a 2-fold rotational 
axis of pseudosymmetry. In the trimer, these three 2-fold 
rotational axes intersect the central 3-fold rotational axis 
of symmetry and are normal to it. Consequently, the 
ancestral protein must have been a hexamer with dihedral
symmetry, the constituent α₂ dimers of which were
fused by gene duplication. It is certain that such an event 
occurred to the ancestor of 5-carboxymethyl-2-hydroxymuconate
Δ-isomerase. It is also a trimer with subunits
containing internally duplicated domains around local
2-fold rotational axes of pseudosymmetry orthogonal
to the central 3-fold rotational axis of symmetry.
4-Oxalocrotonate tautomerase is an enzymatically
related protein, the monomer of which is homologous to
both of the duplicated halves of the monomer of
5-carboxymethyl-2-hydroxymuconate Δ-isomerase.
4-Oxalocrotonate tautomerase is an α₆ hexamer with
dihedral symmetry, the entire structure of which superposes
on the trimer of 5-carboxymethyl-2-hydroxymuconate
Δ-isomerase.'“° The tautomerase retains the quaternary
structure of the common ancestor of itself and
the isomerase, while the isomerase is an internal duplication
of its ancestor.

Figure 9-18: α-Carbon diagram drawn from […] acid
sequence of the protein does not contain significant
internal homology, but the internal duplication is manifest
in the crystallographic molecular model of the folded
polypeptide that forms this monomeric protein. The two
domains, above and below a horizontal plane passing
through its center, are related to each other by a 2-fold
rotational axis of pseudosymmetry normal to the plane of
the page running through the center of the protein. This
drawing was produced with MolScript.*®

The interfaces between folded polypeptides in an 
oligomeric protein are almost indistinguishable in their 
hydropathy from the interfaces between secondary 
structures within a folded polypeptide.'*' In the interfaces
between subunits, 65% ± 4% of the accessible surface
area of the complementary faces is nonpolar and
22% ± 7% is polar but uncharged. These percentages are
indistinguishable from those for interfaces between elements
of secondary structure within a subunit (70% ± 5%
and 24% ± 6%, respectively). The average interface
between subunits, however, is enriched in its percentage
of charged accessible surface area (13% ± 5%) relative
to the percentage of charged accessible surface area
(6% ± 2%) in the average interface between secondary
structures within a subunit. Most of this elevation in
charged accessible surface area is attributable to the 
guanidinium functional groups of arginines,'”' and 
arginines are the donors in 33% of all of the hydrogen 
bonds located within interfaces between subunits. 
Otherwise, the amino acid composition of an interface is 
about the same as that of the buried interior of a folded 
polypeptide. 
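The comparison of the two kinds of interface can be made concrete with a short calculation. The sketch below, in Python, uses only the percentages quoted in this paragraph. It assumes, without confirmation from the original study, that each quoted spread can be treated as a standard deviation and that the two sets of interfaces are independent; the scaled difference it computes is an illustrative measure, not the statistical test used by the authors.

```python
import math

# (mean %, spread %) for interfaces between subunits and for interfaces
# between elements of secondary structure within a subunit, as quoted above.
surface_classes = {
    "nonpolar": ((65.0, 4.0), (70.0, 5.0)),
    "polar uncharged": ((22.0, 7.0), (24.0, 6.0)),
    "charged": ((13.0, 5.0), (6.0, 2.0)),
}

def scaled_difference(between, within):
    """Difference of the means in units of the combined spread
    (assumption: the spreads are independent standard deviations)."""
    (mean_b, sd_b), (mean_w, sd_w) = between, within
    return (mean_b - mean_w) / math.sqrt(sd_b ** 2 + sd_w ** 2)

scores = {name: scaled_difference(*pair) for name, pair in surface_classes.items()}

# The three classes should account for all of the accessible surface area.
total_between = sum(pair[0][0] for pair in surface_classes.values())
total_within = sum(pair[1][0] for pair in surface_classes.values())

for name, score in scores.items():
    print(f"{name:16s} {score:+.2f}")
```

On these assumptions, only the charged surface area differs by more than one combined spread, and it is the only class that is higher in the interfaces between subunits.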

A representative set of ionized and un-ionized 
hydrogen bonds can be found in the interface between 
the two subunits in the dimeric carboxylesterase ESTA 
from A. fulgidus (Table 9-2). The compositions of the 
amino acids within interfaces, however, can vary widely. 
For example, each of the two identical symmetrically displayed
interfaces creating the dimer of 4-α-glucanotransferase
from Thermotoga maritima is formed from
11 side chains that are completely hydrocarbon (valines, 
leucines, isoleucines, and phenylalanines),'* while the 
interface between the two subunits of phosphopyruvate 
hydratase from E. coli, through which the molecular
2-fold rotational axis of symmetry passes, has more than 
twice as many charged side chains as the average.“ 

The interfaces between subunits are probably ele- 
vated in their composition of charged side chains 
because the constituent subunits, once they have folded, 
have to remain soluble until they can find a complemen- 
tary partner. The charged groups prevent the folded sub- 
units from aggregating nonspecifically with other 
proteins while they are searching for a partner. 

A typical interface between two subunits of a dimer 
is that in the crystallographic molecular model of Cro 
protein from bacteriophage 434, through which the 
molecular 2-fold axis of symmetry runs (Figure 9-19). 
To the right, there is a hydrophobic cluster of symmetri- 
cally arranged prolines, valines, leucines, phenylala- 
nines, and tyrosines; to the left, a hydrogen-bonded 
cluster of glutamates, arginines, and the phenolic 
oxygen-hydrogens of the tyrosines. A significant portion 
of this polar half of the interface on the left, however, is 
formed from a hydrophobic cluster of the methylenes 
from the four arginines sandwiched between the two 
phenyl rings of two phenylalanines. Arginine is probably 
the most common charged amino acid in such interfaces 
because it provides charge to keep the folded subunit in 
solution but also has three methylenes that provide 
hydrophobic hydrogen-carbon bonds. 

Table 9-2: Hydrogen Bonds in the Interface between the
Subunits in the Dimer of Carboxylesterase ESTA from
A. fulgidusᵃ

subunit Aᵇ      subunit B       length (nm)

Asp7 OD1        Arg269 NH2      0.286
Arg269 NH2      Asp7 OD1        0.295
Ala247 O        Lys295 NZ       0.315
Lys295 NZ       Ala247 O        0.306
Glu274 OE1      Arg298 NH2      0.300
Arg298 NH2      Glu274 OE1      0.29
Ser276 OG       Asp299 OD2      0.258
Asp299 OD2      Ser276 OG       0.248
Ile277 O        Arg281 N        0.287
Arg281 N        Ile277 O        0.286
Arg279 N        Arg279 O        0.285
Arg279 O        Arg279 N        0.289
Tyr280 OH       Gln303 OE1      0.336
Gln303 OE1      Tyr280 OH       0.344

ᵃ The donor or acceptor in one subunit (subunit A) and its acceptor or donor,
respectively, in the other subunit (subunit B) across the interface in the
crystallographic molecular model'” of the homodimer of the carboxylesterase from
A. fulgidus are tabulated. ᵇ The atom performing the donation or the acceptance
in each amino acid is in the notation of Figure 4-14. The crystallographic
abbreviations for an acyl oxygen of the backbone and an amido nitrogen of the
backbone are O and N, respectively. Two dimers together form the crystallographic
asymmetric unit so the symmetrically arrayed hydrogen bonds have different
lengths.


Because neither hydrogen bonds nor ion pairs can
provide favorable standard free energy of formation to an
interface between two folded polypeptides, the standard
free energy of formation must be provided by the
hydrophobic effect. Although there are clusters of
hydrogen-carbon bonds throughout the interface in the 
Cro protein (Figure 9-19), such clustering is not neces- 
sary for the expression of the hydrophobic effect. All that 
is required is that hydrogen-carbon bonds be removed 
from the aqueous phase and sequestered within the 
interface during its formation. What they end up next to 
is inconsequential. 
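The footnote to Table 9-2 notes that the symmetrically arrayed hydrogen bonds have different lengths. As a rough sketch of how large those differences are, the following Python fragment pairs each hydrogen bond in the table with its symmetry mate and reports the differences in length. The pairing of consecutive rows is an assumption read from the layout of the table, and the variable names are illustrative.

```python
# Lengths (nm) of each hydrogen bond and of its symmetry mate, from Table 9-2.
pairs_nm = [
    ("Asp7 OD1 / Arg269 NH2", 0.286, 0.295),
    ("Ala247 O / Lys295 NZ", 0.315, 0.306),
    ("Glu274 OE1 / Arg298 NH2", 0.300, 0.29),
    ("Ser276 OG / Asp299 OD2", 0.258, 0.248),
    ("Ile277 O / Arg281 N", 0.287, 0.286),
    ("Arg279 N / Arg279 O", 0.285, 0.289),
    ("Tyr280 OH / Gln303 OE1", 0.336, 0.344),
]

# Round to the precision of the table to avoid floating-point noise.
differences = [round(abs(a - b), 3) for _, a, b in pairs_nm]
mean_difference = sum(differences) / len(differences)
largest_difference = max(differences)

print(differences)
print(round(mean_difference, 4), largest_difference)
```

Differences of this size, at most about 0.01 nm, indicate how closely the contacts at the center of the interface are duplicated across the 2-fold rotational axis of symmetry in this crystallographic molecular model.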

Almost all of the interfaces between subunits in 
oligomeric proteins are those that form symmetric 
dimers. The 2-fold rotational axis of symmetry passes 
through the center of such an interface, and the detailed 
atomic contacts (Figures 9-9 and 9-19) or the interdigi- 
tations of secondary structure (Figure 9-1A) are sym- 
metrically duplicated, except at the periphery where 
side chains in contact with the water are more flexible 
(for example, the hydrogen bonding of the Arginines 43 
in Figure 9-19). The second most common interface is 
that forming a trimer with cyclic symmetry. Examples of 
the symmetrically triplicated atomic contacts and 
hydrogen bonds around a 3-fold rotational axis of sym- 
metry are found around the exact rotational axes in the 
crystallographic molecular models of dihydrolipoylly- 
sine residue acetyltransferase from A. vinelandii (Figure 
9-20)*° and the bacterial porins.!“° At the center of a 
tetramer with dihedral symmetry, where the three 
orthogonal 2-fold rotational axes of symmetry intersect, 
the atomic contacts and hydrogen bonds are also arrayed 
with the same dihedral symmetry about the point of 
intersection. “%8 

When the side chains of the twins across a rotational 
axis of symmetry from each other intersect that axis, they
usually assume alternate conformations in each of which
one of the side chains sits on the axis and the other is
pushed to one side and vice versa (Figure 6-29). In the
interface between the two subunits in the crystallographic
molecular model of isoenzyme 3-3 of glutathione transferase
from Rattus norvegicus, however, the stack of
π molecular orbitals from the two stacked guanidiniums
of the Arginines 77 is intersected through its center by the
molecular 2-fold rotational axis of symmetry.’ There are
several examples in which the central sulfur-sulfur bond
of a cystine sits upon a molecular 2-fold rotational axis of
symmetry.’ In the crystallographic molecular model
of the α₂ dimer of human nitric-oxide synthase, a Zn²⁺
cation sits upon the molecular 2-fold rotational axis, symmetrically
bound by the two Cysteines 110 and the two
Cysteines 115;** and in the crystallographic molecular
model of the (α₂)₂ tetramer of methionine adenosyltransferase
from R. norvegicus, a K⁺ cation sits upon one of the
molecular 2-fold rotational axes of symmetry symmetrically
bound by the four amido oxygens of the backbone
from the two positions 264 and the two positions 265.”

Figure 9-20: An interface between three
subunits arranged around a 3-fold rotational
axis of symmetry. Portions of three of the 24
subunits in the crystallographic molecular
model of dihydrolipoyllysine-residue
acetyltransferase from A. vinelandii […]
presented. The exact molecular 3-fold rotational
axis of symmetry, coinciding with one […]

Most of the interfaces between subunits in 
oligomeric proteins are formed from two complemen- 
tary faces, each of which is a portion of the surface of its 
globular, folded polypeptide, and this portion is no more 
irregular than the usual exposed surface of the usual 
globular, folded polypeptide. In some instances, for 
example, glucose oxidase from Aspergillus niger, "° 
superoxide dismutase from Pseudomonas ovalis,'*’ chlo- 
ramphenicol O-acetyltransferase (Figure 9-11), and ribu- 
lose-phosphate 3-epimerase (Figure 9-14A), the two 
faces are almost flat and the resulting interface is almost 
planar. Often, however, segments of secondary structure 
or loops between secondary structures will penetrate 
superficially the subunit across the interface (Figures 
9-1A and 9-13B). 

In contrast to such classical interfaces, there are 
interfaces that are formed from regular arrays of sec- 
ondary structure. The most common examples of this 
type of interface are those in oligomers that are held 
together by coiled coils of α helices, as are general control
protein GCN4 (Figure 6-29) and methyl-accepting
chemotaxis protein II (Figure 6-30). The interface
between the two subunits in the α₂ dimer of the variant
surface glycoprotein from Trypanosoma brucei is an
antiparallel coiled coil of four α helices, each about 50 aa
long; the interface between the two subunits of translocated
intimin receptor is an antiparallel coiled coil of four
α helices, each about 20 aa long;'” the interface connecting
the three subunits of human mannose-binding
protein is a parallel coiled coil of three α helices, each
20 aa long; and the interface between the two α₂ dimers
of the tetrameric lac repressor from E. coli is an antiparallel
coiled coil of four α helices, one from each subunit.”
The central cores holding together the four subunits of
the (α₂)₂ tetramers of fumarate hydratase II from E. coli,!®?
histidine ammonia-lyase from P. putida,‘ and adenylosuccinate
lyase from P. aerophilum® are bundles of 20
antiparallel α helices, five from each subunit and each
about 30 aa long. 

Regular arrays of β structure also are used to connect
subunits of oligomeric proteins together. In the
α₃ trimer of UDP-N-acetylglucosamine diphosphorylase,
the interfaces are formed by three identical β helices, one
from each subunit that run parallel to each other around
the molecular 3-fold rotational axis of symmetry, which
coincides with a crystallographic axis." Continuous
β-pleated sheets run from one subunit into the other in
the α₂ dimers of concanavalin A,'® alcohol dehydrogenase
(Figure 6-9), κ-bungarotoxin (Figure 9-9), and heme-binding
protein 23 from R. norvegicus.'® A β sheet of six
strands is orthogonally (Figures 6-33 and 6-34) and symmetrically
packed against an identical sheet of six
strands from the other subunit in the α₂ homodimer of
the lectin from A. hypogaea,’ and a β sheet of six strands
is packed in parallel and symmetrically against an identical
β sheet of six strands from the other subunit of the
α₂ dimer of glucose-fructose oxidoreductase from
Zymomonas mobilis.'® In the α₄ cyclic tetramer of each
half of the dihedral octamer of dihydroneopterin aldolase
from Staphylococcus aureus' and in the α₂ dimer of each
half of the dihedral tetramer of urate oxidase from
Aspergillus flavus,” each subunit contributes four
β strands or eight β strands, respectively, to the dramatic
antiparallel β barrel of 16 strands in the center of these
oligomers. In the pentameric rings within the coat protein
of rhinovirus 14, each of the identical subunits contributes
one β strand to the parallel β barrel of five strands
in the center of the oligomer.”

Subunits in some homooligomeric proteins are joined 
to each other by structural swapping." When the subunit 
in such an oligomer is in its monomeric form, it is a com- 
pact, globular structure. When that monomer combines 
with another identical monomer, however, one or more of 
its elements of secondary structure, for example, an amino-terminal
α helix," a β hairpin,” two α helices and two
strands of β structure,” or a structural domain,” takes the
place of its twin on the other subunit, and its twin takes its
place on its own subunit. Because the two elements of structure
that have swapped are identical to each other, each can
fit precisely into the cavity vacated by the other. The resulting
α₂ dimer is held together by the respective strands of
polypeptide connecting each swapped segment with the 
rest of its subunit. Often it is only these strands of random 
meander that hold the two subunits together and no formal 
interface is formed between them.'” Conclusive proof of 
structural swapping requires a crystallographic molecular 
model of the unswapped monomer and the swapped dimer 
so that it can be shown that the swapped segments occupy 
in the same orientation the same locations in the dimer that 
were occupied by the unswapped segments in their respective
monomers. ""17*176178 In the coat protein of bacteriophage
MS2, however, two of the three subunits in a
homotrimeric substructure have swapped a β hairpin but
in the third subunit that β hairpin occupies the same location
on its own subunit occupied by the swapped β hairpins
on the other two subunits.’

The requirements for structural swapping have 
been examined by site-directed mutation.'’”'” Site- 
directed mutation has also been used to convert an oth- 
erwise monomeric protein into a structurally swapped 
dimer.'®° 

Conclusive evidence of structural swapping is avail- 
able for only a few proteins, but there are a number of 
oligomeric proteins in which one or more segments of the
polypeptide forming one subunit reaches over to embrace
its neighboring subunit just as the same segments from its
neighbor symmetrically embrace it. One example of this
is the α₂ dimer of glucose-6-phosphate isomerase from
O. cuniculus;'*' another is the α₃ trimer of 4-chlorobenzoyl-CoA
dehalogenase from Pseudomonas.” In the symmetric
α₂ dimer of human interleukin-5, the carboxy-terminal
24 amino acids of each subunit run symmetrically down
one side of the other in random meander and then turn to
run across the surface of the other in an α helix.'® In the
α₂ dimer of ADP-ribose diphosphatase from E. coli, the
first 57 amino acids of each subunit form a three-stranded
antiparallel β sheet that lies upon the surface of the
remaining globularly folded 153 amino acids of the other
subunit,’ and these 57 amino-terminal amino acids are
missing in monomers from the same family. In the
(α₂)₂ tetramer of catalase from Penicillium vitale, the first
13 amino-terminal amino acids of one subunit are 
threaded through a large loop of 39 amino acids bulging 
out of the surface of its symmetric twin and vice versa to 
form a dimer in which each subunit is hooked to the 
other.'** 

The ultimate expression of cyclic and dihedral sym- 
metry is erythrocruorin. Erythrocruorin from Lumbricus 
terrestris is an {[(αβ)(γδ)]₃}₁₂ε₃₆ oligomer of 180 folded
polypeptides with a total of 29,916 aa arranged with
dihedral symmetry of point group 622 (D₆).'® The core of
the protein, which confers the dihedral symmetry, is a
hexamer of dimers of trimers. Each of the 36 ε subunits
(240 aa) of this core is first assembled into a homotrimer
with cyclic symmetry. Along the 3-fold rotational axis of
symmetry of the trimer, the amino-terminal 50 aa of
each subunit forms a parallel coiled coil of three α helices
holding the three subunits together. These coiled coils
then combine in pairs to form dimers of trimers. Six of
these dimers of trimers assemble around a central 6-fold
rotational axis of dihedral symmetry in a splayed array,
the interfaces of which are formed between the coiled
coils of α helices. This structure displays the 12 globular
trimers of the carboxy-terminal domains (200 aa) of the
ε subunits directed outward in upper and lower rings of
six. 

The globin subunits of the protein, each containing 
a heme, come in four isoforms (α, 151 aa; β, 145 aa; γ,
153 aa; and δ, 142 aa). They first pair as αβ dimers and
γδ dimers with cyclic pseudosymmetry of point group
2 (C₂). An αβ dimer and a γδ dimer then assemble around
a molecular 2-fold rotational axis of pseudosymmetry,
but their own local 2-fold rotational axes of pseudosymmetry
intersect the central 2-fold rotational axis of pseudosymmetry
of this tetramer at angles of 54°'®® instead of
90°, as they would in a tetramer of dihedral pseudosymmetry.
Nevertheless, the structure is closed not because
of steric exclusion but because the subunits are not identical.
The δ subunits of three copies of this asymmetric
tetramer then associate with each other through three
identical interfaces around a 3-fold rotational axis of
symmetry to produce a symmetric [(αβ)(γδ)]₃ trimer of
asymmetric tetramers. One of these trimers of tetramers
then attaches to each of the 12 globular trimers directed
outward from the dihedral core of ε subunits to produce
the final molecule containing 36 ε subunits and 144
globin subunits. 
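The stoichiometry of erythrocruorin can be checked from the numbers given in the text. The following Python sketch multiplies out the hierarchy of tetramers, trimers of tetramers, and the dihedral core, using only the subunit lengths and copy numbers stated above; the variable names are illustrative.

```python
# Lengths (aa) of the four globin isoforms and of the epsilon subunit.
globin_lengths = {"alpha": 151, "beta": 145, "gamma": 153, "delta": 142}
epsilon_length = 240

tetramers_per_trimer = 3      # [(alpha beta)(gamma delta)]3
trimers_of_tetramers = 12     # one on each globular trimer of the core
epsilon_subunits = 36         # hexamer of dimers of trimers: 6 * 2 * 3

copies_of_each_globin = tetramers_per_trimer * trimers_of_tetramers
globin_subunits = copies_of_each_globin * len(globin_lengths)
total_polypeptides = globin_subunits + epsilon_subunits

total_residues = (copies_of_each_globin * sum(globin_lengths.values())
                  + epsilon_subunits * epsilon_length)

print(globin_subunits, total_polypeptides, total_residues)
```

The result, 144 globin subunits, 180 folded polypeptides, and 29,916 aa, agrees with the totals quoted for the protein.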

The strategy employed to assemble erythrocruorin 
provides a way of assembling more than 100 folded 
polypeptides into an enormous structure by a hierarchy of 
symmetries. It is also possible to accomplish the same goal 
even more dramatically with hexagonally expanded icosa- 
hedral symmetry. 


Problem 9-4: Make a xerographic copy of the following
figure, reprinted with permission from ref 23, copyright 
1983 European Journal of Biochemistry. 

Using a ruler, draw all of the rotational axes of sym- 
metry on one of the two members of the stereo pairs. Use 
the abbreviations for rotational axes found in the 
International Tables for Crystallography.” 


Problem 9-5: The following diagram is based on the 
crystallographic molecular model of transketolase.” This 
drawing was produced with MolScript.*®° 


(A) How many protomers does the protein contain, 
what types of axes of symmetry does the structure 
contain, and what are their locations in the struc- 
ture? 


The following diagram is based on the crystallographic 
molecular model of glycerate dehydrogenase.'®’ This draw- 
ing was produced with MolScript.*® 


(B) How many protomers does the protein contain, 
what types of axes of symmetry does the structure 
contain, and what are their locations in the struc- 
ture? 


Problem 9-6: The following figure is a drawing of the
region of a crystallographic molecular model of 
glutathione synthase'®® that includes a portion of one 
of the interfaces between the protomers. This drawing 
was produced with MolScript.*” 


(A) What type of axis of symmetry runs through the 
figure? 


(B) Describe in detail the location of the axis of sym- 
metry in the portion of the structure presented in 
the figure by naming the three amino acid side 
chains in each protomer that are immediately 
adjacent to that axis of symmetry.


Problem 9-7: A crystallographic molecular model of
cytidine deaminase from E. coli has been constructed. 
The protein is a homodimer formed from two identical 
folded polypeptide chains, each 294 aa in length. The following
figure is a tracing of the α carbons from
Glutamate 49 to Alanine 294 in one of the two folded 
polypeptides in the crystallographic molecular model.’ 
This drawing of the crystallographic molecular model 
was produced with MolScript.*” 


(A) How many domains are there in the portion of the
crystallographic molecular model shown in the 
figure? 


(B) What criterion did you use to decide how many 
domains there are? 


(C) By what pseudosymmetry operation are the 
domains related to each other, and where is 
the axis of pseudosymmetry located in the figure? 


(D) How did this structure arise during evolution? 


Below is an alignment of the segment of the sequence of 
cytidine deaminase from E. coli between Glutamate 49 
and Leucine 177 with the segment of the sequence from 
Glycine 183 to Alanine 294. 


 49 EDALAFALLPLAAACARTPLSNFNVGAIARGVSG
183 GYALTGDALSQAATAAANRSHMPYSKSPSGVALECKDG


TWYFGANMEFIGATMQQTVHAEQSATSHAWLSGEK--ALAAT
RIFSGSYAENA--AFNPTLPPLQGALTLLNLKGYDYPDIQRA


TVN---YTPCG--HCRQFMNELNSGLDLRIHLPGREAHALRD
VLAEKADAPLIQWDATSATLKALGC----HSIDRVLLA 294


YLPDAFGPKDLEIKTLL 177


(E) This alignment is based on the structure shown in 
the figure above. How was this alignment per- 
formed? 


(F) Over the aligned region (Glutamate 49 to 
Histidine 155 aligned with Threonine 187 to 
Alanine 294), what is the percentage of identity 
and how many gaps are there? In calculating the 
percentage of identity, assume that the length of 
the aligned region is the average of the lengths of 
the two aligned sequences. 


The figure on the next page is a tracing of the α carbons
of the two subunits of cytidine deaminase from
E. coli in the crystallographic molecular model. Again, in
each of the two subunits the structure of the first 48 
amino acids has been left out of the tracing. Each subunit 
starts at Glutamate 49 and ends at Alanine 294, and they 
have identical sequences. This drawing was produced 
with MolScript.’® 


(G) There are three axes in the figure: horizontal, ver- 
tical, and normal. Designate correctly each of 
these three axes as 2-, 3-, 4-, or 6-fold axes of sym- 
metry or axes of pseudosymmetry. One of these 
axes is a crystallographic axis of symmetry. Which 
is it? 


(H) What is the point group of the pseudosymmetry 
and what type of oligomer usually has this type of 


symmetry? 


The amino acid sequence of cytidine deaminase 
from B. subtilis is 


MNRQELITEALKARDMAYAPYSKFQVGAALLTKDGKVYRGCNIE
NAAYSMCNCAERTALFKAVSEGDTEFQMLAVAADTPGPVSPCGA
CRQVISELCTKDVIVVLTNLQGQIKEMTVEELLPGAFSSEDLHD
ERKL


(I) Align this sequence with the sequence of amino
acids 49-177 of the protein from E. coli. The most
conserved region in these two sequences is PCGX- 
CRQ, which contains amino acids from the active 
site of cytidine deaminase. Aligning these two 
regions from the two proteins will give you a start 
in the alignment. Put in gaps and try to get the 
best alignment. 


(J) For your alignment, what is the percentage of
identity and how many gaps are there? In calcu- 
lating the percentage of identity, assume that the 
length of the aligned region is the average of the 
lengths of the two aligned sequences. 


(K) Which are more closely related, the two 
sequences from E. coli or the sequence from
E. coli and the sequence of cytidine deaminase 
from B. subtilis? 


(L) Does cytidine deaminase from B. subtilis contain 
a segment homologous to the segment from 
Alanine 154 to Leucine 176 in cytidine deaminase 
from E. coli? Why is this of interest in deciding 
how these two proteins evolved? 


(M) What would you guess is the quaternary structure 
of cytidine deaminase from B. subtilis?
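The convention for percentage of identity stated in parts F and J, in which the length of the aligned region is taken as the average of the lengths of the two aligned sequences, can be sketched in Python as follows. The function names and the two short sequences are hypothetical and are not taken from cytidine deaminase; each uninterrupted run of gap characters is counted here as one gap, which is an assumption about how gaps are to be tallied.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Identities divided by the average ungapped length, as a percentage."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    identities = sum(1 for x, y in zip(aligned_a, aligned_b)
                     if x == y and x != gap)
    length_a = len(aligned_a.replace(gap, ""))
    length_b = len(aligned_b.replace(gap, ""))
    average_length = (length_a + length_b) / 2
    return 100.0 * identities / average_length

def count_gaps(aligned, gap="-"):
    """Number of uninterrupted runs of the gap character."""
    runs = 0
    previous_was_gap = False
    for character in aligned:
        is_gap = character == gap
        if is_gap and not previous_was_gap:
            runs += 1
        previous_was_gap = is_gap
    return runs

# Hypothetical aligned sequences, for illustration only.
top = "ACDE--FGHK"
bottom = "ACNEWYFG-K"
identity = percent_identity(top, bottom)
gaps = count_gaps(top) + count_gaps(bottom)
print(round(identity, 1), gaps)
```

In this toy alignment there are six identities over an average length of 8.5 positions and two gaps.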


Problem 9-8: Hemocyanin from Sepia officinalis is a 
decamer with dihedral symmetry of point group 
522 (D₅).'” The ten subunits form a ring that can be
divided into five segments. As required by this point 
group, each of these segments is identical to the other 
four and has a 2-fold rotational axis of symmetry running 
through its center, and each segment is formed from two 
of the subunits of the protein. The portion of each sub- 
unit forming one of these segments of the cylinder is a 
folded polypeptide with six internally repeating 
domains. Consequently, the ancestral segment of the
cylinder must have been formed from 12 identical subunits.
Arrange 12 identical subunits around local molecular
rotational axes of symmetry, including the global
2-fold rotational axis of symmetry of the dihedral point
group, to produce a segment composing one fifth of a
cylinder.


Isometric Oligomeric Proteins

Aside from tetramers with symmetry of point group
222 (D₂), protomers arranged with circular and dihedral
symmetry form structures in which they are arranged
cylindrically. There are three other point groups, however,
in which asymmetric protomers can be arranged to
form oligomers that are isometric structures (Table 9-3).
These three point groups are the only remaining ones in
which asymmetric objects can be arranged. They are the
tetrahedral point group 23 (T), the octahedral point
group 432 (O), and the icosahedral point group 532 (I). In
these three point groups, the centers of mass of the pro- 
tomers are arrayed systematically over the surface of a 
sphere centered on the point of intersection of the rota- 
tional axes of symmetry around which the protomers are 
positioned. 

Table 9-3: Isometric Arrangements of Identical Asymmetric Objects¹⁹¹

symmetry      Hermann-   Schönflies   rotational axes   angles between    number of
              Mauguin                 of symmetryᵃ      axes (degrees)ᵇ   asymmetric units

tetrahedral   23         T            three 2-fold      90                12
                                      four 3-fold       70.53
octahedral    432        O            six 2-fold        60                24
                                      four 3-fold       70.53
                                      three 4-fold      90
icosahedral   532        I            15 2-fold         36                60
                                      10 3-fold         41.81
                                      six 5-fold        63.43

ᵃ A rotational axis of symmetry is a line passing through the center of the
oligomeric structure. Because it is a line, it extends in both directions
from the center. As a result, each axis of symmetry passes out of the
oligomeric structure at two opposite points, and at each of these two points
there is a symmetric arrangement of asymmetric objects on the surface of the
structure. ᵇ These are the angles between the immediately adjacent axes of the
same fold.


As with proteins the subunits of which are arranged
with either cyclic symmetry or dihedral symmetry, it is
only the rotational axes of symmetry and their relative
orientations, not the ultimate shape of a particular 
oligomer, that define its point group. Nevertheless, there 
are three regular polyhedra'” that have, respectively, 
tetrahedral, octahedral, and icosahedral symmetries and 
that illustrate the types of rotational axes of symmetry in 
each of these three point groups and their orientations in 
space. These are Kepler’s rhombic dodecahedron (Figure 
9-21A), the triangular expansion of Kepler’s rhombic 
dodecahedron (Figure 9-21B), and the triangular expan- 
sion of Kepler’s rhombic triacontahedron (Figure 
9-21C).* The Platonic solids that are the namesakes for 
each of these three point groups are the tetrahedron, the 
octahedron, and the icosahedron, respectively. The 
other two Platonic solids, the cube and the dodecahe- 
dron, have octahedral and icosahedral symmetry, 
respectively. None of the five Platonic solids, however, is 
an adequate representative of its point group because 
each of them is formed from rotationally symmetric faces 
(equilateral triangles, squares, or regular pentagons) that 
are centered on axes of symmetry. In Kepler’s three 
polyhedra, all of the rotational axes of symmetry lie between 
the faces, just as in an oligomeric protein all of the rota- 
tional axes of symmetry must pass between its necessar- 
ily asymmetric protomers. 
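
The entries of Table 9-3 follow from the geometry of the three point groups alone, and they can be checked numerically. The following is a minimal sketch, not a crystallographic tool: it assumes one particular orientation and one particular pair of generating axes for each group (a 2-fold, 4-fold, or 5-fold axis together with the 3-fold axis along the body diagonal), and the helper names `rotation`, `close_group`, `axis_of`, and `summarize` are introduced here only for illustration. It builds every rotation of each group, sorts the rotations by the axis they share, and reports the number of axes of each fold and the smallest angle between two axes of the same fold.

```python
import math
from itertools import combinations

PHI = (1 + math.sqrt(5)) / 2  # golden ratio; fixes the assumed icosahedral 5-fold axis

def rotation(axis, fold):
    """Matrix for a rotation of 360/fold degrees about `axis` (Rodrigues formula)."""
    n = math.sqrt(sum(c * c for c in axis))
    x, y, z = (c / n for c in axis)
    t = 2 * math.pi / fold
    c, s, v = math.cos(t), math.sin(t), 1 - math.cos(t)
    return ((c + x*x*v, x*y*v - z*s, x*z*v + y*s),
            (y*x*v + z*s, c + y*y*v, y*z*v - x*s),
            (z*x*v - y*s, z*y*v + x*s, c + z*z*v))

def matmul(a, b):
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3))
                 for i in range(3))

def key(values, digits=6):
    """Rounded tuple used to recognize matrices or axes that differ only by round-off."""
    return tuple(round(v, digits) for v in values)

def close_group(generators):
    """Every distinct product of the generators (the full finite rotation group)."""
    identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
    flat = lambda m: [v for row in m for v in row]
    found = {key(flat(identity)): identity}
    frontier = [identity]
    while frontier:
        nxt = []
        for m in frontier:
            for g in generators:
                p = matmul(m, g)
                if key(flat(p)) not in found:
                    found[key(flat(p))] = p
                    nxt.append(p)
        frontier = nxt
    return list(found.values())

def axis_of(m):
    """Unit vector along the axis of a non-identity rotation, with a fixed sign."""
    trace = m[0][0] + m[1][1] + m[2][2]
    # R + R^T - (tr R - 1) I = 2 (1 - cos theta) n n^T, so its columns are parallel to n.
    nnt = [[m[i][j] + m[j][i] - (trace - 1) * (i == j) for j in range(3)] for i in range(3)]
    col = max(range(3), key=lambda j: nnt[j][j])
    norm = math.sqrt(sum(nnt[i][col] ** 2 for i in range(3)))
    v = [nnt[i][col] / norm for i in range(3)]
    lead = next(c for c in v if abs(c) > 1e-6)
    return tuple(c if lead > 0 else -c for c in v)

def summarize(generators):
    """Return (order, {fold: (number of axes, smallest angle between two such axes)})."""
    group = close_group(generators)
    axes = {}                                    # rounded axis -> [axis, rotations about it]
    for m in group:
        if m[0][0] + m[1][1] + m[2][2] > 3 - 1e-9:
            continue                             # skip the identity
        a = axis_of(m)
        axes.setdefault(key(a, 5), [a, 0])[1] += 1
    by_fold = {}
    for a, count in axes.values():
        by_fold.setdefault(count + 1, []).append(a)   # an n-fold axis carries n - 1 rotations
    table = {}
    for fold, members in sorted(by_fold.items()):
        angles = [math.degrees(math.acos(min(1.0, abs(sum(p * q for p, q in zip(u, v))))))
                  for u, v in combinations(members, 2)]
        table[fold] = (len(members), min(angles))
    return len(group), table

GROUPS = {  # assumed generators: one axis along z or a vertex, one 3-fold along (1, 1, 1)
    "23 (T)":  [rotation((0, 0, 1), 2), rotation((1, 1, 1), 3)],
    "432 (O)": [rotation((0, 0, 1), 4), rotation((1, 1, 1), 3)],
    "532 (I)": [rotation((0, 1, PHI), 5), rotation((1, 1, 1), 3)],
}

if __name__ == "__main__":
    for name, gens in GROUPS.items():
        order, table = summarize(gens)
        print(name, "order", order)
        for fold, (count, angle) in table.items():
            print(f"  {count:2d} {fold}-fold axes, adjacent axes {angle:.2f} degrees apart")
```

The order of each group (12, 24, or 60) is the number of asymmetric units that it can accommodate, because each rotation of the group carries one reference protomer to a different position.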

In the tetrahedral point group 23(T), 12 identical 
protomers are arranged about four isometrically spaced 
3-fold rotational axes of symmetry (at 70.53° to each 
other) and three orthogonal 2-fold rotational axes of 
symmetry that all intersect at a common origin (Figure 
9-21A). When 12 protomers of protein are assembled in 
the tetrahedral point group 23(T), they do not fit into the 
neat geometrical boundaries of any polyhedron. 


* The unexpanded rhombic triacontahedron of Kepler, although 
constructed from 30 faces arrayed around rotational axes of 
symmetry, cannot accommodate 30 asymmetric objects. 
Consequently, it does not represent a point group. 


Figure 9-21: Regular polyhedra that have isometric symmetries: 
(A) the rhombic dodecahedron with tetrahedral symmetry of point 
group 23(T), (B) the tetracosahedron that is the triangular expan- 
sion of the rhombic dodecahedron and that represents octahedral 
symmetry of point group 432(O), and (C) the hexacontahedron that 
is the triangular expansion of a rhombic triacontahedron and that 
has icosahedral symmetry of point group 532(I). Kepler derived the 
rhombic triacontahedron from the intersection of the dodecahe- 
dron and octahedron by connecting the vertices at the 5-fold and 
3-fold axes of symmetry with lines. In each of the figures, rotational 
axes of symmetry on the circumference are labeled with the 
number of their fold. 


Nevertheless, they are arrayed around the proper 
2-fold and 3-fold rotational axes of symmetry. 
3-Dehydroquinate dehydratase from Mycobacterium 
tuberculosis (Figure 9-22) is a protein in which the 12 
folded polypeptides, each 146 aa long, are arranged with 
tetrahedral symmetry. In this particular protein, the 
most extensive interfaces are those producing the four 
trimers, each of which sits on its own 3-fold rotational 
axis of symmetry. As with the 2-fold rotational axes of 
symmetry in the dihedral point groups of odd fold, the 
two ends of each of the 3-fold axes of symmetry in point 
group 23(T) are different. Each of the four trimers in 
3-dehydroquinate dehydratase is at one end of a 3-fold 
rotational axis of symmetry, while the other end of each 
3-fold rotational axis of symmetry is surrounded by the 
other three trimers. The monomers in a trimer are super- 
posed by one end of a 3-fold rotational axis of symmetry, 
while the other three trimers are superposed upon each 
other by the other end of that same axis. 

Several of the proteins with tetrahedral symmetry 
are such tetramers of trimers. In dilute solutions of 
guanidinium chloride, 3-dehydroquinate dehydratase 
dissociates into its constituent trimers. Phaseolin from 
P. vulgaris is a trimeric protein that associates to form 
a dodecamer with tetrahedral symmetry below 
pH 4.5. The self-rotation function of the asymmetric unit 
of crystals of catabolic ornithine carbamoyltransferase 
from Pseudomonas aeruginosa has maxima consistent 
with an oligomer with tetrahedral symmetry, and 
when the protein is cross-linked the major covalent 
species are trimer, hexamer, nonamer, and dodecamer, a 
result consistent with the trimer being the fundamental 
unit. 

In contrast to these three dodecamers, the dode- 
camer of protocatechuate 3,4-dioxygenase from 
P. aeruginosa is a structure in which six dimers, each 
formed by an extensive interface centered on a 2-fold 
rotational axis of symmetry, are arrayed around the 
3-fold axes of symmetry of its tetrahedral point group by 
less extensive interfaces. The interfaces forming the 
constituent dimers of bromoperoxidase from Corallina 
officinalis are also more extensive (105 nm² interface⁻¹) 
than those forming the tetrahedral dodecamer (32 nm² 
interface⁻¹) from six of those dimers. Consequently, 
there are both tetramers of trimers and hexamers of 
dimers with tetrahedral symmetry. 

In the octahedral point group 432(O), the 24 identical 
asymmetric objects of the tetracosamer are 
arranged about three orthogonal 4-fold rotational axes of 
symmetry, four isometrically spaced 3-fold rotational 
axes of symmetry at 70.53° to each other, and six isometrically 
spaced 2-fold rotational axes of symmetry at 60° to each 
other (Figure 9-21B). Again, the 24 protomers 
of a protein with octahedral symmetry only have 
to be arranged about the rotational axes of symmetry; 
they do not have to assume any particular geometric 
shape. As long as the interfaces among the protomers 

Figure 9-22: Tetrahedral symmetry of point group 23(T). An α-carbon 
diagram of 3-dehydroquinate dehydratase from M. tuberculosis is 
drawn from the crystallographic molecular model. The protein 
crystallized in the space group F23, and an individual subunit of 
146 aa is the crystallographic asymmetric unit. Consequently, all 
rotational axes of symmetry are exact. Each of the four trimers is 
drawn with line segments of different thickness or shading, and each 
subunit is labeled with a letter. The molecule is drawn in the 
orientation of the rhombic dodecahedron in Figure 9-21A. This 
drawing was produced with MolScript. 

position them in space around those rotational axes of 
symmetry, the oligomer will be a closed, octahedral, iso- 
metric structure. In point group 432(O), unlike in point 
group 23(T), identical sets of interfaces are found at the 
two ends of each of the rotational axes of symmetry. 
Heat shock protein 16.5 from Methanococcus jannaschii 
(Figure 9-23) is a tetracosameric protein in 
which subunits of 147 aa are arranged with octahedral 
symmetry to form a hollow spherical shell that has size- 
able holes at the 3-fold rotational axes of symmetry and 
at the 4-fold rotational axes of symmetry. The most 
extensive interfaces are those between monomers paired 
as dimers around the 2-fold rotational axes of symmetry. 
The second most extensive set of interfaces are the 
groups of four dimers arrayed around each of the 4-fold 
rotational axes of symmetry, so the structure is a trimer 
of tetramers of dimers. The four symmetrically arrayed 
interfaces among the four monomers around a 4-fold 
rotational axis of symmetry, one from each dimer, incline 
those four dimers relative to each other to form a jagged 
cup with the proper curvature so that three of these cups 
fit together to form the final spherical structure. 
Coincidentally, the protein crystallizes in the space 
group H3 with one of these octameric cups as the asym- 
metric unit. 

In the icosahedral point group 532(I), 60 identical 
asymmetric objects are arranged about 31 rotational axes 
of symmetry (Figure 9-21C; Table 9-3) to produce the 
oligomer. Each of the 31 rotational axes of symmetry 
intersects all of the others at the center of the structure. 
In every icosahedral arrangement, the relative angular 
dispositions of these axes are always the same. Although 
there are 31 rotational axes of symmetry in this point 
group, it can be generated from one protomer by four 
successive rotations around specified axes in a given 
sequence. 6,7-Dimethyl-8-ribityllumazine synthase from 
B. subtilis and the protein coat of satellite panicum 
mosaic virus (Figure 9-24) are oligomeric proteins in 
each of which 60 identical subunits are arranged with 
icosahedral symmetry of point group 532(I) to form a 
spherical shell. 
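
The statement that 60 identical objects are arrayed over the surface of a sphere, and the reason that none of them can sit on a rotational axis, can be illustrated with a short calculation. The sketch below is only an illustration under stated assumptions: the orientation of the axes, the trial coordinates, and the helper names `close_group` and `orbit` are choices made here, not part of any published model. It applies every rotation of point group 532(I) to a point in a general position and to points placed on a 2-fold or a 5-fold axis.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio; fixes the assumed 5-fold axis (0, 1, PHI)

def rotation(axis, angle_deg):
    """Matrix for a rotation of `angle_deg` about `axis` (Rodrigues formula)."""
    n = math.sqrt(sum(c * c for c in axis))
    x, y, z = (c / n for c in axis)
    t = math.radians(angle_deg)
    c, s, v = math.cos(t), math.sin(t), 1 - math.cos(t)
    return ((c + x*x*v, x*y*v - z*s, x*z*v + y*s),
            (y*x*v + z*s, c + y*y*v, y*z*v - x*s),
            (z*x*v - y*s, z*y*v + x*s, c + z*z*v))

def matmul(a, b):
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3))
                 for i in range(3))

def close_group(generators):
    """Every distinct product of the generators, compared after rounding."""
    identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
    key = lambda m: tuple(round(v, 6) for row in m for v in row)
    found = {key(identity): identity}
    frontier = [identity]
    while frontier:
        nxt = []
        for m in frontier:
            for g in generators:
                p = matmul(m, g)
                if key(p) not in found:
                    found[key(p)] = p
                    nxt.append(p)
        frontier = nxt
    return list(found.values())

def orbit(group, point):
    """Distinct images of `point` under every rotation in `group`."""
    images = {}
    for m in group:
        q = tuple(sum(m[i][k] * point[k] for k in range(3)) for i in range(3))
        images[tuple(round(c, 6) for c in q)] = q
    return list(images.values())

# Assumed generators: a 5-fold axis through (0, 1, PHI) and a 3-fold axis through (1, 1, 1).
ICOSAHEDRAL = close_group([rotation((0, 1, PHI), 72), rotation((1, 1, 1), 120)])

general = orbit(ICOSAHEDRAL, (0.3, 0.5, 1.0))    # arbitrary position off every axis
on_2fold = orbit(ICOSAHEDRAL, (0.0, 0.0, 1.0))   # z is a 2-fold axis in this orientation
on_5fold = orbit(ICOSAHEDRAL, (0.0, 1.0, PHI))   # a point on a 5-fold axis
radii = {round(math.sqrt(sum(c * c for c in q)), 6) for q in general}

if __name__ == "__main__":
    print(len(ICOSAHEDRAL), len(general), len(on_2fold), len(on_5fold), radii)
```

A point in a general position has 60 images, all at the same distance from the center, whereas a point on a 2-fold axis has only 30 and a point on a 5-fold axis only 12. An object centered on an axis is therefore superposed on itself by that axis and must itself be symmetric, which is why asymmetric protomers lie between the axes.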

The protein coat of a virus such as satellite pan- 
icum mosaic virus is a thick, continuous, spherical layer 
of protein that serves the purpose of enclosing and pro- 
tecting the viral nucleic acid. This nucleic acid encodes 
the genetic information necessary for the virus to control 
the host parasitically and divert the purpose of the host 
from its own growth and replication to the growth and 
replication of the virus. These requirements can be satis- 
fied only by a fairly large molecule of nucleic acid, and it 
all must fit within the protein coat so that it can be pro- 
tected from the environment. A viral protein coat is made 
from 60 identical, or almost identical, protomers. In 
a spherical virus, these protomers are arranged about the 
icosahedral rotational axes of symmetry to produce 
spherical shells that can enclose the nucleic acid. In the 
case of satellite panicum mosaic virus, the protomer is a single 
folded polypeptide. 

Again, it is not any regular polyhedron that defines a 
protein built from 60 protomers arranged with icosahe- 
dral symmetry but the rotational axes of symmetry. 
Regardless of how intertwined or encroaching the pro- 
tomers in such a protein become, the number and rela- 
tive angular dispositions of the rotational axes of 
symmetry that dictate the positions of those protomers 
are permanent features of the structure. If 60 identical 
folded polypeptides are arranged around these rota- 
tional axes of symmetry and their shapes are so con- 
structed as to mesh symmetrically at each of the 
boundaries among themselves, they will necessarily 
form a tightly sealed icosahedral shell. 

The interfaces among the protomers are the fun- 
damental determinants of the multimeric structure. In 
6,7-dimethyl-8-ribityllumazine synthase, the pentamer 
appears to have been the unit assembled by evolution 
into the oligomer of 60 subunits. At each of the five 
equivalent outer edges of this pentamer an interface 
evolved, connecting the pentamer to another identical 
pentamer in the usual way that two identical proteins are 
associated, which is around a 2-fold rotational axis of 
symmetry. This particular 2-fold rotational axis of sym- 
metry, however, happened to incline the two pentamers 
with respect to each other so that their respective 5-fold 
rotational axes of symmetry both intersected the 2-fold 
axis of symmetry and formed the angle required to exist 
between the 5-fold rotational axes of symmetry in an 
icosahedron, which is 63°. If two interfaces, the one 
defining the pentamer and the other at the 2-fold rota- 
tional axis defining an angle of 63°, are built into faces on 
a protomer, 60 such protomers will automatically assem- 
ble into an icosahedral shell. 

In the protein coat of satellite panicum mosaic 
virus, the interfaces holding the trimers together around 
the 3-fold axes of symmetry (15.7 nm² interface⁻¹) are 
more extensive than those holding together the dimers 
around the 2-fold axes (12.8 nm² interface⁻¹) or the pentamers 
around the 5-fold axes (10.5 nm² interface⁻¹). If 
the trimer were the fundamental unit from which the protein 
coat arose, the evolution of two complementary 
faces on the trimer that formed a dimer of trimers inclin- 
ing the two 3-fold rotational axes of symmetry at 42° 
would also automatically create the entire icosahedrally 
symmetric oligomer of 60 subunits. 
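
That two interfaces suffice to specify the entire shell can be checked by treating the two interfaces as two rotations and asking how many distinct rotations their repeated combination produces. The sketch below is a geometric illustration only and says nothing about real interfaces; the function names `generated_order` and `shell_order`, the cap of 200 rotations, and the placement of the n-fold axis along z are assumptions made for the example. The 2-fold axis is placed so that it bisects the angle between the n-fold axes of the two oligomers that it relates, and the angles used are the values listed in Table 9-3 as 63.43° and 41.81°.

```python
import math

def rotation(axis, angle_deg):
    """Matrix for a rotation of `angle_deg` about `axis` (Rodrigues formula)."""
    n = math.sqrt(sum(c * c for c in axis))
    x, y, z = (c / n for c in axis)
    t = math.radians(angle_deg)
    c, s, v = math.cos(t), math.sin(t), 1 - math.cos(t)
    return ((c + x*x*v, x*y*v - z*s, x*z*v + y*s),
            (y*x*v + z*s, c + y*y*v, y*z*v - x*s),
            (z*x*v - y*s, z*y*v + x*s, c + z*z*v))

def matmul(a, b):
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3))
                 for i in range(3))

def generated_order(generators, cap=200):
    """Number of distinct rotations generated, or None if the set grows past `cap`."""
    identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
    key = lambda m: tuple(round(v, 6) for row in m for v in row)
    found = {key(identity)}
    frontier = [identity]
    while frontier:
        nxt = []
        for m in frontier:
            for g in generators:
                p = matmul(m, g)
                if key(p) not in found:
                    found.add(key(p))
                    nxt.append(p)
        if len(found) > cap:
            return None          # the two rotations do not close into a small finite group
        frontier = nxt
    return len(found)

def shell_order(fold, inclination_deg):
    """Order of the group built from an n-fold oligomer and one 2-fold interface.

    The n-fold axis lies along z. The 2-fold axis bisects the angle `inclination_deg`
    between the n-fold axes of the two oligomers that it relates to each other.
    """
    half = math.radians(inclination_deg / 2)
    twofold = (math.sin(half), 0.0, math.cos(half))
    return generated_order([rotation((0, 0, 1), 360 / fold), rotation(twofold, 180)])

PENTAMER_ANGLE = math.degrees(math.atan(2))                 # 63.43 degrees between 5-fold axes
TRIMER_ANGLE = math.degrees(math.acos(math.sqrt(5) / 3))    # 41.81 degrees between 3-fold axes

if __name__ == "__main__":
    print(shell_order(5, PENTAMER_ANGLE))   # pentamers inclined at the icosahedral angle
    print(shell_order(3, TRIMER_ANGLE))     # trimers inclined at the icosahedral angle
    print(shell_order(5, 60.0))             # pentamers at another inclination
```

With either the pentamer or the trimer as the starting unit, the icosahedral inclination yields exactly 60 rotations and hence 60 positions for protomers. At another inclination the rotations never close upon themselves, so no closed shell of identical protomers results.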

Vertebrate ferritin is a tetracosamer the identical 
subunits of which are arranged with octahedral symmetry 
(Figure 9-25A), but ferritin from Listeria innocua 
is a dodecamer the identical subunits of which are 
arranged with tetrahedral symmetry. These are examples 
of two different quaternary structures for the same 
species of protein. The subunits of vertebrate ferritin 
and the ferritin from L. innocua are both antiparallel 
coiled coils of four α helices (Figure 9-25B) that are 
superposable on each other and consequently homologous 
to each other. In both proteins, these coiled coils of 
α helices associate side by side in opposite directions to 
form dimers around 2-fold rotational axes of symmetry, 
and the interfaces forming the dimers from the two pro- 
teins are homologous to each other. The dimers, in turn, 
in each protein combine in triplets around 3-fold rota- 
tional axes of symmetry in which the three identical 
interfaces around each axis are formed by the two 
respective ends of the monomers in one dimer butting 
up against the side of one of the monomers in a neigh- 
boring dimer (Figure 9-25C). These interfaces around 
the 3-fold axes in the two proteins are homologous to 
each other. Vertebrate ferritin is formed from four of 
these trimers of dimers; the ferritin from L. innocua, from 
two. As with any oligomer with tetrahedral symmetry 
(Figure 9-22), there are two different types of interfaces 
at the two ends of each 3-fold axis in the tetrahedral 
dodecamer from L. innocua. One of these types is the set 
homologous to the set of interfaces around the 3-fold 
axes in vertebrate ferritin (Figure 9-25C). 

If one of the dimers around a 4-fold rotational axis 
of symmetry in vertebrate ferritin were removed (for 
example, the one containing monomer V) and the 
creases between the remaining three dimers were made 
more acute, the space previously occupied by the miss- 
ing dimer would close up and a new interface could form 
(one between dimer I/II and the side of monomer IV) 
now identical to the other two around the once 4-fold but 
now 3-fold axis of symmetry. If this transformation were 
performed at each end of each 4-fold axis in vertebrate 
ferritin, six dimers would be removed, the structure 
would be converted from a tetracosamer with octahedral 

Figure 9-24: Icosahedral symmetry of point group 532(I). An α-carbon 
diagram of one hemisphere of the protein coat of satellite panicum 
mosaic virus is drawn from the crystallographic molecular model of 
the protein. The protein crystallized in the space group P4,32 with a 
pentamer of subunits, each of 157 aa, in the crystallographic 
asymmetric unit, the smallest asymmetric unit available to 
icosahedral symmetry. Individual subunits have been drawn with 
line segments with one of three different thicknesses or shadings. 
Only 34 of the 60 subunits are drawn to make the structure easier to 
visualize. These 34 subunits form the front of the sphere that is the 
protein coat. The portion of the molecule drawn is in the orientation 
of the hexacontahedron of Figure 9-21C. This drawing was produced 
with MolScript. 

Figure 9-25: Representations of the octahedral symmetry of 
mammalian ferritin. (A) Ribbon diagram of only the front half 
of the crystallographic molecular model of the entire spherical 
molecule showing 12 of the 24 subunits. Reprinted with permission 
from ref 206. Copyright 1997 Elsevier B.V. (B) Ribbon diagram of 
just the central of the 12 subunits portrayed in panel A. Reprinted 
with permission from ref 206. Copyright 1997 Elsevier B.V. 
(C) Representation of the arrangement of the dimers of the sub- 
units of ferritin on the faces of a rhombic dodecahedron (Figure 
9-21A). Arrows are drawn to indicate the antiparallel orientations 
of the two subunits around the 2-fold rotational axes of symmetry 
in the center of each face. The 2-fold, 3-fold, and 4-fold rotational 
axes of symmetry are located as they are in Figure 9-21B. 

symmetry into a dodecamer with tetrahedral symmetry, 
and this new dodecamer would be structurally homolo- 
gous to the ferritin from L. innocua. The two different 
ends of each of the new 3-fold rotational axes of symme- 
try in the new tetrahedral dodecamer would be the end 
not affected by the transformation and the end that had 
been at a 4-fold rotational axis of symmetry before the 
transformation. Consequently, in these two different 
quaternary structures of ferritin, the octahedral and the 
tetrahedral, the homologous complementary faces on 
two different, but homologous, dimers must accom- 
modate in the one case a 4-fold rotational axis of sym- 
metry and in the other case a 3-fold rotational axis of 
symmetry. 
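
The way in which the same two kinds of interface, one around a 2-fold axis and one around a 3-fold axis, can close into either a dodecamer or a tetracosamer can be seen by varying only the inclination between the two axes. The sketch below is a purely geometric illustration and is not derived from the crystallographic models of the ferritins; the function names, the cap of 200 rotations, and the choice of the 3-fold axis along z are assumptions made for the example. It counts the distinct rotations generated by a 3-fold axis and a 2-fold axis inclined to it by a given angle.

```python
import math

def rotation(axis, angle_deg):
    """Matrix for a rotation of `angle_deg` about `axis` (Rodrigues formula)."""
    n = math.sqrt(sum(c * c for c in axis))
    x, y, z = (c / n for c in axis)
    t = math.radians(angle_deg)
    c, s, v = math.cos(t), math.sin(t), 1 - math.cos(t)
    return ((c + x*x*v, x*y*v - z*s, x*z*v + y*s),
            (y*x*v + z*s, c + y*y*v, y*z*v - x*s),
            (z*x*v - y*s, z*y*v + x*s, c + z*z*v))

def matmul(a, b):
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3))
                 for i in range(3))

def generated_order(generators, cap=200):
    """Number of distinct rotations generated, or None if the set grows past `cap`."""
    identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
    key = lambda m: tuple(round(v, 6) for row in m for v in row)
    found = {key(identity)}
    frontier = [identity]
    while frontier:
        nxt = []
        for m in frontier:
            for g in generators:
                p = matmul(m, g)
                if key(p) not in found:
                    found.add(key(p))
                    nxt.append(p)
        if len(found) > cap:
            return None
        frontier = nxt
    return len(found)

def order_for_inclination(angle_deg):
    """Group order for a 3-fold axis along z and a 2-fold axis inclined to it by `angle_deg`."""
    a = math.radians(angle_deg)
    twofold = (math.sin(a), 0.0, math.cos(a))
    return generated_order([rotation((0, 0, 1), 120), rotation(twofold, 180)])

# Inclinations between a 2-fold axis and the nearest 3-fold axis in each point group.
TETRAHEDRAL = math.degrees(math.acos(1 / math.sqrt(3)))      # about 54.74 degrees
OCTAHEDRAL = math.degrees(math.acos(math.sqrt(2.0 / 3.0)))   # about 35.26 degrees

if __name__ == "__main__":
    print(order_for_inclination(TETRAHEDRAL))   # 12 positions, point group 23(T)
    print(order_for_inclination(OCTAHEDRAL))    # 24 positions, point group 432(O)
    print(order_for_inclination(45.0))          # an intermediate inclination does not close
```

A change of about 19° in the inclination of the 2-fold axis relative to the 3-fold axis is the only geometric difference between the two closed arrangements in this sketch, which is consistent with homologous interfaces adjusting to produce either quaternary structure.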

Among the small heat shock proteins, there are also 
two different quaternary structures for the same species 
of protein in different species of organisms. The small 
heat shock protein from M. jannaschii (Figure 9-23) is a 
tetracosamer with octahedral symmetry of point group 
432(O), while the same protein from Triticum aestivum is 
a dodecamer with dihedral symmetry of point group 
322(D₃).²⁰⁸ In both structures, a dimer is the fundamental 
unit, and the respective dimers have homologous tertiary 
and quaternary structures. In the protein from M. jan- 
naschii, the molecular 2-fold rotational axis of symmetry 
of each dimer is coincident with one of the 2-fold rotational 
axes of the symmetry of point group 432(O), while 
in the protein from T. aestivum, the entire dimer is the 
asymmetric unit for the symmetry of point group 322(D₃) 
and its 2-fold rotational axis is local and pseudosymmetric. 
Nevertheless, the way in which the three dimers are 
arranged around the single, central 3-fold rotational axis 
of symmetry in the latter is the same as that in which 
three dimers are arranged about one of the four 3-fold 
rotational axes of symmetry in the former (Figure 9-23), 
and the fundamental unit in the two different structures 
is this homologous hexamer. Two of these hexamers lie 
back to back in the dodecamer, and four of these hexam- 
ers are each centered on the respective 3-fold rotational 
axes of symmetry in the tetracosamer. Presumably, 
because this hexamer is the fundamental unit, the dodecamer 
is one of symmetry of point group 322(D₃) rather 
than that of point group 622(D₆) as are most other 
dodecamers. 

Dihydrolipoyllysine-residue acetyltransferase, 
dihydrolipoyllysine-residue succinyltransferase, and 
dihydrolipoyllysine-residue (2-methylpropanoyl)transferase 
are closely related proteins, the sequences of which can 
be readily aligned with each other with greater than 30% 
identity. All of the dihydrolipoyllysine-residue 
succinyltransferases and dihydrolipoyllysine-residue 
(2-methylpropanoyl)transferases and 
dihydrolipoyllysine-residue acetyltransferases from Gram-negative 
bacteria are tetracosamers of 24 identical subunits 
with octahedral symmetry, while the 
dihydrolipoyllysine-residue acetyltransferases from Gram-positive 
bacteria and eukaryotes are hexacontamers of 60 identical 
subunits with icosahedral symmetry. 

In both the octahedral oligomers and the icosahedral 
oligomers, the homologous interfaces forming 
the trimers centered on the respective 3-fold rotational 
axes of symmetry (Figure 9-21A,B) are the most extensive 
in the structures. In the octahedral oligomers, the eight 
trimers at the eight 3-fold vertices are connected by 12 
interfaces centered on the 2-fold rotational axes of sym- 
metry, so that the structure is a cube with wide openings 
in each face at the 4-fold rotational axes of symmetry. 
In the icosahedral oligomers, the trimers are again joined 
at the 2-fold rotational axes of symmetry but by 30 inter- 
faces, and the structure is a dodecahedron with wide 
openings in each face at the 5-fold rotational axes of symmetry. 
In the octahedral oligomers, four trimers associate 
with each other around 4-fold rotational axes of 
symmetry, while in the icosahedral oligomers, five 
trimers associate with each other around 5-fold rotational 
axes of symmetry. The respective interfaces at the 
2-fold rotational axes of symmetry between these homologous 
trimers that are arranged around these two different 
rotational axes of symmetry adapt flexibly to the 
dispositions of the trimers that are required by those 
axes. In these two different quaternary structures of 
these homologous acetyltransferases, the octahedral and 
the icosahedral, the homologous complementary faces 
on two different, but homologous, trimers must accom- 
modate in the one case a 4-fold rotational axis of sym- 
metry and in the other case a 5-fold rotational axis of 
symmetry. 

Ferritin, small heat shock protein, and dihydro- 
lipoyllysine-residue acetyltransferase, because they each 
are examples of the same protein having different qua- 
ternary structures, reinforce the conclusion that quater- 
nary structure contains no information relevant to the 
evolution of proteins. The examples of ferritin and dihy- 
drolipoyllysine-residue acetyltransferase also illustrate 
the ability of two different subunits, formed respectively 
from closely related polypeptides, to assume similar 
arrangements around rotational axes of symmetry of 
different fold. It is this ability of interfaces to adjust flex- 
ibly to different rotational axes of symmetry that has 
been exploited by evolution to increase the size of viral 
protein coats. 

Although there are a few other viruses like satellite 
panicum mosaic virus that have protein coats assembled 
from only 60 subunits arranged with 532(I) symmetry, 
the problem with such protein coats is that 
they are too small. In order to enclose enough DNA to 
accomplish a successful subversion of the host, the pro- 
tein coats usually must be larger. In two instances, this 
problem has been solved by using an elongated homo- 
dimer as a protomer and having these homodimers 
arrayed as spokes around the 5-fold rotational axes of 
symmetry. One monomer of the dimer forms the near 
end of the spoke adjacent to the 5-fold axis; and the other 
monomer, the far end of the spoke. A local 2-fold rota- 
tional axis of pseudosymmetry relating the two elon- 
gated monomers is located in the center of the spoke. 
The interdigitation of these long spokes forms the pro- 
tein coat. Although each is formed from copies of the 
same polypeptide, a monomer at the 5-fold hub is 
required to assume a significantly different shape from a 
monomer at the periphery of the spoke in order for the 
interdigitation to succeed and the global 3-fold and 
2-fold rotational axes of icosahedral symmetry to be 
satisfied. Most viral protein coats, however, are built 
with a different strategy that takes advantage of quasi- 
equivalence and pseudosymmetry to provide a general 
solution to the problem of expanding the size of an 
icosahedral shell. 

Quasi-equivalence is the manifestation of the 
ability of either two or more copies of the same subunit 
or two or more homologous subunits to adapt flexibly in 
different situations to the requirements of two rotational 
axes of symmetry of different fold. The respective homol- 
ogous subunits in the two different quaternary structures 
of ferritin or in the two different quaternary structures of 
dihydrolipoyllysine-residue acetyltransferase are quasi- 
equivalent to each other, but because in each case they 
have different amino acid sequences, it is the differences 
in amino acid sequence that might explain their abilities 
to assume the different dispositions. Furthermore, the 
two rotational axes of symmetry of different fold to which 
they adapt are in different oligomers. In many viral pro- 
tein coats, however, the several quasi-equivalent sub- 
units are formed from folded polypeptides of the same 
sequence, and the quasi-equivalent subunits are found 
together in the same icosahedral shell. 

To understand the relationships of such quasi-equiv- 
alent subunits to each other in such an oligomer, the local 
rotational axes within a protomer must be distinguished 
from the global rotational axes of icosahedral symmetry 
governing the entire structure. A global rotational axis of 
symmetry is an axis of symmetry around which a rotation 
of 360°/n causes the entire oligomer to superpose upon 
itself. A global rotational axis is thus distinguished from a 
local rotational axis that operates only on structural units 
in its immediate vicinity. 

The triangle from which the expanded rhombic tri- 
acontahedron (Figure 9-21C) is constructed, although 
formally an asymmetric object because two of its vertices 
lie at global 3-fold rotational axes of symmetry and one 
vertex lies at a global 5-fold rotational axis of symmetry 
and only one of its three edges lies on a global 2-fold rota- 
tional axis of symmetry, is nevertheless an equilateral tri- 
angle and locally symmetric. Were the equivalent mass of 
three identical folded polypeptides, related to each other 
by this local 3-fold rotational axis of pseudosymmetry, 
to fill this triangle, it would have three times more area 
than if the equivalent mass of only one folded polypep- 
tide filled it, and the shell could then contain 3.5 times 
more nucleic acid. This solution requires (Figure 9-21C) 
that this folded polypeptide be capable of quasi-equiva- 
lence because those subunits forming the interfaces 
around one vertex of the triangle would have to be 
arrayed around a global 5-fold rotational axis of symme- 
try (72° for each step; complementary faces at 108°), 
while those subunits forming the interfaces around the 
other two vertices would have to be arrayed around local 
6-fold rotational axes of pseudosymmetry (60° for each 
step; complementary faces at 120°) that each coincide 
with a global 3-fold rotational axis of symmetry. The 
requirements of this quasi-equivalence would force each 
of the three subunits arrayed around the rotational axis 
in the center of each triangle to assume a significantly 
different conformation, so the local 3-fold rotational axis 
around which they are arrayed is one of pseudosymme- 
try. 

Quasi-equivalent subunits arrayed around rotational 
axes of pseudosymmetry cannot each be individual 
protomers, but the set of all of the quasi-equivalent 
subunits arrayed around a rotational axis of pseudosym- 
metry or several rotational axes of pseudosymmetry can 
be a protomer of the overall quaternary structure. 
Consequently, it is the three quasi-equivalent subunits 
arrayed around the local 3-fold rotational axis of 
pseudosymmetry that would form the protomer of the 
icosahedral array. The global rotational axes of this icosa- 
hedral array would remain true global rotational axes of 
symmetry for the entire structure because the asymme- 
try would be confined entirely within each of the pro- 
tomers of the point group. 

Tomato bushy stunt virus has a protein coat with 
just such an arrangement of subunits (Figure 9-26). 
Each of its 60 identical pseudosymmetric, trimeric pro- 
tomers is formed from three folded polypeptides, sub- 
units A, B, and C, each of the same sequence 386 aa in 
length and the tertiary structures of which, when they are 
in the viral protein coat, are homologous and superpos- 
able. The differences in their respective conformations 
permit each of them to play its required quasi-equivalent 


Figure 9-26: Arrangement of the 180 subunits in the protein coat 
of tomato bushy stunt virus. Each tile is a single folded polypeptide, 
and all of the polypeptides are identical to each other in amino 
acid sequence. Three folded polypeptides, designated A, B, and C, 
are arrayed around a local 3-fold axis of pseudosymmetry to pro- 
duce the trimeric protomer of the icosahedral array. The vertex 
occupied by the A subunit lies at a global icosahedral 5-fold rota- 
tional axis of symmetry, and the axes occupied by the B and 
C subunits lie at global icosahedral 3-fold rotational axes of sym- 
metry that are also local 6-fold rotational axes of pseudosymmetry. 
Global 2-fold rotational axes of symmetry relate C subunits, and 
local 2-fold rotational axes of pseudosymmetry relate A and 
B subunits. Each subunit has a protrusion that runs up its associ- 
ated 2-fold rotational axis. The diagram was adapted from the crys- 
tallographic molecular model of this viral protein coat. Reprinted 
with permission from ref 220. Copyright 1983 Academic Press. 


role. For example, only subunits C are adjacent to the 
global 2-fold rotational axes of symmetry. Subunits B and 
C must alternate around the global 3-fold rotational axes 
of symmetry to produce local 6-fold rotational axes of 
pseudosymmetry, while subunits A are distributed 
around the global 5-fold rotational axes of symmetry. 
These quasi-equivalent situations produce alterations in 
the structures of these folded polypeptides that are most 
obvious at the interfaces among the homotrimers; it is 
here that the strain of requiring the same protein to 
adapt to the different rotational axes of symmetry is the 
strongest. 
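
The bookkeeping implied by this arrangement can be written out explicitly. The short sketch below only restates the counts given in the text and in Table 9-3 as arithmetic; the variable names and the helper `step_and_face_angle` are introduced here for illustration. It confirms that the A, B, and C subunits each number 60 and that the angular requirements at the global 5-fold axes and at the local 6-fold axes differ as described.

```python
# Numbers of global rotational axes in point group 532(I), as listed in Table 9-3.
GLOBAL_AXES = {2: 15, 3: 10, 5: 6}

# Each axis passes out of the shell at two opposite points.
EXITS = {fold: 2 * count for fold, count in GLOBAL_AXES.items()}   # 30, 20, and 12 points

PROTOMERS = 60                # asymmetric units of the icosahedral array
SUBUNITS_PER_PROTOMER = 3     # subunits A, B, and C around the local 3-fold axis
total_subunits = PROTOMERS * SUBUNITS_PER_PROTOMER

a_subunits = EXITS[5] * 5     # five A subunits ring each point on a global 5-fold axis
b_subunits = EXITS[3] * 3     # B and C alternate around each point on a global 3-fold axis,
c_subunits = EXITS[3] * 3     # three of each forming the local 6-fold ring
c_pairs = EXITS[2]            # each point on a global 2-fold axis relates one pair of C subunits

def step_and_face_angle(fold):
    """Rotation per step about an n-fold axis and the angle between complementary faces."""
    step = 360 / fold
    return step, 180 - step

if __name__ == "__main__":
    print(total_subunits, a_subunits, b_subunits, c_subunits, c_pairs)
    print(step_and_face_angle(5))   # requirement at the global 5-fold axes
    print(step_and_face_angle(6))   # requirement at the local 6-fold axes
```

The same folded polypeptide must therefore present its complementary faces at 108° in position A and at 120° in positions B and C, a difference of 12° that the interfaces have to absorb.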

The packing at the quasi-equivalent interfaces has 
been described in detail for the protein coat of southern 
bean mosaic virus, which is closely related to tomato 
bushy stunt virus. The three identical but quasi-equiva- 
lently folded polypeptides, each 260 aa in length, are 
arranged around a local 3-fold rotational axis of 
pseudosymmetry (Figure 9-27A) to create the 
homotrimeric protomer in which the three subunits 
adapt to their respective quasi-equivalent environments. 
The A subunits are arranged around the global icosahe- 
dral 5-fold rotational axes of symmetry (Figures 9-21C, 
9-26, and 9-27B). Each of the B and C subunits uses the 
same vertex to form the local 6-fold rotational axis of 
pseudosymmetry (Figures 9-26 and 9-27C) as that used 
by the A subunit to conform to the global 5-fold rotational 
axis of symmetry. Careful inspection of Figure 
9-27B,C shows that the two unique defining interfaces 
are similar but significantly adjusted to accommodate 
the differences in the angular requirements around these 
two axes. These adjustments, in turn, create conforma- 
tional changes throughout each of the individual sub- 
units, causing their overall structures to differ. These 
differences are most obvious in the angular orientations 
of both the pleats within the β sheets and the β sheets 
themselves when the three different conformations are 
compared. 

The protein coats of tomato bushy stunt virus, 
southern bean mosaic virus, turnip yellow mosaic 
virus, black beetle nodavirus, and primate calicivirus, 
among others, are all constructed from 180 
identical subunits distributed among 60 identical 
homotrimers. Because, however, the three subunits in a 
trimer (A, B, and C in Figure 9-26) are not in identical 
environments, they need not be identical in amino acid 
sequence. In some icosahedral protein coats with trimeric 
protomers, gene triplication of the nucleic acid encod- 
ing the amino acid sequence of the protein forming the 
coat has occurred to produce three genes. The protein 
coats of the comoviruses, of which those of cowpea 
mosaic virus and beanpod mottle virus are examples, are 
an interesting intermediate case in this process. In the 
protein coat of these viruses, two of the subunits in the 
heterotrimeric protomer are internally repeating 
domains on the same polypeptide. This suggests that the 
general sequence of events has been a gene duplication 


Figure 9-27: α-Carbon diagrams of the folded polypeptides composing 
the protein coat of southern bean mosaic virus drawn from the 
crystallographic molecular model of the entire viral protein coat. 
The entire crystallographic molecular model has 180 identical 
polypeptides all folded into the same tertiary structure. There are 
three different environments, however, in which the folded polypep- 
tides are found that can be designated A, B, and C (Figure 9-26). 
(A) Three folded polypeptides arrayed about the 3-fold rotational axis 
of pseudosymmetry at position A (at 12 o’clock), position B (at 4 
o'clock), and position C (at 8 o’clock), respectively. (B) Two folded 
polypeptides arrayed about the global 5-fold rotational axis of icosa- 
hedral symmetry both in positions A. The lines drawn through the 
centers of the two subunits meet at an angle of 72° on the global 
5-fold rotational axis of symmetry. Because this is an axis of symme- 
try, the two subunits are exactly superposable, and all interfaces 
around this axis are identical. (C) Two folded polypeptides arrayed 
about the global 3-fold rotational axis of icosahedral symmetry, which 
is a local 6-fold rotational axis of pseudosymmetry, at position C 
(2 o’clock) and position B (at 4 o’clock). The lines drawn through the 
centers of the two subunits meet at an angle of 60° on the local 6-fold 
rotational axis of pseudosymmetry. The conformations of the B and 
C subunits are noticeably different because they must flexibly adjust 
to different dispositions within the overall icosahedral array. In both 
panels B and C, the lines meeting at the rotational axes of symmetry 
pass through the α-carbon of Threonine 198 and then through the
respective subunit along the same trajectory. These drawings were 
produced with MolScript.
that gave rise to two genes producing two separate
polypeptides followed by a gene duplication of one of 
these genes producing a single polypeptide with two 
internally repeating domains followed by a division of the 
latter duplicated gene so that it then produced two 
smaller polypeptides, each containing the complete fold 
of its ancestral polypeptide. Following the triplication, 
each of the three genes evolved independently to produce 
three polypeptides, each of different sequence and each 
presumably incorporating changes rendering it more 
successful at occupying its respective quasi-equivalent 
position in the viral protein coat. 

There are crystallographic molecular models for 
icosahedral protein coats of four viruses that have 
accomplished the complete evolutionary transition:
rhinovirus, poliovirus, Mengo virus, and foot-and-mouth
disease virus. The protein coats of these viruses
are spherical shells (Figure 9-28A,B), the surfaces of
which are paved with heterotrimeric protomers in
icosahedral array (Figure 9-28C). That the three different
folded polypeptides forming these four viral protein
coats have arisen from a gene triplication follows from 
the fact that, in each case, the three different polypeptides
folded in their native conformations are superposable.
If this is a definitive correspondence, it
demonstrates that the ancestors of each of these protein 
coats were constructed from 180 folded polypeptides of 
identical sequence. 

It has been shown that the single folded polypep- 
tides composing the protomers of the protein coats of 
satellite panicum mosaic virus and satellite tobacco 
necrosis virus, another small virus the protein coat of 
which has only 60 subunits, are superposable on any
one of the folded polypeptides forming the trimeric pro- 
tomer of the protein coat of either tomato bushy stunt 
virus or southern bean mosaic virus. It has also been
pointed out that, even though the former have 60 sub- 
units and the latter have 180 subunits, the packing of the 
subunits of the protein coat of satellite panicum mosaic 
virus and satellite tobacco necrosis virus is similar to the 
packing of the subunits of the protein coats of both south- 
ern bean mosaic virus and tomato bushy stunt virus. 
When the global 3-fold rotational axis of symmetry in the 
protein coat of satellite tobacco necrosis virus is aligned 
with the local 3-fold rotational axis of pseudosymmetry 
within one of the trimeric protomers of the protein coat 
from one of the other viruses (Figure 9-29), the three
global 5-fold rotational axes of symmetry in the protein 
coat of satellite tobacco necrosis virus coincide with one 
of the global 5-fold rotational axes of symmetry and two 
of the local 6-fold rotational axes of pseudosymmetry in 
the protein coat of the other virus. 

It has also been observed that when the first 61
amino acids of the protein coat of southern bean mosaic 
virus are removed, the remainder of the folded polypep- 
tide can assemble to produce a hollow icosahedral shell 


Figure 9-28: Space-filling representations of the folded polypeptides assembled into the icosahedral protein coat of poliovirus as drawn
from the crystallographic molecular model of this oligomeric protein. (A) View into the central cavity of the viral protein coat into which
the viral RNA is packed. (B) View of the surface of the viral protein coat in which the atoms contributed by each of the three different types
of folded polypeptides, VP1, VP2, and VP3, have been given different shades of gray. Panels A and B reprinted with permission from ref 227.
Copyright 1985 American Association for the Advancement of Science. (C) Diagrammatic representation of the surface of an icosahedral viral
protein coat in the same orientation as panel B to illustrate the distribution of the various folded polypeptides around the rotational axes
of icosahedral symmetry and local pseudosymmetry. Panel C reprinted with permission from ref 228. Copyright 1987 American Association
for the Advancement of Science.


Figure 9-29: Comparison of the packing of the respective folded 
polypeptides in the protein coat of satellite tobacco necrosis virus 
(unbroken lines) and the protein coat of tomato bushy stunt virus 
or southern bean mosaic virus (broken lines). The former viral
protein coat has 60 folded polypeptides in exact icosahedral sym- 
metry; the latter have 180 folded polypeptides in T=3 icosahedral 
symmetry. The exact global 3-fold rotational axis of icosahedral 
symmetry for satellite tobacco necrosis virus is aligned with the 
local 3-fold rotational axis of pseudosymmetry (both designated as 
3*) in the center of each of the protomers of the other two. This 
alignment causes the global 5-fold rotational axes of icosahedral 
symmetry at the bottom of the diagram (designated as 5) to coin- 
cide and concurrently causes the two other global 5-fold rotational 
axes of symmetry at the other two vertices of the three protomers 
from satellite tobacco necrosis virus to coincide with the two local 
6-fold rotational axes of pseudosymmetry (all of these axes desig- 
nated as 5*) at the other two vertices of the protomer from tomato 
bushy stunt virus or southern bean mosaic virus. Reprinted with 
permission from ref 230. Copyright 1982 Academic Press. 


containing only 60 subunits instead of the usual 180. In 
this new oligomeric protein, the original arrangements 
around the global 5-fold rotational axes of symmetry 
have been retained and the local 3-fold rotational axes of 
pseudosymmetry in the centers of the protomers of the 
original structure have become the global 3-fold rota- 
tional axes of symmetry in the new structure (as depicted 
in Figure 9-29). This latter result demonstrates that these 
two icosahedral structures, the pseudosymmetric with 
180 quasi-equivalent polypeptides and the symmetric 
with 60 polypeptides, are readily interconverted. 

All of these results taken together indicate that the 
protein coats of these four viruses, tomato bushy stunt, 
southern bean mosaic, satellite panicum mosaic, and 
satellite tobacco necrosis, share a common ancestor.
The fact that satellite tobacco necrosis virus and satellite 
panicum mosaic virus are parasites on other viruses and 
the fact that viral protein coats of their size cannot carry 
enough nucleic acid suggests that their ancestors origi- 
nally had a larger protein coat built from 60 
homotrimeric protomers. 
The symmetric and pseudosymmetric icosahedral
protein coats of satellite tobacco necrosis, satellite pan- 
icum mosaic, tomato bushy stunt, southern bean 
mosaic, cowpea mosaic, beanpod mottle, Mengo, and 
foot-and-mouth disease viruses, black beetle nodavirus, 
poliovirus, and rhinovirus are all those of viruses carry- 
ing single-stranded ribonucleic acids that have positive 
copies of the viral messenger ribonucleic acids. These 
viruses infect eukaryotic cells, both animals and plants. 
From the descriptions that have just been given of the 
crystallographic molecular models of the protein coats of 
these viruses, it follows that the ancestors of all of these 
eukaryotic positive-strand RNA viruses had icosahedral 
protein coats constructed from 60 homotrimeric pro- 
tomers, each of which was constructed from three folded 
polypeptides of identical sequence arranged about a 
local 3-fold rotational axis of pseudosymmetry. The 
most remarkable discovery, however, is that all of the 
folded polypeptides comprising the protein coats of all 
of these eukaryotic positive-strand RNA viruses, 
whether their hosts are plants or animals, are superposable.
Furthermore, the single polypeptides
forming the protein coats of 60 identical subunits
enclosing the single-stranded DNA of the bacteriophage
φX174 and the single-stranded DNA of canine
parvovirus also have folds superposable on those of
these viral protein coats that enclose single-stranded 
RNA. These similarities among all of these various pro- 
teins suggest that, unless convergent evolution has 
occurred, all of the viral protein coats they respectively 
compose share one common ancestor.* 

This remarkable possibility, if it is true, would not 
be hard to explain. Three unique faces, each creating an 
independent set of repeating interfaces among the 180 
identical folded polypeptides, would have had to evolve 
on the surface of the same monomer. The three unique 
interfaces produced by these three unique faces are the 
interface responsible for the local 3-fold rotational axis of 
pseudosymmetry within the protomer (Figure 9-27A), 
the interface responsible for the global 5-fold rotational 
axis of symmetry at one vertex of each subunit, and the 
interface responsible for the global 2-fold rotational axis 
of symmetry that orients the two pentagons it connects. 
Proteins containing molecular 5-fold rotational axes of 
symmetry are rare products of evolution (Table 9-1) and 
the 2-fold rotational axis of symmetry, though common, 
would be required to be located at a particular disposi- 
tion relative to the 5-fold axis of symmetry. These con- 
straints are probably not so rigid as they seem. If the 


* Each of these polypeptides is folded into a 10-stranded jelly roll, 
antiparallel β barrel (Figure 7-19E). The same fold occurs in the
subunits of the protein coat of double-stranded DNA viruses such 
as adenoviruses and iridiviruses. In these latter instances, however, 
there are substantial differences in the quaternary structures of the 
viral protein coats. 
angles are close enough, the significant stability of such
an edifice, each of whose associations strengthens all of 
the others, could force the interfaces to rearrange suffi- 
ciently to accommodate the icosahedral arrangement. 
This cooperativity in the construction of the shell, 
which should resemble the cooperativity among the sup- 
ports of a building, also permits the interfaces to be 
weaker than interfaces that must stand alone. 
Nevertheless, that such a monomer, with such faces, has 
arisen rarely during evolution would not be a surprising 
fact. 

There is, however, a positive-strand RNA virus, 
MS2, that infects bacterial hosts. It also has an icosahe- 
dral protein coat composed of 60 pseudosymmetric 
homotrimers, but the folded polypeptides do not appear 
to be related to the folded polypeptides of the coat proteins
of other positive-strand RNA viruses.

All icosahedral viral protein coats have 60 identical 
protomers arrayed around the global 5-fold, 3-fold, and 
2-fold rotational axes of icosahedral symmetry of point 
group 532 (I). It is the identity and the number of subunits
within each protomer that differ among them. The small- 
est protein coats have only one subunit in each pro- 
tomer, those with three subunits in each protomer are 
somewhat larger, and those with more than three are 
larger still. The number of subunits within a protomer is 
designated with a capital T. For example, the expanded 
viral protein coats that have been discussed so far, in 
which there are three subunits in each protomer (Figures 
9-26 and 9-28), have T=3 icosahedral symmetry. The 
numbers of subunits found in the protomers of the larger 
protein coats are those numbers that permit the subunits 
to assume quasi-equivalent positions around local rota- 
tional axes of pseudosymmetry and that at the same time 
are compatible with the global rotational axes of icosa- 
hedral symmetry. For example, in viral protein coats with 
T=3 icosahedral symmetry, the fact that a global 5-fold 
rotational axis of symmetry and two global 3-fold rota- 
tional axes of symmetry are arranged equilaterally 
(Figure 9-21C) produces the local 3-fold rotational axis of 
pseudosymmetry around which three subunits can be 
arranged within a protomer while still being compatible 
with those global axes of symmetry. 

The viral protein coats with expanded T=3 icosa- 
hedral symmetry are the simplest cases of a common 
strategy used to expand the number of subunits in a
protomer. Consider a hexagonal array of cyclic hexam- 
ers with their respective 6-fold rotational axes of symme- 
try normal to the plane of the array (Figure 9-30). Such a 
hexagonal array of hexamers automatically creates an 
array of global 6-, 3-, and 2-fold rotational axes of sym- 
metry, also all normal to the plane, that operate on the 
entire array. In certain combinations, these global axes of 
symmetry have the same spacing and almost the same 
fold relative to each other that the global axes of icosahe- 
dral symmetry have around one of its protomers. For 
example, in each of the four nested quadrilaterals in 


Figure 9-30A, the global 6-, 2-, 3-, and 2-fold rotational 
axes of hexagonal symmetry at its four vertices have the 
same relative spacing as the global 5-, 2-, 3-, and 2-fold 
rotational axes of symmetry around a quadrilateral pro- 
tomer in an icosahedral array (Figure 9-30, right panel). 
Consequently, the array of subunits within any one of 
these boundaries is able to be one of the 60 protomers in 
an icosahedral array if the subunit at the global 6-fold 
rotational axis at the top vertex is able to adapt quasi- 
equivalently to the global 5-fold rotational axis of icosa- 
hedral symmetry. Similar compatibilities between the 
global axes of symmetry at the vertices and on the edges 
of the nested equilateral triangles of Figure 9-30B and the 
quadrilaterals of Figure 9-30C-E also allow the arrays of 
subunits within their boundaries to be one of the pro- 
tomers in an icosahedral array. The nested sets in Figure 
9-30A produce protomers with 4, 9, 16, and 25 subunits 
(T=4, T=9, T=16, and T=25); those in Figure 9-30B produce
protomers with 3, 12, and 27 subunits (T=3, T=12,
and T=27); and the skewed quadrilaterals in Figure 9-30
panels C, D, and E produce protomers with 7, 13, and 19
subunits (T=7, T=13, and T=19), respectively.
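
The values of T listed above coincide with the numbers generated by the standard Caspar-Klug expression T = h² + hk + k², where h and k are non-negative integers counting steps along the two axes of a hexagonal array. That expression is not stated in the text and is brought in here as an assumption, as are the function names and the assignment of (h, k) pairs to the panels of Figure 9-30. A minimal sketch in Python that enumerates the permitted values:

```python
def triangulation_number(h: int, k: int) -> int:
    """Caspar-Klug triangulation number for lattice steps (h, k) (assumed formula)."""
    return h * h + h * k + k * k


def classify(h: int, k: int) -> str:
    """Relate a lattice vector to the kind of boundary drawn in Figure 9-30 (assumed mapping)."""
    if h == 0 or k == 0:
        return "nested quadrilateral (as in panel A)"
    if h == k:
        return "equilateral triangle (as in panel B)"
    return "skewed quadrilateral (as in panels C-E)"


def allowed_t_numbers(limit: int) -> dict[int, tuple[int, int]]:
    """Map each T <= limit to one pair (h, k), with h >= k >= 0, that generates it."""
    found: dict[int, tuple[int, int]] = {}
    for h in range(1, limit + 1):
        for k in range(0, h + 1):
            t = triangulation_number(h, k)
            if t <= limit and t not in found:
                found[t] = (h, k)
    return dict(sorted(found.items()))


if __name__ == "__main__":
    for t, (h, k) in allowed_t_numbers(27).items():
        # 60 protomers, each with T subunits
        print(f"T={t:2d}  (h={h}, k={k})  subunits={60 * t:4d}  {classify(h, k)}")
```

Under these assumptions the enumeration up to 27 returns every value named in the paragraph (3, 4, 7, 9, 12, 13, 16, 19, 25, and 27) together with T=1, the coat with one subunit in each protomer, and T=21, which is not drawn in Figure 9-30.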

In each of these protomers, the three designated 
global rotational axes of symmetry of the hexagonal array 
at the vertices and the edges become global axes of icosa- 
hedral symmetry with the same fold. The global 6-fold 
rotational axis of symmetry of the hexagonal array at the 
undesignated vertex becomes a quasi-equivalent global 
5-fold rotational axis of symmetry when the protomer is 
in the icosahedral array. The other global rotational axes 
of symmetry of the hexagonal array that fall within and 
on the edges of the boundaries become local rotational 
axes of pseudosymmetry in the icosahedral array, or if 
the protomer is large enough some of them become local 
rotational axes of symmetry. It was the realization of the 
fact that such a hexagonal array is compatible with icosahedral
symmetry that allowed Fuller to design the geodesic
domes, which preceded the realization that viral
protein coats are geodesic domes.

The protein coats of Sindbis virus and Nudaurelia
ω Capensis virus each have 240 identical subunits
arranged with T=4 icosahedral symmetry. In the pro- 
tomer of T=4 icosahedral symmetry (Figure 9-30A), 
there is a local 3-fold rotational axis of pseudosymmetry 
(gray symbol in Figure 9-30A) equidistant from the two 
2-fold rotational axes of symmetry and the upper vertex, 
in a position equivalent to the local 3-fold rotational axis 
of symmetry in the center of a protomer of T=3 icosahe- 
dral symmetry (gray symbol in Figure 9-30B). This local 
3-fold rotational axis of pseudosymmetry is retained in 
the T=4 icosahedral shell in each protomer and is the 
local rotational axis of pseudosymmetry for a trimer of 
quasi-equivalent subunits, just as is the equivalent local 
3-fold rotational axis of pseudosymmetry in a protein 
coat of T=3 icosahedral symmetry. Another copy of the 
same trimer necessarily occupies each position in the 
shell at which one of the 10 global 3-fold rotational axes 



Figure 9-30: Compatibility of a hexagonal array of hexamers with icosahedral symmetry. The positions of the global axes of symmetry 
normal to an infinite hexagonal array of homohexamers are indicated in the distributions surrounding the two hexamers in the upper right 
of the array. A solid hexagon indicates a global 6-fold rotational axis of symmetry. These rotational axes of symmetry are located in the respec- 
tive positions throughout the array. (A) Segments of the hexagonal array containing 4, 9, 16, and 25 subunits, respectively, that can be pro- 
tomers of icosahedral symmetry. The quadrilaterals enclosing these four segments and those in C, D, and E are the protomers of an 
alternative icosahedral hexacontahedron (drawn to the right of the hexagonal array). In each of the protomers of this hexacontahedron, a 
5-fold, a 2-fold, a 3-fold, and a 2-fold rotational axis of global symmetry are consecutively joined by line segments. (B) Segments of the hexag- 
onal array containing 27, 12, and 3 subunits, respectively, that can be protomers of icosahedral symmetry that are each enclosed by the 
boundaries of the equilateral triangle connecting a global 5-fold and two global 3-fold rotational axes of symmetry as in the icosahedral hexa- 
contahedron in Figure 9-21C. The gray global 3-fold rotational axes of symmetry of the hexagonal array in A and B become local 3-fold rota- 
tional axes of pseudosymmetry when the protomers for T=4 and T=3, respectively, are inserted into an icosahedral array (the 
hexacontahedron to the right of the hexagonal array and the hexacontahedron in Figure 9-21C, respectively). (C-E) Skewed segments of the 
hexagonal array containing 7 (C), 13 (D), and 19 (E) subunits that also can be protomers of icosahedral symmetry. The 2-fold and 3-fold rotational
axes of symmetry noted at the boundary of each of the potential protomers coincide with global 2-fold and 3-fold rotational axes of
icosahedral symmetry, and the unmarked 6-fold rotational axis of symmetry becomes a quasi-equivalent 5-fold rotational axis of symmetry,
when the protomer is inserted into the icosahedral array.


of symmetry emerges because that position is itself 
quasi-equivalent to that occupied by a trimer on one of 
the local 3-fold rotational axes of pseudosymmetry. 
Consequently, in a protein coat with T=4 icosahedral
symmetry, there are 60 trimers located at the 60 local
3-fold rotational axes of pseudosymmetry, one in each protomer,
and 20 of the same trimers located on the respective
ends of the 10 global 3-fold rotational axes of
symmetry.

Because the requirements placed upon the 80 
trimers in T=4 icosahedral symmetry are similar to those
placed upon the 60 trimers in T=3 icosahedral symmetry,
there are proteins that can adopt either T=3 or T=4
icosahedral symmetry depending on the conditions,
and within the family of alphaviruses there are protein
coats with either 180 subunits (T=3) or 240 subunits
(T=4).²³³

The protein coat of bacteriophage P22 has
seven identical subunits (429 aa) arranged with T=7 
icosahedral symmetry (Figure 9-30C) in each of its 60 
protomers. The subunits adapt quasi-equivalently to 
form the pentamers sitting on the global 5-fold rotational 
axes of symmetry and the hexamers sitting on the local 
6-fold rotational axes of pseudosymmetry within each 
protomer. The mirror image of the arrangement of the 


rotational axes of symmetry in each of the various 
skewed quadrilaterals in Figure 9-30C-E does not super- 
pose on itself, so there are right-handed and left-handed 
versions of each of these quadrilaterals. The coat protein 
of bacteriophage P22 is a left-handed T=7 array. 

The family of papilloma viruses, simian virus 40, 
and polyoma viruses have protein coats that are a peculiar
variation on T=7 icosahedral symmetry. Each
of these viral protein coats is assembled from 72 identical
pentamers with cyclic symmetry, the subunits of
which are held together by extensive interfaces. Rather
than using the same subunit to form pentamers and 
hexamers, both the global 5-fold rotational axes of icosa- 
hedral symmetry and the local 6-fold rotational axes 
within the protomer are occupied by copies of the pen- 
tamer. Consequently, dramatically different interfaces, 
on the same outer surfaces of identical folded polypep- 
tides in each pentamer, must be made around the global 
2-fold rotational axis of icosahedral symmetry and the 
local 2- and 3-fold rotational axes of hexagonal symme- 
try in the spaces between the homopentameric subunits. 
This is accomplished by forming the interfaces with flex- 
ible, structurally swapped strands of polypeptide rather 
than the usual rigid interfaces of interdigitated second- 
ary structure. 
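
The figure of 72 pentamers can be checked by counting positions. In a coat with 60T quasi-equivalent subunits, the 12 global 5-fold axes account for 60 subunits as pentamers, and the remaining subunits are grouped in sixes around the local 6-fold axes. The sketch below, in Python, does this bookkeeping; the function names are illustrative, and it assumes that every subunit belongs to either a pentameric or a hexameric position.

```python
def capsomer_positions(t: int) -> tuple[int, int]:
    """Return (pentameric, hexameric) positions in a coat of 60*t quasi-equivalent subunits."""
    pentameric = 12  # an icosahedron has 12 global 5-fold vertices
    hexameric = (60 * t - 5 * pentameric) // 6  # remaining subunits, six per position
    return pentameric, hexameric


def subunits_if_all_pentamers(t: int) -> int:
    """Subunits present when every position, 5-fold or 6-fold, holds a pentamer."""
    pentameric, hexameric = capsomer_positions(t)
    return 5 * (pentameric + hexameric)


if __name__ == "__main__":
    pentameric, hexameric = capsomer_positions(7)
    print(pentameric + hexameric)        # 72 positions in a T=7 array
    print(60 * 7)                        # 420 subunits in a conventional T=7 coat
    print(subunits_if_all_pentamers(7))  # 360 subunits when all 72 positions hold pentamers
```

On this count a T=7 array has 12 + 60 = 72 positions, so filling each with a pentamer gives 360 subunits rather than the 420 of a conventional T=7 coat such as that of bacteriophage P22.
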
Bluetongue virus and reovirus have protein
coats with T=13 icosahedral symmetry (Figure 9-30D). 
In the former, the most extensive interfaces are those 
holding trimers of subunits together around the global 
and local 3-fold rotational axes, a fact suggesting that the 
fundamental unit of the structure is a trimer. The trimers, 
however, must adapt quasi-equivalently to being 
arranged about both the global 5-fold rotational axes of 
symmetry and the local 6-fold rotational axes of symme- 
try. 

Herpes simplex virus has a protein coat with
T=16 icosahedral symmetry (Figure 9-30A). Pentamers 
called pentons occupy the global 5-fold rotational axes of 
symmetry, and hexamers called hexons occupy the local 
6-fold rotational axes of symmetry; but both pentons and 
hexons are formed from five copies and six copies, 
respectively, of the same folded polypeptide (1374 aa). 
There are 16 copies of this folded polypeptide in each 
protomer so each protein coat contains 1,319,040 aa. 
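
The total quoted here is the product of 60 protomers, T subunits in each protomer, and the length of the polypeptide. A short Python sketch of that arithmetic follows; the function names are illustrative, and the calculation ignores any accessory proteins and assumes, as the text does, that every position is filled by the same polypeptide.

```python
PROTOMERS_PER_COAT = 60  # every icosahedral protein coat has 60 protomers


def subunits_in_coat(t: int) -> int:
    """Number of subunits in a coat with T subunits in each protomer."""
    return PROTOMERS_PER_COAT * t


def residues_in_coat(t: int, residues_per_subunit: int) -> int:
    """Total amino acids contributed by the subunits of the coat."""
    return subunits_in_coat(t) * residues_per_subunit


if __name__ == "__main__":
    print(residues_in_coat(16, 1374))  # herpes simplex virus: 1319040
    print(residues_in_coat(7, 429))    # bacteriophage P22: 180180
```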

Adenovirus has a protein coat with T=25 icosahedral
symmetry (Figure 9-30A). In this arrangement
each protomer contains four hexons that occupy the local 
6-fold rotational axes of pseudosymmetry, but a hexon is
not a hexamer; it is a homotrimer. Each of the three identical
subunits of one of these homotrimers contains two
internally duplicated domains, and six domains, two 
from each subunit, are arranged around each local 6-fold 
rotational axis to produce the pseudosymmetrically dis- 
played faces for the interfaces in which each hexon must 
participate with its six neighbors. Unlike those in
herpes simplex virus, the pentons in adenovirus, cen- 
tered on the global 5-fold rotational axes of symmetry, are 
formed by different folded polypeptides (571 aa) from 
those (967 aa) forming the hexons. 

In all of these viral protein coats based on hexago- 
nal expansion of the basic icosahedral protomer, the 
problem of closing the protomer arises. A hexagonal 
array is a potentially infinite array, yet the protomers 
within the boundaries of the various equilateral triangles 
and quadrilaterals of Figure 9-30 are finite portions of 
that infinite array. During the assembly of the protein 
coat, a mechanism for measuring out the size of that portion
is required, and this role seems to be filled by proteins
accessory to the subunits of the coat.

The viral protein coats in which the protomer con- 
tains only a few subunits have the appearance of spheres, 
even though their surfaces are often quite irregular 
(Figure 9-28). As the number of subunits in the hexagonal
array of hexamers becomes greater, there is a tendency
for the structure to look polyhedral
because of the tendency of the protomers to adopt the
plane of the hexagonal array. In some viral protein coats, 
the same quasi-equivalent interface will hold its two sub- 
units in a plane at one location, fitting into a polyhedral 
face, and at an angle to each other in another, forming a 
crease along a polyhedral edge. Usually, however, the
hexagonally packed protomers are bowed out so that 


their subunits end up evenly distributed over the surface 
of an almost spherical oligomeric protein, just as the 
modular units of a geodesic dome end up producing an 
almost spherical exploded icosahedron and for the same 
reason. The joints adjust to distribute uniformly the 
strain produced by creasing the hexagonal array consequent
to requiring certain of its 6-fold rotational axes
of symmetry to become 5-fold rotational axes of symme- 
try. 

The protein clathrin forms isometric cagelike struc- 
tures that assemble around small pinocytotic vesicles as 
they bud inward from the plasma membrane of an 
animal cell. The polypeptide is 1600 aa in length, and 
when folded it produces a tubular protein, 45 nm in 
length and 2.5 nm in diameter. The protomer from
which the cages of clathrin are formed is a trimer of
these polypeptides, all three joined together at one end
around a 3-fold rotational axis of symmetry to produce a
triskelion with bent arms. Different numbers of these
triskelia can assemble to produce intact cages of various
shapes between 70 and 200 nm in diameter. The wires
of the mesh forming these cages are presumed to be
formed from two or more intertwined arms of the triskelia.
Each and every vertex in each and every cage is a
junction of three wires, and each vertex must contain the 
local 3-fold rotational axis of symmetry at the nexus of an 
individual triskelion. The mesh itself is always formed of 
pentagons and hexagons of wire producing polyhedra 
with as many as 32 faces, but most of them are not based 
on isometric symmetries. It is the elongated and flexi- 
ble nature of the subunit that permits the one protein to 
generate such a wide variety of oligomeric proteins. 
Problem 9-9: In Figure 9-22 there are four 3-fold rotational
axes of symmetry, each of which superposes four
different triplets of subunits. For example, one of these 
axes superposes A, B, and C; J, E, and H; L, D, and G; and 
K, F, and I. 


(A) List the four triplets for each of the other three 
3-fold rotational axes of symmetry. 


In Figure 9-22 there are three 2-fold rotational axes 
of symmetry, each of which superposes six different 
twins of subunits; for example, one of them superposes A 
and K, B and L, C and J, D and H, E and I, and F and G. 


(B) List the six twins for each of the other two 2-fold 
rotational axes of symmetry. 


Problem 9-10: Make a xerographic copy of Figure 9-26. 
On that xerographic copy designate every global 2-, 3-, 
5-, and 6-fold rotational axis of symmetry. Use the 
standard symbols for this designation. In the same figure 
designate some of the local 2-, 3-, and 6-fold rotational 
axes of pseudosymmetry by the symbols P₂, P₃, and P₆,
respectively. 


Problem 9-11: Kepler derived the rhombic dodecahe- 
dron from the intersection of a cube and an octahedron. 
Draw the intersection of a cube and an octahedron, and 
connect its vertices to produce a rhombic dodecahedron. 


Helical Polymeric Proteins 


Helical fibers formed from identical subunits of protein 
have useful properties, and there are many examples of 
them. For example, to accomplish its function of 
hybridizing homologous single strands of DNA, the RecA 
protein binds to DNA in a long helical polymeric sheath 
that matches the helical symmetry of the DNA. The heli- 
city of the sheath, however, is built into the protein 
because it spontaneously forms a sheath with almost the 
same helical symmetry even in the absence of the 
DNA.

Every time an interface is created by evolution 
between two complementary faces on the surface of 
copies of the same globular protein, no matter how those 
two faces are disposed on its surface, a distinct screw axis 
of symmetry is defined. Most of these screw axes would 
be open and generate helical polymers of the monomer, 
so the existing helical polymers must be the few that have 
escaped elimination by natural selection. The surprising 
fact is that there are so few helical polymeric proteins. 

A single geometric helix can be defined by three 
parameters: its radius, its hand, and its pitch. The pitch 
of a helix is the distance it rises for each complete turn. It 
can also be defined by four parameters: its radius, its 
hand, a recurring radial angle dividing the helix into 
equal segments of arc, and the rise for each of these 
equal segments of arc. In a helical polymeric protein the 
second definition makes more sense because the succes- 
sive subunits can be considered to be the repeating seg- 
ments of arc. 

An interface between two identical molecules of a 
protein can generate several types of helical polymers. 
The actin helix (Figure 9-1B,C) is an example of the sim- 
plest type in which the protomers ascend one step at a 
time around the screw axis. In the actin polymer, the 
screw axis of symmetry passes through a corner of each 
protomer, the helix is not much wider than the protomer 
(Figure 9-1C), and the radial angle between successive 
protomers is fairly large (-166°). It has already been 
noted that the actin helix can be represented as a single
helix or as a double helix. 

If the generating interface creates a screw axis of 
symmetry where the radial angle between successive 
subunits is fairly small, so that there are a number of sub- 
units in one turn of the helix, and where the rise for each 
subunit is just enough to cause the subunits in each turn 
of the helix to lie upon the subunits from the turn below 
it, then a singly threaded cylinder is formed. Because of 
steric problems, such a generating interface usually cre- 
ates a screw axis of symmetry that is separated from the 
subunits themselves, rather than passing through each of 
them, and the threaded cylinder is hollow inside. An 
example of such a hollow, singly threaded helical cylinder
is tobacco mosaic virus (Figure 9-31).²⁵³⁻²⁵⁸ In tobacco
mosaic virus, the angle between successive subunits is
22.03°, and the rise for each subunit is 0.14 nm.
These dimensions bring the subunits in the next turn of 
the helix into contact with the subunits in the preceding 
turn, and the pitch of the helix is 2.3 nm, the height of a 
subunit. Between two successive turns there are inter- 
faces among the protomers. Because of the screw sym- 
metry, each protomer provides at its lower surface the 
upper faces for the interfaces with the two protomers 
below it and at its upper surface the lower faces for the 
interfaces with the two protomers above it. Because every 
protomer is the same, each of these respective interfaces 
is the same and repeats along the thread every 22.03°. 
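
The dimensions quoted for tobacco mosaic virus can be checked with a few lines of arithmetic. The following is a minimal sketch, not taken from the text; the function names and the unit radius used in the last line are hypothetical and serve only to illustrate the conversion from the angle and rise for each subunit to the number of subunits in a turn and the pitch.

```python
import math

def helix_from_segments(angle_deg, rise_nm):
    """Return (subunits per turn, pitch in nm) from the angle and rise per subunit."""
    subunits_per_turn = 360.0 / abs(angle_deg)
    pitch_nm = rise_nm * subunits_per_turn
    return subunits_per_turn, pitch_nm

def subunit_position(j, radius_nm, angle_deg, rise_nm):
    """Cartesian position of subunit j on the generating helix (axis along z)."""
    theta = math.radians(j * angle_deg)
    return (radius_nm * math.cos(theta), radius_nm * math.sin(theta), j * rise_nm)

per_turn, pitch = helix_from_segments(22.03, 0.14)
print(f"{per_turn:.2f} subunits per turn, pitch {pitch:.2f} nm")
# -> 16.34 subunits per turn, pitch 2.29 nm

# Radius of 1.0 is an arbitrary placeholder, not a measured dimension.
print(subunit_position(3, 1.0, 22.03, 0.14))
```

The computed pitch of about 2.29 nm agrees with the 2.3 nm quoted above.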

The generating helix emphasized in the drawing of 
tobacco mosaic virus (Figure 9-31B) is a right-handed 
helix of pitch 2.3 nm. If the structure is examined closely, 
however, it can be seen that there are sets of helices of 
steeper pitch running through it. The subunits in tobacco 
mosaic virus are arranged upon a helical surface lattice 
that contains all of these other sets of helices. The 
helical surface lattice can be displayed in two dimensions 
by cutting the cylinder along a line parallel to the central 
axis and flattening it upon the page (Figure 9-31C). When 
the helical surface lattice of tobacco mosaic virus is 
viewed in this format, it can be seen that, in addition to 
the single right-handed helix of pitch 2.3 nm running 
through the lattice, there are also sets of 17 parallel right- 
handed helices and sets of 16 parallel left-handed helices 
(lower and upper sets of arrows, respectively, in Figure 
9-31C). Any one of these sets of helices can also define 
the structure. A helical surface lattice can be uniquely 
designated by the number of parallel strands of left- 
handed twist (a negative number) and the number of 
parallel strands of right-handed twist (a positive number) 
for any one of the sets of each respective hand. For 
example, the helical surface lattice of tobacco mosaic 
virus can be designated (-16, 1) or (-16, 17). The surface 
lattice of the helical polymer of flagellin from Salmonella 
typhimurium also contains a singly threaded right- 
handed helix as well as a set of five parallel left-handed 
helices, a set of six parallel right-handed helices, and a set 
of 11 parallel left-handed helices. 
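
The designation (-16, 17) for tobacco mosaic virus can be checked numerically. The sketch below is not from the text. It assumes that the subunits are numbered consecutively along the generating helix (22.03° and 0.14 nm for each subunit) and asks how far around the axis, and how far up it, subunit j + n lies from subunit j; the function name `n_start_step` is hypothetical.

```python
def n_start_step(n, angle_deg=22.03, rise_nm=0.14):
    """Rotation and rise between subunit j and subunit j + n on the generating helix.

    The rotation is reduced to the range (-180, 180]; a negative value means
    that the shorter path between the two subunits is left-handed.
    """
    rotation = (n * angle_deg) % 360.0
    if rotation > 180.0:
        rotation -= 360.0
    return rotation, n * rise_nm

for n in (1, 16, 17):
    rotation, rise = n_start_step(n)
    hand = "right" if rotation > 0 else "left"
    print(f"{n:2d}-start: {rotation:+7.2f} deg, rise {rise:.2f} nm, {hand}-handed")
```

Under these assumptions the step to subunit j + 16 is about -7.5° with a rise of 2.24 nm (left-handed), and the step to subunit j + 17 is about +14.5° with a rise of 2.38 nm (right-handed), which is consistent with the sets of 16 left-handed and 17 right-handed helices described above.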



Figure 9-31: Helical surface lattice of tobacco mosaic virus. (A) Molecular model derived from a map of electron density calculated from the 
helical diffraction pattern of X-radiation (Bragg spacing > 0.29 nm) emerging from an oriented preparation of the viruses. The diffraction 
from a specimen prepared in 1960 was gathered to Bragg spacing of 0.29 nm in 1982. The timing of this sequence of events illustrates how 
rare it is to obtain a well-aligned preparation of a helical polymeric protein. Reprinted with permission from ref 256. Copyright 1989 Elsevier 
B.V. (B) Diagrammatic representation of the helical surface lattice of the protein coat of tobacco mosaic virus. Individual protomers form 
a singly threaded helical screw that is a hollow cylinder. Each protomer has the same orientation and the successive protomers are related 
by a rise of 0.14 nm and a rotation of 22° along the helix. Reprinted with permission from ref 254. Copyright 1972 Federation of American 
Societies for Experimental Biology. (C) The surface helix of tobacco mosaic virus flattened onto the page. The hollow cylinder in panel B 
was split along a vertical plane passing through its wall and then flattened onto the page. The horizontal line represents a circle around the 
cylinder, the plane of which is normal to the central axis of the cylinder, that has been also split and flattened. It has been added to assist in 
counting the numbers of parallel helices in the various sets. The arrows at the upper end of the lattice indicate the set of 16 left-handed 
helices, and those at the lower end, the set of 17 right-handed helices that run through the lattice. These are referred to as 16 start and 17 start 
arrays, respectively. Reprinted with permission from ref 255. Copyright 1974 Academic Press. 


A helical surface lattice does not have to have a 
single shallow helix of one or the other hand acting as the 
generating operation. The extended tail of the T4 bacte- 
riophage is built from hexamers with cyclic symmetry. 
The hexamers are stacked upon each other by successive 
interfaces that cause them to be out of alignment in a 
right-handed sense by 17°. This creates a helical sur- 
face lattice that is a hextuply threaded cylinder. The 
(-6, 6) lattice contains both a set of six parallel helices of 
shallow pitch with left-handed sense and a set of six par- 
allel helices, of steeper pitch, with a right-handed sense. 
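
The relative steepness of the two sets follows from the sixfold symmetry of each hexamer. As a sketch, with symbols that are not in the text (h for the rise between successive hexamers, which is not stated here, and P for the pitch of one helix in a set), the nearest subunit in the ring above lies either 17° away in the right-handed sense or 17° - 60° away in the left-handed sense:

```latex
\begin{aligned}
\theta_{\mathrm{R}} &= +17^\circ, &
P_{\mathrm{R}} &= \frac{360^\circ}{17^\circ}\,h \approx 21.2\,h \\
\theta_{\mathrm{L}} &= 17^\circ - 60^\circ = -43^\circ, &
P_{\mathrm{L}} &= \frac{360^\circ}{43^\circ}\,h \approx 8.4\,h
\end{aligned}
```

The smaller angle gives the longer, steeper pitch, so on this reckoning the right-handed set is the steeper one, in agreement with the description above.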

The CA protein from type 1 immunodeficiency 
virus spontaneously forms several different sizes of heli- 
cal tubes. It forms two different tubes with distinct 
(-12, 11) and (-11, 10) surface lattices, respectively, that 
each have a single generating helix of one strand with a 
left-handed pitch. The same protein, however, also 
forms tubes with (-10, 13) and (-8, 5) surface lattices. 

The radial angle relating successive subunits to 
each other in a helix or each of the identical helices in a 
set of helices in a helical polymer is determined by the 
interfaces among them. It is these interfaces that pro- 
duce a helical filament such as actin or a singly or multi- 
ply threaded cylinder such as tobacco mosaic virus, 
flagellin, or the extended tail of the T4 bacteriophage. 
This radial angle between successive subunits that these 
interfaces dictate can have any numerical value. 
Technically, this means that the helix or helices never 
position a monomer exactly above any monomer below 
it in the helix. For example, tobacco mosaic virus has 
49.02 subunits in three turns (22.03° subunit⁻¹), not 49. 
Therefore, there is no translationally repeating unit in a 
rigid, biological helical structure. What this means is that 
a helical polymeric protein has difficulty crystallizing in 
three dimensions, and crystals of helical polymers suit- 
able for high-resolution crystallographic studies have not 
been produced. One interesting example, which is 
almost an exception to this uniform failure, is the 
protofilament running through a crystal of the protein 
encoded by the mreB gene in E. coli. This protein is a 
homologue of actin and forms filaments similar to those 
formed by actin (Figure 9-1B) but without the helical 
twist, so a filament can be incorporated into a crystal. 

One approach to determining the structure of a hel- 
ical polymeric protein is to crystallize the monomer, con- 
struct its crystallographic molecular model at atomic 
resolution, and simultaneously determine the structure 
of the helical polymer at high enough resolution to posi- 
tion crystallographic models of these monomers in the 
proper orientation at the locations they occupy in the 
polymer. Such a strategy has been applied successfully to 
the polymer of actin (Figure 9-1B).* The details of the 
structure of an intact polymer at the resolution required 
by this strategy can be determined by image reconstruc- 
tion of electron micrographs. 

Under appropriate circumstances, an electron 
microscope is capable of magnifying the image of a pro- 
tein sufficiently that individual subunits can be distin- 
guished and their shapes almost distinguished. A beam 
of collimated electrons is impinged upon the sample, and 
those that pass through it without being deflected suffi- 
ciently to leave the beam are then focused by electro- 
magnetic lenses. The contrast in the image is caused by 
the distribution within the sample of the ability to deflect 
or scatter electrons from their path. The sample placed in 
an electron microscope usually has to reside in a cham- 
ber under high vacuum. This means that the sample has 
to be a solid with a low vapor pressure. In the case of mol- 
ecules of protein, this requires that they be encased in a 
solid matrix, which also must be a glass to avoid the prob- 
lems of the diffraction caused by crystalline solids. 

To enhance contrast and provide a solid support 
simultaneously, the glass is often formed by drying a 
solution of a salt containing a heavy metal such as ura- 
nium or tungsten. Either uranyl acetate or sodium phos- 
photungstate, for example, will form an electron-dense 
glass when a film of a solution containing it is dried. This 
electron-dense glass will surround, encase, and support 
a molecule of protein that had been present in the solu- 
tion from which the film was made. The glass of the salt 
of heavy metal encasing the molecule of protein forms a 
three-dimensional boundary or mold that has the shape 
of the molecule of protein. It is the dark image of this 
mold of electron-dense glass encasing the light image of 
the electron-translucent molecule of protein that is 
observed in the micrograph. This procedure is known as 
negative staining. 

It is also possible to insert thin layers of amorphous 
ice, which is a glass, into a cryoelectron microscope, an 
electron microscope not with cold electrons but with a 
stage for the sample that is cooled to a very low temper- 
ature. When a molecule of protein is embedded in 
such a glass (Figure 9-32A), the contrast observed 
results from the fact that the atoms in a molecule of pro- 
tein are more efficient at deflecting electrons than the 
molecules of water in amorphous ice. A positive image of 
the molecule, rather than a negative image, is observed. 

The three-dimensional distribution of the ability to 
deflect electrons within the sample is known as the dis- 
tribution of scattering density, σ(x,y,z). If a map of scat- 
tering density can be produced at as high a resolution as 
possible, details of the structure of either the mold in 
which the molecule of protein is encased or the molecule 
of protein itself can be observed. Image reconstruction 
is any computational method that is used to calculate 
σ(x,y,z) from the images of molecules of proteins 
observed in electron micrographs. In all cases, the 
electron micrograph is the experimental data submitted 
to these calculations, and the electron micrograph used 
in a particular reconstruction must be presented to the 
reader so that she may appreciate the point of departure 
(Figure 9-32A). 

* A more recent crystallographic molecular model of actin has a 
somewhat different structure than the one used to construct Figure 
9-1B, but the differences are not significant enough to affect the 
choice of the orientation of the monomer within the polymer. 

A molecule of protein has a certain distribution of 
electron density p(x,y,z), and when molecules of protein 
are arrayed in a crystal they create a periodic, three- 
dimensional distribution of electron density. This peri- 
odic array diffracts X-rays to produce a diffraction 
pattern that is also periodic. The angular dispositions of 
the reflections in the diffraction pattern are determined 
by both the angles among the axes of the fundamental 
unit cell and its dimensions. The dimensions and axial 
angles of the unit cell can be calculated from these angu- 
lar dispositions. The diffraction pattern of the crystal is 
the Fourier transform of the periodic distribution of elec- 
tron density it contains. The magnitudes and phases of 
the maxima in the diffraction pattern can be calculated 
from the distribution of electron density within the unit 
cell by digital Fourier transformation. Conversely, the 
distribution of electron density in the unit cell can be cal- 
culated from the amplitudes and phases of the diffrac- 
tion maxima by digital Fourier transformation. 
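
The reciprocal relationship between a periodic density and its transform can be illustrated in one dimension. The following is a minimal sketch, not a crystallographic program: the four-point "unit cell" is an invented density, the function names are hypothetical, and a plain discrete Fourier transform written with the standard library stands in for the digital Fourier transformation mentioned above.

```python
import cmath

def dft(values, sign=+1):
    """Discrete Fourier transform with kernel exp(sign * 2*pi*i*k*x/N)."""
    n = len(values)
    return [sum(v * cmath.exp(sign * 2j * cmath.pi * k * x / n)
                for x, v in enumerate(values))
            for k in range(n)]

def inverse_dft(coefficients, sign=+1):
    """Invert dft(): conjugate kernel, then divide by the number of points."""
    n = len(coefficients)
    return [c / n for c in dft(coefficients, -sign)]

unit_cell = [0.0, 1.0, 3.0, 1.0]   # hypothetical density within one unit cell
crystal = unit_cell * 4            # four identical unit cells in a row

transform = dft(crystal)
# Indices at which the transform is not zero: the "reflections".
peaks = [k for k, f in enumerate(transform) if abs(f) > 1e-9]
print(peaks)                       # -> [0, 4, 8, 12]

# Amplitudes and phases together return the original density.
recovered = [c.real for c in inverse_dft(transform)]
```

The transform is zero except at evenly spaced indices whose spacing is set by the size of the unit cell, and the inverse transform recovers the density only because both the amplitude and the phase of each maximum are retained.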

A helical polymer of protein in its mold of negative 
stain or amorphous ice has a certain distribution of scat- 
tering density, σ(x,y,z), which is a periodic function 
because the helix is periodic. Each protomer of the poly- 
mer is a unit cell in this helical array. The computed 
Fourier transform of this periodic array is also a periodic 
function. From the spatial disposition of its maxima, the 
angle between successive unit cells in the helix, the rise 
for each unit cell, and the number of helical threads in 
the structure can be calculated. From the amplitudes and 
phases of the maxima of the Fourier transform, the dis- 
tribution of scattering density within the unit cell can be 
calculated. 

The depth of focus in an electron microscope is 
larger than the width of a specimen containing a helical 
polymer, and all points in the specimen are in focus in 
the final micrograph. As such, the micrograph represents 
the projection of the three-dimensional distribution of 
scattering density onto a two-dimensional surface. 
The Fourier transform, F(X,Y,Z), of any three-dimen- 
sional distribution of scattering density is 

$$F(X,Y,Z) = \iiint_{\text{object}} \sigma(x,y,z)\,\exp[2\pi i(xX + yY + zZ)]\,dx\,dy\,dz \tag{9-4}$$

When Z = 0 

$$F(X,Y,0) = \iint \sigma(x,y)\,\exp[2\pi i(xX + yY)]\,dx\,dy \tag{9-5}$$

where 

$$\sigma(x,y) = \int \sigma(x,y,z)\,dz \tag{9-6}$$

The function σ(x,y) is the projection of the three-dimen- 
sional distribution of scattering density. This set of rela- 
tionships states that the two-dimensional Fourier 
transform of the projection of the distribution of scatter- 
ing density is the central section of the three-dimen- 
sional Fourier transform of the unprojected distribution 
of scattering density. This central section of the three- 
dimensional Fourier transform is obtained by digitizing 
the optical density of the micrograph and calculating a 
digitized Fourier transform of the image by computer. 
The significant advantage of performing the Fourier 
transform computationally rather than by diffraction is 
that the computed Fourier transform comes with both 
amplitudes and phases instead of just amplitudes. The 
disadvantage is that there are far fewer unit cells con- 
tributing to the Fourier transform. 
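
The statement that the transform of a projection is the central section of the three-dimensional transform can be verified on a small grid. The sketch below is an illustration only: the density `sigma` is an arbitrary invented function, the grid has just four points along each axis, and the sums are discrete analogues of Equations 9-4 through 9-6 with the same sign in the exponent.

```python
import cmath
from itertools import product

N = 4  # grid points along each axis (small, for illustration only)

def sigma(x, y, z):
    """Hypothetical scattering density on an N x N x N grid."""
    return (x + 1) * (y + 2) + z * z * (x == y)

def phase(*pairs):
    """exp[2*pi*i * sum(r*R) / N], the discrete form of the kernel in Equation 9-4."""
    return cmath.exp(2j * cmath.pi * sum(r * R for r, R in pairs) / N)

def transform_3d(X, Y, Z):
    """Discrete analogue of Equation 9-4."""
    return sum(sigma(x, y, z) * phase((x, X), (y, Y), (z, Z))
               for x, y, z in product(range(N), repeat=3))

def projection(x, y):
    """Discrete analogue of Equation 9-6: sum the density along z."""
    return sum(sigma(x, y, z) for z in range(N))

def transform_2d(X, Y):
    """Discrete analogue of Equation 9-5."""
    return sum(projection(x, y) * phase((x, X), (y, Y))
               for x, y in product(range(N), repeat=2))

# Largest disagreement between the Z = 0 section and the transform of the projection.
worst = max(abs(transform_3d(X, Y, 0) - transform_2d(X, Y))
            for X, Y in product(range(N), repeat=2))
print(worst < 1e-9)  # -> True
```

Setting Z = 0 removes z from the exponent, so the sum over z acts on the density alone; that is all the theorem requires.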

In the case of a helical polymeric protein embedded 
in negative stain or amorphous ice, the central section of 
its three-dimensional Fourier transform systematically 
intersects all of the maxima in its three-dimensional 
Fourier transform. In this two-dimensional central sec- 
tion of its Fourier transform (Figure 9-32B), the ampli- 
tudes and the associated phases of the transform (Figure 
9-32C) are arrayed along layer lines (the horizontal lines 
in the figure) that arise from the repeating patterns in the 
helix. If the correct helical lattice has been assigned 
to the structure so that the layer lines can be properly 
indexed, the indexed amplitudes and phases distrib- 
uted along the layer lines (Figure 9-32C) can be used to 
calculate σ(x,y,z) by a Fourier-Bessel transform, just 
as the properly indexed amplitudes and phases of the 
pattern of the X-rays diffracted from a crystal can be used 
to calculate p(x,y,z) by a Fourier transform. Figure 9-32D 
presents an example of such a reconstruction from the 
electron micrograph of Figure 9-32A. 

Image reconstruction succeeds in producing the 
molecular structure of a helical polymeric protein when 
it is able to produce an image of the helical polymer of 
sufficient resolution to position and orient unambigu- 
ously a crystallographic molecular model of the 
monomer within the polymer. In addition to the success 
this approach has achieved with the helical polymer of 
actin (Figure 9-1B), it has been possible to insert a crys- 
tallographic molecular model of the αβ heterodimer of 
tubulin into image reconstructions of the micro- 
tubule and to insert a crystallographic molecular 
model of flagellin into an image reconstruction of a bac- 
terial flagellar filament. In the latter case, as with the 
protein encoded by the mreB gene of E. coli, the flagellin 
crystallized within a protofilament of the overall flagellar 
filament. Consequently, its position and orientation 
within the image reconstruction of the complete fila- 
ment could be assigned with greater certainty. 

Usually the Fourier transform of a digitized elec- 
tron-microscopic image of a helical polymeric protein 
embedded in amorphous ice has measurable amplitudes 
that arise from helical spacings of 2 nm or 
greater, and as in crystallography, this deter- 
mines the lower limit of the resolution. Images of 
tobacco mosaic virus, however, have produced Fourier 
transforms with terms arising from spacings of 1 nm; 
and images of helical tubes of acetylcholine receptor, 
terms arising from spacings as little as 0.5 nm. 

Difference maps of scattering density can also be 
useful. A helical polymeric protein, for example, helical 
fibers of tubulin, often associates with another protein, 
for example, kinesin, at sites on its surface, one of which 
is located on each of the asymmetric units in the helical 
lattice, for example, on each subunit of tubulin. When 
the helical polymer is decorated with the other protein, 
for example, when a helical fiber of tubulin is decorated 
with kinesin, the distribution of that other protein will 
assume the helical distribution of the underlying fila- 
ment. The Fourier transform of an image of the undeco- 
rated filament is subtracted from that of a decorated 
filament, and the Fourier-Bessel transform of this differ- 
ence is a map of scattering density arising just from the 
bound protein. 

A population of oriented helical polymeric pro- 
teins produces X-ray diffraction. As a diffraction pattern 
is the Fourier transform of the distribution of electron 
density in the helical specimen, these X-ray diffraction 
patterns also display the layer lines seen in Fourier trans- 
forms of digitized images from electron micrographs of 
those same specimens. Because the reflections are pro- 
duced by diffraction rather than computationally, they 
must be phased by multiple isomorphous replace- 
ment. This has been accomplished for oriented fila- 
ments of the protein coat of tobacco mosaic virus 
incorporating reflections arising from spacings to 0.3 nm 
to obtain a map of electron density into which a model of 
the polypeptide could be inserted (Figure 9-31A). It 
is also possible to use phases from electron micrographs 
and amplitudes from X-ray diffraction to produce a map 
of electron density. 

There are helical polymeric proteins that are con- 
structed from long, flexible strands of polypeptide rather 
than globular protomers. These proteins resemble the 
helical cables that are used in the construction of sus- 
pension bridges. The smallest structural element in a 
cable is a strand. Two or more strands are twisted around 
each other to make a rope. Two or more ropes are then 
twisted around each other to make a cable. The strands 
from which ropes of protein are made are polypeptides 
read from messenger RNA, and for this reason, each 
strand must have a discrete length. This in turn means 
that the cable is built from strands of uniform length that 
are overlapped to provide the necessary tensile strength. 

Figure 9-32: Image reconstruction of the helical array of subunits in a filament of actin. Filaments of human cytoskeletal β actin were sus- 
pended in a buffered aqueous solution. A small sample of the suspension (4 μL) was spread over a grid for electron microscopy and rapidly 
frozen to obtain a thin sheet of amorphous ice in which the filaments were embedded. (A) An electron micrograph (36000×) of an actin fila- 
ment embedded in the ice. The arrows mark points of helical crossovers. The electron micrograph was scanned and a Fourier transform of 
the digitized image was performed. (B) The Fourier transform (Equation 9-5) of the image in panel A is presented graphically on a two-dimen- 
sional field such that the brightness of the image is directly proportional to the amplitude of the function at that point. Because the array is 
helical, the maxima in the Fourier transform are found along layer lines. (C) Distribution of the amplitude and phase of the Fourier trans- 
form along several of the layer lines in panel B. Phases (αn; degrees) are presented in the upper plot of each pair and amplitude (Fn, relative 
units) in the lower plot. The indexing of the layer lines (n, l) designates the Bessel order (n) and the number of the layer line (l). 
(D) Reconstructed image in stereo resulting from a Fourier-Bessel transform of the amplitudes and phases along the properly indexed layer 
lines. Reprinted with permission from ref 206. Copyright 2000 Elsevier B.V. 

The arrangement of the strands in the molecular 
rope and the ropes in the cable is elucidated by permit- 
ting a macroscopic fiber to diffract X-radiation. A macro- 
scopic fiber, built from billions of the molecular cables, is 
placed in a beam of X-rays. The cables are all more or less 
aligned with the axis of the macroscopic fiber. The heli- 
cal arrays of the strands in the ropes and the helical 
arrays of the segments of rope in the cable have certain 
regularly recurring dimensions and angles associated 
with them that give rise to diffraction. The dimensions 
and helical parameters of these arrays can be established 
from the angles at which the reflections emerge from the 
fiber. For example, a serial set of reflections is produced 
by a tendon from the tail of a rat when the tendon is 
placed in a beam of X-rays. This serial set of reflections 
arises from a helical array that repeats every 67 nm, 
and this dimension, along with others, must be incorpo- 
rated into the model for the complete cable of the colla- 
gen from which the tendon is formed. 

Collagen is the helical polymeric protein from 
which is formed the tough, flexible material composing 
tendons, intercellular matrix, the matrix of bone, and 
many strong, plastic sheets of various shapes and sizes 
found in animals. The basic structural element in these 
macroscopic structures is the fibril of collagen, which is 
a cylindrical thread of indefinite length, 200-800 nm 
wide. This thread in turn is formed from molecular 
cables of collagen, each probably as long as the fibril, 
packed side by side in register. 

The strand from which a molecular cable of colla- 
gen is formed is a polypeptide that contains significant 
amounts of 4-hydroxyproline and that has a character- 
istic repeating sequence in which glycine is every third 
amino acid. In humans, there are 30 different polypep- 
tides containing segments with such a repeat. These 
polypeptides vary in length from 666 to 3039 aa; most of 
them have lengths of 1000-2000 aa. The segment with 
the repeating sequence can be as long as 1530 aa or as 
short as 15 aa. Polypeptides with segments longer than 
500 aa in which every third amino acid is a glycine usu- 
ally have only one continuous segment of this repeating 
pattern; polypeptides with segments shorter than 300 aa 
usually have multiple segments of different lengths. The 
segment or segments with the repeating sequence are 
usually found in the middle of the polypeptide. It is such 
a segment that forms the strand of the rope; the remain- 
der of the polypeptide forms appendages to the rope 
usually at its two ends. If the rope forms a cable, these 
amino-terminal and carboxy-terminal appendages jut 
out from the cable at regular intervals. 

Three strands of polypeptide with a repeating 
sequence in which every third amino acid is a glycine can 
wrap around each other in parallel to form a triple-heli- 
cal rope (Figure 9-33). The collagen triple helix is a 
coiled coil of three helices, each formed by one of the 
strands. The structure is that proposed by Rich and 
Crick. The three helices are all left-handed with the 
same helical parameters. Each has on average a rise of 
0.29 nm aa⁻¹ over an angle of -107° aa⁻¹, and this confor- 
mation produces a complete turn every 3.36 aa. 
Suppose each of the three helices had an angle of 
-120° aa⁻¹ instead of -107° aa⁻¹ so that every third amino 
acid in each of them would point in the same direction. 
They could then be brought together in a triangular 
bundle in which every glycine in each helix was directed 
into the center. As with a coiled coil of α helices, the deficit 
between -107° and -120° is accommodated by coiling the 
three individual left-handed helices around each other, 
but with a right-handed twist of +13° rather than the left- 
handed twist of -4°. As a result, the core of the triple helix 
contains only glycines along its entire length. 
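
The numbers in this argument can be reproduced directly. The sketch below is not from the text; the function name is hypothetical, and the only inputs are the angle of -107° for each amino acid and the ideal angle of -120° that would place every third amino acid on the same side of a strand.

```python
def strand_geometry(angle_deg=-107.0, ideal_deg=-120.0):
    """Residues per turn of one strand and the drift of the stripe of glycines.

    A positive drift means that successive glycines of one strand are displaced
    around that strand in a right-handed sense, which the coiling of the three
    strands around each other has to follow.
    """
    residues_per_turn = 360.0 / abs(angle_deg)
    drift_per_residue = angle_deg - ideal_deg   # -107 - (-120) = +13
    drift_per_triplet = 3 * drift_per_residue   # one Gly-X-Y repeat
    return residues_per_turn, drift_per_residue, drift_per_triplet

per_turn, per_residue, per_triplet = strand_geometry()
print(f"{per_turn:.2f} aa per turn")                          # -> 3.36 aa per turn
print(f"{per_residue:+.0f} deg per aa, {per_triplet:+.0f} deg per Gly-X-Y")
```

Three amino acids rotate the strand by -321° rather than -360°, so each successive glycine is displaced by +39°, or +13° for each amino acid, which is the right-handed twist cited above.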

The three strands fit together so that they are in reg- 
ister. Only one of the three side chains at each level in the 
rope is a glycine, and the pro-S hydrogen of this glycine 
is pointed towards the center. The side chains that nec- 
essarily occupy this pro-S position in the other two 
amino acids at that level from the other two strands point 
away from the center of the rope. The α carbon of each 
glycine is snug against the acyl carbon-oxygen of the 
glycine in the level above. Within each level there are 
three amino acids arrayed cylindrically about the central 
axis: the glycine, an amino acid to the carboxy-terminal 
side of the glycine in the level above, and an amino acid 
to the amino-terminal side of the glycine in the level 
below. Traditionally, these three amino acids are desig- 
nated glycine, amino acid X, and amino acid Y, respec- 
tively. The small size of the pro-S hydrogen of each 
glycine permits the three strands to approach each other 
so closely that an interstrand hydrogen bond forms at 
each level in the triple helix between the amido oxygen of 
amino acid X and the amido nitrogen-hydrogen of the 
glycine in the level below. In all of the regions of the triple 
helix in which the amino acids X are not prolines or 
4-hydroxyprolines, there is a molecule of water hydro- 
gen-bonded to the amido oxygen of the glycine at one 
level and the amido nitrogen-hydrogen of amino acid X 
in the level two steps below. These molecules of water 
are integral components of the structure. If amino acid 
Y in the same level as the amino acid X providing a donor 
to the molecule of water is a threonine, a serine, a 
4-hydroxyproline, or an asparagine, the donor in its side 
chain can form a third hydrogen bond to one of these 
integral molecules of water. 

Figure 9-33: Triple helix of collagen. A portion of the crystallo- 
graphic molecular model of the synthetic triacontapeptide with 
the sequence (Pro-Hyp-Gly)3ITGARGLAG(Pro-Hyp-Gly)4, where 
Hyp is 4-hydroxyproline, is presented. The carboxy-terminal seg- 
ment of -(Pro-Hyp-Gly)- repeats in the portion of the model that has been 
drawn represents the carboxy-terminal region of nucleation for a 
triple helix; the amino-terminal segment, -ITGARGLAG-, repre- 
sents the bulk of the interior of the triple helix. Molecules of water 
are represented by white spheres attached to the structure only by 
hydrogen bonds. This drawing was produced with MolScript. 

The carboxy-terminal regions of each of the three 
polypeptides in such a triple helix are often composed of 
repeating sequences of prolyl-4-hydroxyprolylglycine or 
prolylprolylglycine. These segments are thought to 
nucleate the structure. Beyond this region of nucle- 
ation in the triple helix, the only requirement is for a 
glycine at every third position. In the short region of 
nucleation, the helix governing the conformation of a 
strand rises 0.28 nm aa⁻¹ over an angle of -104° aa⁻¹ and 
the structure has angles φ and ψ of around -70° and 
+160°, which are in the range of those for the polyproline 
helix. It is thought that these regions spontaneously form 
a triple polyproline helix, which then propagates into the 
rest of the structure. Synthetic peptides of the 
sequence (Pro-Pro-Gly)n or (Pro-Hyp-Gly)n sponta- 
neously form such triple helices. 

Many of the polypeptides containing sequences in 
which glycine is every third amino acid form triple 
helices that do not associate further to form fibrils. In 
some of these polypeptides the triple-helical regions are 
frequently interrupted by segments incompatible with a 
triple-helical structure. In others, the two or three 
triple-helical segments are of insufficient length to effect 
fibrillar formation. In yet others, the globular domains 
flanking the triple-helical segments are too large to 
permit further association. The polypeptides of colla- 
gen that associate further beyond the triple-helical state 
usually contain continuous segments of a thousand or 
more amino acids with uninterrupted sequences in 
which glycine is at every third position flanked by car- 
boxy-terminal and amino-terminal portions of several 
hundred amino acids or less. 

The paradigm of such a fibrillar collagen is type I. 
The triple helix of type I collagen is a heterotrimer of two 
α1 polypeptides (1057 aa in the human) and one 
α2 polypeptide (1024 aa in the human) and is formed 
from a sequence (Gly-X-Y)338 of 1014 aa from each chain. 
The three chains in the triple helix are in register so the 
rope produced by these three strands is a finite segment, 
and each segment of rope has the same length. The car- 
boxy-terminal and amino-terminal flanking portions, 
the telopeptides, are short (<30 aa), so there is little to 
interfere with formation of the fibril. This type of collagen 
is the major component of tendon, and it is fiber diffrac- 
tion from tendons that has provided the angle of -107° 
and rise of 0.29 nm characterizing the collagen triple 
helix as well as evidence for repeats of 67 and 30 nm. 
These latter repeats arise from the cable formed from the 
segments of triple-helical rope. 

The segments of rope (represented by vertical lines 
in Figure 9-34A) forming this cable are 298 nm in 
length (1014 × 0.29 nm) and are placed side by side in a 
right-handed helical array. In this helical array, the seg- 
ments of rope are not butted up against each other, and 
gaps of 37 nm in which the cable has only four ropes 
alternate with segments of 30 nm in which it has five 
ropes. The radial angle between each successive pro- 
tomer, each of which is an individual segment of rope, 
would be 72° and the helix would repeat every five pro- 
tomers if the ropes were held perfectly vertical (Figure 
9-34A), but they are not. 
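
The alternation of four and five ropes follows from the length of a segment of rope and the offset between neighboring segments. The following is a minimal sketch under stated assumptions, none of which is spelled out in this form in the text: the offset between neighbors is taken to be the 67 nm repeat, each segment of rope is assumed to be followed along its own line by a gap before the next segment begins, and the names are hypothetical. The length of 298 nm is the value quoted above; the product 1014 × 0.29 nm itself evaluates to about 294 nm, so the quoted length is used because it is the one consistent with the 37 nm and 30 nm regions.

```python
ROPE_NM = 298      # length of one segment of rope, as quoted in the text
STAGGER_NM = 67    # assumed axial offset between neighboring segments
ROPES = 5          # segments of rope around the cable

repeat_nm = ROPES * STAGGER_NM        # axial repeat of one line of segments
gap_nm = repeat_nm - ROPE_NM          # empty stretch between segments on one line
overlap_nm = STAGGER_NM - gap_nm      # stretch with all five ropes present
angle_deg = 360 / ROPES               # radial angle between successive segments

def ropes_at(z_nm):
    """Number of ropes crossing the plane at height z_nm along the cable."""
    return sum(((z_nm - k * STAGGER_NM) % repeat_nm) < ROPE_NM
               for k in range(ROPES))

print(repeat_nm, gap_nm, overlap_nm, angle_deg)   # -> 335 37 30 72.0
print(ropes_at(10), ropes_at(50))                 # -> 5 4
```

Under these assumptions every 67 nm of cable contains 30 nm with five ropes followed by 37 nm with four, and five offsets of 67 nm bring the array back to its starting point after 335 nm.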

Figure 9-34: Arrangement of the triple-helical ropes of collagen in 
the cable-forming tendon. The individual segments of rope, each 
a triple helix formed from three segments of 1014 aa from the cen- 
tral regions of three collagen polypeptides, are represented by ver- 
tical lines with splayed amino- and carboxy-terminal ends. The 
individual segments of rope are arrayed in a right-handed helical 
distribution (left panel). In tendon, the cable is twisted in a 
left-handed sense by one turn over its repeat of 335 nm (right 
panel). Because the most rigid segments of the cable are where five 
strands overlap, the twist occurs in the regions where only four 
strands overlap. The cable is twisted by -72° in each of these gap 
regions to give each of the five strands a left-handed helical path in 
these spaces. The cable is kinked as well as twisted. Two of the unit 
cells for the crystalline array in tendon are drawn. The cable shifts 
from one column of unit cells into the column of unit cells diago- 
nally adjacent to it at each level. 

The collagen in a tendon is arranged in a crystalline 
array the unit cell of which is triclinic (Figure 4-2) with 
a length of 67 nm, equal to only one-fifth of the 335 nm 
repeat of the cable. Consequently, the cable must be 
twisted so that its structure repeats translationally every 
67 nm. This is accomplished by twisting it by -72° over 
each 67 nm segment of its length to produce repeating 
conformations that are translationally superposable on 
each other (Figure 9-34B). Because the regions with
only four ropes are the most flexible, the twist is confined 
to them. The cable is kinked so that as it ascends at each 
step it shifts from one column of unit cells to the column 
diagonally adjacent to it. The pentagonal array of the five 
ropes is compressed in one dimension into a layer of two 
and a layer of three ropes to permit the pentamer of 
ropes to pack in a hexagonal array. The helix of each 
strand is left-handed, the superhelix of the triple helix of 
the three strands forming the rope is right-handed, and 
the twist of the cable of five ropes is left-handed. 
Alternating the hand of the elements in a cable increases
its strength.
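
The dimensions quoted for the cable are mutually consistent, and the check is simple enough to write out. The following sketch uses only the numbers given in the text; the variable names are chosen here for illustration and are not part of the original description.

```python
# Dimensions quoted in the text (nm and degrees); names are illustrative only.
ROPE_LENGTH = 298.0        # length of one triple-helical segment of rope
GAP = 37.0                 # region in which the cable has only four ropes
OVERLAP = 30.0             # region in which the cable has five ropes
TWIST_PER_PERIOD = -72.0   # left-handed twist applied in each gap region

period = GAP + OVERLAP                 # axial period of the cable
cable_repeat = 5 * period              # five protomers per complete repeat
full_periods, remainder = divmod(ROPE_LENGTH, period)
twist_per_repeat = 5 * TWIST_PER_PERIOD

print(f"period = {period} nm, repeat = {cable_repeat} nm")
print(f"one rope spans {full_periods:.0f} periods plus {remainder} nm of overlap")
print(f"twist over one repeat = {twist_per_repeat} degrees")
```

Each rope therefore spans four full periods plus one overlap region, which is why four ropes are present in every gap and five in every overlap, and five twists of -72° add up to the single left-handed turn over 335 nm.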

As the cables enter these rather contorted arrays, 
they are aligned by the crystallization in register, and the 
crystallographic asymmetric units, each containing the 
equivalent of 67 nm of cable, create a repeating pattern 
on the surface of the fibril itself. This pattern can be 
seen in the electron microscope. It appears as alternat- 
ing thickenings and thinnings along a desiccated fibril 
that repeat every 67 nm. These thickenings and thin- 
nings are believed to represent the alternation in regis- 
ter of regions within the molecular cables five ropes in 
thickness and those of four ropes in thickness, respec- 
tively (Figure 9-34). Fibrils of collagen also stain posi- 
tively by chelating heavy metals at specific locations on 
the surfaces of the ropes where there are high, unbal- 
anced constellations of negative charge. Because the 
segments of rope are placed in register by the side to 
side crystallization of the cables, these positions to 
which heavy metals bind form bands across the fibrils or 
across sheets of collagen. The pattern of bands is quite 
reproducible, and it repeats every 67 nm. From an
examination of the patterns in which charged amino
acids occur in the amino acid sequences of the
polypeptides, it can be shown that the pattern in which these
bands occur on the ropes is entirely consistent with
the triple-helical array of strands and pentahelical array
of ropes in the hypothetical model (Figure 9-34B), the
Fourier transform of which mimics the reflections in the
X-ray diffraction pattern from an oriented tendon. All
of these correlations provide independent support for 
this model of the structure of the cable. 

Three classes of filaments can be observed within 
animal cells. Thin filaments, or microfilaments, are fila- 
ments of actin the basic structural element of which is 
the actin helix (Figure 9-1C). Tropomyosin, a fibrous 
protein that is one continuous coiled coil of two parallel 
α helices, lies in the grooves of the actin helix. Each of
these coiled coils of tropomyosin spans seven actins. The
globular protein troponin decorates the microfilament at
regular intervals. The structure of the thin filament has
been elucidated by image reconstruction. In transmission
electron micrographs, thin filaments appear to
be about 8 nm wide. Microtubules are hollow cylinders
constructed from the globular protein tubulin.
They are about 20 nm in width. Intermediate filaments 
are the third class of filament. In transmission electron 
micrographs, they appear to be about 10 nm wide, inter- 
mediate between thin filaments and microtubules. 

Intermediate filaments were originally considered 
to be a heterogeneous class of polymeric proteins 
grouped together only because they were similar in 
width. Within this class are tonofilaments, neurofila- 
ments, cellular keratin filaments, desmin filaments, glial 
filaments, and vimentin filaments. Each of these sub- 
classes occurs in a different set of tissues, and they form 
intermediate filaments that often seem quite different in 
their appearance and their distribution through a cell 
(Figure 9-35). It is now known, however, that all of
these filaments are constructed from polypeptides that
are homologous in sequence (Figure 9-36) and thus
necessarily share a common, superposable structure.
That an intermediate filament can be constructed from
one of these polypeptides all by itself has been
demonstrated by reassembling filaments from a pure
homogeneous preparation of a given polypeptide. These
polypeptides form helical cables of indefinite length. 

One of these intermediate filaments, keratin, com- 
poses the fibers in the composite material that forms 
skin, hair, and horn. For example, the quill of a porcupine 
represents a large array of more or less aligned keratin 
cables. Diffraction of X-radiation from such specimens
provides dimensions of the helical arrays in these
cables. Meridional reflections representing a repeat of
0.51 nm are strong features of such diffraction patterns,
and this demonstrates that these cables are built from
coiled coils of α helices.

The strand from which an intermediate filament is 
constructed is a polypeptide folded into an α helix. Two
α helices twist around each other in a left-handed coiled
coil (Figure 6-29) to produce the rope. The heptad
repeat permitting the formation of this rope can be 
noticed in those regions of the sequences that are 
involved in its formation when sequences from several of 
these polypeptides are aligned (Figure 9-36). The 
amino acids in the heptad positions at which the side 
chains are directed into the core of the coiled coil are not 
always nonpolar, but at four of the positions where they 
are not, remarkable conservation is displayed. 
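
How a heptad repeat is read out of a sequence can be illustrated with a short sketch. It assumes the conventional labeling of heptad positions a through g, in which positions a and d are the ones directed into the core of the coiled coil; that convention is not stated in this passage. The sequence used is a made-up idealized repeat, not one of the sequences in Figure 9-36.

```python
def core_positions(sequence, start_register="a"):
    """Return (index, residue) pairs falling at heptad positions a and d.

    start_register is the heptad position assigned to the first residue.
    """
    registers = "abcdefg"
    offset = registers.index(start_register)
    core = []
    for i, residue in enumerate(sequence):
        position = registers[(i + offset) % 7]
        if position in "ad":          # side chains directed into the core
            core.append((i, residue))
    return core


# Hypothetical idealized sequence: three copies of the heptad LKKLAEE.
example = "LKKLAEE" * 3
core = core_positions(example)
print("".join(residue for _, residue in core))
```

Applied to a real alignment, the same bookkeeping shows which core positions are occupied by nonpolar amino acids and which are the conserved exceptions.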

As in collagen, the sequences producing the rope 
are found in the central portions of the polypeptides. In 
each of the amino acid sequences from this central 
region, there are three consecutive segments of heptad 
repeat, about 30, 100, and 130 aa in length, separated
by short segments (about 20 aa in length) lacking the 


[Figure 9-36 alignment: the aligned sequences are not reproduced
here. The sequence positions at which each of the two lines of the
alignment commences are as follows.]

polypeptide                  line 1    line 2
human keratin II 5           219       270
ovine keratin II 7c          158       209
ovine keratin I 8c1          106       157
human keratin I 14           164       215
desmin from G. gallus        146       197
vimentin from C. griseus     148       199
murine glial fibril          113       164
porcine neurofilament-M      150       201
human neurofilament-H        146       197
Figure 9-35: Distribution of intermediate
filaments of vimentin (A) and keratin (B) in
animal cells. Hamster Nil-8 cells (A) or
kangaroo rat PtK-2 cells (B) were grown on 
glass cover slips. The cells were fixed with 
methanol, rinsed with acetone, and dried in 
the air. Antiserum raised in guinea pigs to 
either vimentin (A) or epidermal prekeratin 
(B) was then applied to the respective cells. 
After the antiserum was rinsed away, those 
antibodies bound to the intermediate fila- 
ments could be visualized by adding fluores- 
cein-labeled immunoglobulins specific for 
guinea pig immunoglobulin. The covalently 
attached fluorescein causes the intermediate 
filaments to which it is bound to fluoresce. 
Reprinted with permission from ref 302. 
Copyright 1978 National Academy of 
Sciences. 
Figure 9-36: Alignment of portions of the amino acid sequences of the polypeptides composing various intermediate filaments. The
aligned segments come from different locations in the amino acid sequences of the various polypeptides. The numbers indicate the sequence 
positions in the various polypeptides at which the alignment on that line commences. The proteins are isoform 5 of human type II cytoskele- 
tal keratin, isoform 7c of ovine type II microfibrillar keratin, isoform 8c1 of type I keratin from intermediate filaments of ovine wool, isoform 
14 of human type I cytoskeletal keratin, desmin from Gallus gallus, vimentin from Cricetulus griseus, glial fibrillary acidic protein from murine 
astrocytes, triplet M protein from porcine neurofilaments, and triplet H protein from human neurofilaments. The pattern of the heptad 
repeat is highlighted in boldface type. The highlighted amino acids are those the side chains of which are directed into the cores of the coiled coils.

repeat. The approximately 300 aa strand should produce
an interrupted rope 35-45 nm long. On either side
of this rope are amino-terminal domains (100-200 aa)
and carboxy-terminal domains (100-1600 aa) that must
either be incorporated into the body of the cable or
protrude from its sides. These protrusions presumably cause
each type of intermediate filament to have a different
width and a different tissue-specific function.

An intermediate filament is a cable formed from
these ropes. The filament is a helical polymeric protein
that has a (-1, 3) surface lattice. In the cables of keratin in 
a fiber of wool, the rise for each step of the left-handed 
single-stranded helix generating the structure is 
6.7 nm, and each step is -111° from the preceding
one. The measured mass of protein in each nanometer
of an intermediate filament indicates that its cross
section contains about 30 strands of α-helical polypeptide
and that each rope is a coiled coil of two parallel
α helices. Adjacent ropes are antiparallel to each other
and staggered, and in the resulting staggered pattern
there are two different alignments, one in which the
ropes are staggered by half their length and the other in
which they are side by side, unstaggered. The precise
arrangement of the ropes within the helical lattice of
the cable, however, is as yet unknown. 
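
The two helical parameters quoted for the cables of keratin in wool, a rise of 6.7 nm and a rotation of -111° for each step, fix the number of steps in each turn and the pitch of the generating helix. The sketch below works these out. The radius used to place lattice points is a hypothetical placeholder, because no radius is given in the text.

```python
import math

RISE_NM = 6.7           # axial rise for each step, from the text
ROTATION_DEG = -111.0   # rotation for each step; negative is left-handed

steps_per_turn = 360.0 / abs(ROTATION_DEG)
pitch_nm = RISE_NM * steps_per_turn


def step_position(n, radius_nm=5.0):
    """Cartesian coordinates (nm) of the n-th lattice point on the helix.

    radius_nm is an assumed value used only to draw the helix.
    """
    angle = math.radians(n * ROTATION_DEG)
    return (radius_nm * math.cos(angle),
            radius_nm * math.sin(angle),
            n * RISE_NM)


print(f"{steps_per_turn:.2f} steps per turn, pitch {pitch_nm:.1f} nm")
```

Because the number of steps in each turn is not an integer, successive turns of the generating helix do not place lattice points directly above one another.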

In all of the cables that have been discussed, the 
faces and interfaces can be divided into three groups. 
There is a continuous interface between or among the 
strands, containing within itself the central axis of 
the rope and twisting around that axis with the twist of 
the rope. Between the ropes in the cable there are inter- 
faces, but they are formed from faces on each rope com- 
posed of small regions of surface on each strand 
alternating between or among the strands as the rope is 
ascended (Figure 9-33). The cables themselves may have 
faces on them to promote side to side associations, and 
these faces are composed of small regions of surface on 
strands from the same or different ropes that are encoun- 
tered in turn as the cable is ascended (Figure 9-34). 

There is a class of polymers that form extracellularly 
during the progress of systemic amyloidosis, maturity- 
onset diabetes, Alzheimer’s disease, and spongiform 
encephalopathy, which are diseases affecting mammals. 
The diffraction of X-radiation by these fibers shows 
reflections arising from a repeat of 0.48 nm along the axis 
of the fiber. This dimension is the spacing between
the strands of a continuous β sheet (Figure 6-9) with
individual β strands perpendicular to the axis of the fiber.
The sheets in these polymers can be either parallel
β structure or antiparallel β structure. The length of
each strand in one of the continuous β sheets forming
these polymers varies from 2.5 nm (7 aa) to 3.5 nm
(10 aa) depending on the protein forming the fiber. This
length determines the width of the ribbon formed by
each of the continuous β sheets. Several of these ribbons
of continuous β sheet are packed against each other
about 1 nm apart, a spacing determined by the
interdigitations of the side chains (Figure 6-32), to create a fibril
about 3-4 nm in width. The β sheets can twist as they
usually would (Figure 6-9) to give a helical twist to the
ribbons that are packed together, and hence to the
fibril, or they can be untwisted.
The oligomeric proteins described so far, with a few
exceptions, have been homooligomers of identical
subunits. There are many proteins in which the subunits are
not identical to each other but are homologous, such as
the α and β subunits of hemoglobin or the VP1, VP2, and
VP3 subunits of the protein coat of poliovirus (Figure
9-28). In such cases, the homologous subunits are
arrayed around rotational axes of pseudosymmetry that
are the descendants of the rotational axes of symmetry of
their homooligomeric ancestral proteins. For example,
there are seven distinct β subunits, each with a unique
sequence and each present in two copies, in the 20S
multicatalytic endopeptidase complex from S. cerevisiae.
Although they have distinct sequences, their tertiary
structures are all homologous and the 14 β subunits in
the oligomer are arranged at the center of the protein
with dihedral pseudosymmetry of point group 722 (D₇).
In the homologous complex from Thermoplasma
acidophilum, representing the ancestor of the complex
from S. cerevisiae, the 14 β subunits are all identical to
each other, homologous to those from S. cerevisiae, and
also arranged with the same dihedral symmetry. Often
the homologous subunits in a heterooligomer are
interchangeable with each other, as are those in
fructose-bisphosphate aldolase (Figure 8-18) or those in the dimer of
creatine kinase. In all of these uncertain
heterooligomers, there is no significant difference between
the pseudosymmetric arrangement of their subunits 
and the symmetric arrangement of the identical subunits 
in their homooligomeric siblings or homooligomeric 
ancestors. 
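
That a dihedral point group n22 arranges 2n subunits, 14 in the case of point group 722, can be verified by generating its rotations explicitly. The sketch below is an illustration added here, not a procedure from the text. It assumes that the n-fold axis lies along z and uses an arbitrary point in a general position to stand for one subunit.

```python
import math


def dihedral_operations(n):
    """Return the 2n rotation matrices of the dihedral point group n22."""
    ops = []
    for k in range(n):
        t = 2.0 * math.pi * k / n
        c, s = math.cos(t), math.sin(t)
        # Rotation by t about the n-fold axis (z).
        ops.append(((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0)))
        # 2-fold rotation about an axis in the xy plane at angle t/2.
        ops.append(((c, s, 0.0), (s, -c, 0.0), (0.0, 0.0, -1.0)))
    return ops


def apply(op, point):
    """Multiply a 3 x 3 matrix by a point."""
    return tuple(sum(row[j] * point[j] for j in range(3)) for row in op)


general_point = (1.0, 0.2, 0.5)   # arbitrary position standing for one subunit
images = {tuple(round(x, 6) for x in apply(op, general_point))
          for op in dihedral_operations(7)}
print(len(images))
```

The seven images above the xy plane and the seven below it correspond to the two rings of seven subunits related by the 2-fold axes.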

There are also heterooligomeric proteins formed 
from two or three distinct, unrelated subunits each pres- 
ent in equal numbers of copies and held together by het- 
erologous interfaces. A heterologous association is the 
association between two folded polypeptides of unre- 
lated sequence and unrelated tertiary structure. A het- 
erologous interface is the interface between two 
unrelated subunits in heterologous association. The het- 
erologous association between two unrelated subunits 
produces the protomer of aspartate carbamoyltrans- 
ferase. 

Aspartate carbamoyltransferase (Figure 9-37A) is
a hexamer with dihedral symmetry of point group
322 (D₃) in which each protomer contains two different
subunits, a catalytic α subunit and a regulatory β subunit.
There are three types of interfaces holding the protein
together: six interfaces among the six α subunits of
the two trimers, three interfaces between each of the
three pairs of β subunits, and six heterologous interfaces,
one between each α subunit and each β subunit within
each of the six protomers. The trimers of α subunits have
only limited contacts with each other.


In a molecule of aspartate carbamoyltransferase,
each α subunit is associated with a β subunit at a
paradigmatic heterologous interface (Figure 9-37B). One
face of the interface is formed from two long loops of
random meander from the β subunit stitched in place by
a Zn²⁺ ion (gray sphere in Figure 9-37B). These two loops
are surrounded by a complementary face created from
six segments of the folded α polypeptide: the
amino-terminal end of an α helix (α87 to α89); three loops of
random meander and β turn (α108 to α114, α130 to α133,
and α163 to α174), each between the carboxy-terminal
end of a β strand and the amino-terminal end of an
α helix; and two loops of random meander (α190 to α197
and α234 to α236), each between two α helices.
Consequently, this interface serves as an example of the
fact that most intersubunit interfaces, both homologous
and heterologous, are formed from loops of random
meander and β turns on the surfaces of the respective
subunits, while interfaces within a subunit are usually
between α helices, β sheets, or an α helix and a β sheet
(Figures 6-23 through 6-35) in its interior. Because it
incorporates a lysine, a histidine, and four glutamates, 
this heterologous interface in aspartate carbamoyltrans- 
ferase also reemphasizes the fact that interfaces between 
subunits incorporate twice as many charged side chains 
as do interfaces between secondary structures within a 
subunit. 

The heterologous interface between an α subunit
and a β subunit of aspartate carbamoyltransferase is
formed from two continuous faces, one on the surface of
each subunit, that fit together as a casting in a mold. The
heterologous interface between elongation factor Tu and
elongation factor Ts from E. coli, however, incorporates
27 amino acids from the latter protein and 22 amino
acids from the former, but they are situated in four
separate clusters, one of which is formed by the
carboxy-terminal α helix of elongation factor Ts that reaches over to
sit in a groove on the surface of elongation factor Tu.

In ribulose-bisphosphate carboxylase from
Spinacia oleracea (Figure 9-38), heterologous interfaces
between its constituent monomeric β subunits and
α dimers hold together the (α₂)₄β₈ complex around the
rotational axes of its dihedral symmetry of point group
422 (D₄). Unlike the situations in aspartate
carbamoyltransferase and the complex of elongation factors Tu and
Ts, each β subunit connects two α subunits from two
different α dimers around the 4-fold rotational axis of
symmetry. Consequently, each β subunit has two distinct
faces on its surface, one that forms a heterologous
interface with one α subunit from one dimer and one that
forms a different heterologous interface with another
α subunit from another dimer. The majority of each of
these two distinct faces on each β subunit is formed by
two long loops of random meander sandwiched between
the two respective α subunits (Figure 9-38). The loop
that is most detached from the β subunit and closest to
the center of the heterooligomer has a conserved
sequence and is required for proper assembly of the
complete protein. One side of each of these two loops
associates with one of the α subunits, and the other side
of each of these two loops, which necessarily is
completely different, associates with the other of the
α subunits. The four respective copies of the two different
interfaces alternate with each other around the 4-fold
rotational axis of symmetry. The protein from
Rhodospirillum rubrum, which lacks the gene for the
β subunit, is an α₂ dimer.

Steric exclusion and mismatched symmetry are 
complications that can arise whenever an oligomer con- 
tains both heterologous associations between different, 
unrelated subunits and molecular rotational axes of sym- 
metry relating subunits homooligomerically. Steric 
exclusion is the blocking of a face for heterologous asso- 
ciation on one subunit of a homooligomer by the associ- 
ation of one of the heterologous subunits with the copy 
of that face on another subunit of the homooligomer. For 
example, the binding of a face on the extracellular 
domain of the immunoglobulin ε receptor to one of the
two identical, symmetrically displayed, complementary 
faces on the homodimeric Fc domains of immuno- 
globulin E sterically blocks the other face on that mole- 
cule of immunoglobulin E from associating with another 
molecule of the receptor.

Transthyretin is an (α₂)₂ tetramer with dihedral
symmetry. Its sole function is to bind retinol-binding 
protein. The four symmetrically arrayed faces for associ- 
ation with retinol-binding protein are in two adjacent 
pairs situated on opposite sides of transthyretin. The two 


identical faces in each adjacent pair, however, are too 
close to the 2-fold rotational axis of symmetry around 
which their two respective subunits are arrayed for each 
to bind to a retinol-binding protein simultaneously. 
When a retinol-binding protein binds to one face of a 
pair on one side of transthyretin, the other face of that 
pair cannot bind another retinol-binding protein 
because there is not enough room for it. Because of
this steric exclusion, only two of the four identical faces 
for associating with retinol-binding protein, one on each 
side of transthyretin, can be occupied at a time, and the 
complex is a closed β(α₂)₂β heterohexamer. During the
evolution of this heterologous association, the face for 
binding retinol-binding protein just happened to arise 
on the surface of a subunit of transthyretin at this site. 
The realization of the resulting heterologous interface
accomplished the function of the protein even though,
from the viewpoint of an intelligent designer, it has been
accomplished inelegantly.

Mismatched symmetry occurs when the rotational 
axis of symmetry relating two or more identical faces on 
one oligomer is of a different fold from or does not coin- 
cide with the rotational axis of symmetry relating the two 
or more identical complementary faces on a different 
oligomer during the formation of the heterologous asso- 
ciation between the two oligomers. 

The 2-oxoglutarate dehydrogenase complex pro- 
vides an example of mismatched symmetry. It is the 
multienzymatic complex responsible for the oxidative 
decarboxylation of 2-oxoglutarate. Three different pro- 
teins combine together in the complex to accomplish 
this oxidative decarboxylation: dihydrolipoyllysine- 
residue succinyltransferase, oxoglutarate dehydrogenase 
(succinyl-transferring), and dihydrolipoyl dehydroge- 
nase. The octahedral core of the complex is formed from 
the 24 identical subunits of dihydrolipoyllysine-
residue succinyltransferase arranged in octahedral
symmetry. Oxoglutarate dehydrogenase (succinyl-trans- 
ferring) is a symmetric homodimer, and dimers of this 
protein adorn the outer surface of the central octahedral 
core. They must each be attached to the core through 
heterologous interfaces formed between a face on the 
core and a face on the respective dimer. Because each of 
the dimers is built around a 2-fold rotational axis of sym- 
metry, it necessarily has two identical faces, each com- 
plementary to a face on the core. Because the core is 
octahedral, it necessarily has 24 identical faces, each 
complementary to a face on a dimer of oxoglutarate 
dehydrogenase (succinyl-transferring). The two identical 
faces on each dimer are arrayed about its 2-fold rota- 
tional axis of symmetry; the 24 faces on the core are each 
arrayed about its octahedral rotational axes of symmetry. 

Careful examination of electron micrographs of 
complexes containing one tetracosamer of dihy- 
drolipoyllysine-residue succinyltransferase but only two 
dimers of oxoglutarate dehydrogenase
(succinyl-transferring) showed that a dimer of the dehydrogenase is
bound to a site on the succinyltransferase midway
between one of its 2-fold rotational axes of symmetry and 
one of its 4-fold rotational axes of symmetry (Figure 
9-21B). Therefore only one interface, formed from one 
face on the dimer and one face on the core, can attach 
each molecule of oxoglutarate dehydrogenase (succinyl- 
transferring) to the core because if the two identical faces 
on the dimer of oxoglutarate dehydrogenase (succinyl- 
transferring) were both occupied with complementary 
faces on the dihydrolipoyllysine-residue succinyltrans- 
ferase, the one 2-fold rotational axis of symmetry of the 
dehydrogenase would have to coincide with one of the 
2-fold rotational axes of symmetry of the succinyltrans- 
ferase rather than being displaced from it. Only in this 
arrangement would the two identical faces on the oxo- 
glutarate dehydrogenase (succinyl-transferring) be able 
to associate with two complementary faces on the 
dihydrolipoyllysine-residue succinyltransferase. The 
symmetry-related face on an attached dimer of the
dehydrogenase finds itself empty because it is sterically
inaccessible
to a complementary face on another subunit of
the succinyltransferase. This arrangement is a result of 
the fact that, as might be expected, the complementary 
faces on the heterologous partners, the octahedral core 
and the dimer, happened to evolve with no concern for 
their positions relative to the axes of symmetry. 
Consequently, their symmetries were mismatched. 

In addition to this mismatch of symmetry, the 
complex between dihydrolipoyllysine-residue succinyl- 
transferase and oxoglutarate dehydrogenase (succinyl- 
transferring) also displays steric exclusion. For each of 
the 24 equivalent faces on the octahedral tetracosamer of 
dihydrolipoyllysine-residue succinyltransferase that is 
occupied by a dimer of oxoglutarate dehydrogenase 
(succinyl-transferring), the three other identical faces 
arrayed around the respective 4-fold rotational axis of 
symmetry (Figure 9-21B) remain empty because the
complementary face on the octahedral succinyltrans- 
ferase to which a face on the dimer of the dehydrogenase 
attaches is too close to a 4-fold rotational axis of symme- 
try and the dimer is too large to permit more than one 
dimer to bind to the four identical faces around this axis. 
In the saturated complex between oxoglutarate dehydro- 
genase (succinyl-transferring) and dihydrolipoyllysine- 
residue succinyltransferase, there are six heterologous 
interfaces joining 12 polypeptides of the former and 24 
polypeptides of the latter. These nonstoichiometric 
ratios of subunits are dictated by both mismatched sym- 
metry and steric exclusion. Because there are only six 
heterologous interfaces connecting these two proteins 
together, only six of the 12 subunits of the dehydrogen- 
ase are directly attached to the succinyltransferase, and 
18 of the faces on the succinyltransferase and six of the 
complementary faces on the dimers of the dehydroge- 
nase remain unoccupied. 
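
The stoichiometry of the saturated complex follows from two counting rules stated above: steric exclusion permits one dimer for each set of four faces around a 4-fold axis, and mismatched symmetry permits only one of the two faces on each dimer to be engaged. The sketch below restates that bookkeeping. The variable names are chosen here for illustration.

```python
# Counts taken from the text; names are illustrative only.
CORE_FACES = 24          # one face on each subunit of the octahedral core
FACES_PER_FOURFOLD = 4   # identical faces clustered around each 4-fold axis
FACES_PER_DIMER = 2      # identical faces on each dimer of the dehydrogenase
CHAINS_PER_DIMER = 2

clusters = CORE_FACES // FACES_PER_FOURFOLD

# Steric exclusion: only one dimer fits in each cluster of four faces.
bound_dimers = clusters

# Mismatched symmetry: only one face on each bound dimer reaches the core.
interfaces = bound_dimers * 1

dehydrogenase_chains = bound_dimers * CHAINS_PER_DIMER
empty_core_faces = CORE_FACES - interfaces
empty_dimer_faces = bound_dimers * FACES_PER_DIMER - interfaces

print(interfaces, dehydrogenase_chains, empty_core_faces, empty_dimer_faces)
```

The result reproduces the six interfaces, 12 polypeptides of the dehydrogenase, 18 unoccupied faces on the core, and six unoccupied faces on the dimers given in the text.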

There are other examples within the family of oxoacid
dehydrogenase complexes in which the stoichiometry
between two different oligomers and the number of
interfaces formed in the heterologous complex between
the two are dictated by steric exclusion. In the pyruvate
dehydrogenase complex from Bacillus stearothermophilus,
only one of the two symmetrically displayed
faces on dimeric dihydrolipoyl dehydrogenase is able to
associate with the icosahedral dihydrolipoyllysine-
residue acetyltransferase because these faces are located
too close to the 2-fold rotational axis of symmetry on the
dimer to accommodate two of the complementary faces
on dihydrolipoyllysine-residue acetyltransferase
simultaneously. In the pyruvate dehydrogenase complex
from S. cerevisiae, only one molecule of E3-binding
protein can bind to each pentagonal face of the icosahedral
dihydrolipoyllysine-residue acetyltransferase even
though each pentagonal face must have five identical
sites for associating with E3-binding protein. That this
low stoichiometry results from steric exclusion follows
from the fact that when the pentagonal face is made less
crowded by truncating the dihydrolipoyllysine-residue
acetyltransferase, more molecules of E3-binding protein
can associate with it.

The stoichiometry of the heterologous association 
of one homooligomer with another can also be affected 
by conformational changes resulting from the associa- 
tion itself. Only one low-affinity immunoglobulin 
γ Fc region receptor can associate with the symmetrically
dimeric Fc fragment of immunoglobulin G. Its associa- 
tion induces an asymmetric conformational change in 
the Fc fragment. This conformational change distorts the 
other face on the Fc fragment so that it cannot assume 
the conformation required to associate with a receptor 
even though it is not sterically blocked from doing so by 
the already bound molecule of the receptor.

Most heterooligomeric proteins display no dis- 
cernible symmetry, but often there are vestiges of sym- 
metry, such as the rough 2-fold rotational axis of 
pseudosymmetry in the complex between growth hormone
and its receptor and the local 2-fold rotational
axis of pseudosymmetry in the complex between human
HLA class I histocompatibility antigen A-2 and human
T-cell receptor B7 (Figure 9-39). This latter complex
contains four different proteins held together with several
asymmetric and pseudosymmetric heterologous
interfaces. There are the central heterologous interfaces
between the first two domains of the histocompatibility
antigen and the α and β subunits of the T-cell receptor, a
heterologous interface between β₂ microglobulin and the
α subunit of the histocompatibility antigen, and
heterologous interfaces between the α and β subunits of the
T-cell receptor. The α and β subunits of the T-cell receptor
are homologous, and the interface between the two 
respective domains presented in the figure contains a 
2-fold rotational axis of pseudosymmetry. The two amino- 
terminal domains of the histocompatibility antigen arose 
from an internal duplication, and they are related by a 
2-fold rotational axis of pseudosymmetry. Both of these 
2-fold rotational axes of pseudosymmetry coincide so the 
lower half of the complex in Figure 9-39 has two halves, 
right and left, that are related by an overall 2-fold rota- 
tional axis of pseudosymmetry. 


Many heterooligomers are complexes between two 
proteins held together by transitory heterologous asso- 
ciations rather than permanent heterologous associa- 
tions. These transitory complexes dissociate and 
associate during the lifetimes of their constituent pro- 
teins as required by their function. For example, when 
3′,5′-cyclic AMP is bound by the regulatory β subunits of
cyclic AMP-dependent protein kinase, the catalytic
α subunits dissociate from them. The complex
between the HLA class I histocompatibility antigen A-2 
and the B7 isoform of the T-cell receptor forms only after 
the histocompatibility antigen has bound a short peptide 
of a sequence recognized by the T-cell receptor (Figure 
9-39). 

Many asymmetrical heterooligomers, both those 
that are permanent and those that are transitory, contain 
folded polypeptides composed of multiple copies of 
modular domains (Tables 7-7 and 7-8). For example, 
both the nidogen and the laminin in the permanent 
equimolar complex between these two proteins contain
multiple copies of EGF modular domains among
others. Both the HLA class I histocompatibility antigen 
A-2 and the T-cell receptor are composed of internally 
repeating domains, most of which are immunoglobulin 
modular domains (Figure 9-39). 

One of the main functions of modular domains is to 
participate in heterologous associations. For example,
the carboxy-terminal EF hand of α-actinin forms a complex
with the seventh Z-repeat of titin in which the
former (see Figure 7-17) wraps around the single α helix
that comprises the latter. The WW modular domain of
dystrophin forms a complex with the proline-rich motif
on β-dystroglycan in which a segment of the latter 5 aa
long lies in extended conformation within a complementary
groove on the former. The fifth and sixth EF hands
of thrombomodulin form a complex with the globular
protein α-thrombin through an interface that is formed
from two complementary faces, one including surfaces
from both of the EF hands and the other a flat continuous
surface on the α-thrombin.

It is their lack of exact symmetry, their transitory
nature, and their modularity that distinguish asymmetric
heterooligomers held together entirely by heterologous
associations from homooligomers.

The interfaces producing heterologous associa- 
tions, either permanent ones between two unrelated 
subunits or transitory ones between two unrelated pro- 
teins, are as variable in their structure as the interfaces 
between two identical subunits. They can be almost 
planar interfaces between two flat faces, as is the inter- 
face between T-cell receptor and HLA class I histocom- 
patibility antigen (Figure 9-39); they can involve 
terminal segments from one subunit that embrace the 
body of the other subunit, as in the interface between the 
α subunit and β subunit in nitrile hydratase from
Rhodococcus; or they can be a coiled coil of α helices,
one from each of the participants, as in the complex
between syntaxin-1A, synaptobrevin-II, and SNAP-25B.
In the interface between protein G from
Streptococcus and murine immunoglobulin G, three
β strands from the immunoglobulin and four β strands
from the protein G (each six amino acids long) form a
continuous, paradigmatic antiparallel β sheet. In the
heterologous association between the interleukin-1
receptor and interleukin-1β, the three consecutive
immunoglobulin modular domains of the receptor wrap
around the interleukin, surrounding it on three sides.
The leucine-rich repeats of ribonuclease inhibitor wrap
around ribonuclease, also surrounding it on three
sides, as do 13 of the 18 HEAT repeats of karyopherin β2
that wrap around GTP-binding nuclear protein
RAN.

An examination has been made of the composition
of heterologous interfaces that form transitorily
between two proteins during the performance of their 
normal function. The fraction of the accessible surface 
area of such a transitory interface formed by nonpolar 
atoms (0.56) is somewhat lower than that in the inter- 
faces holding together permanent oligomers (0.65) and is 
indistinguishable from that on the solvent-accessible 
surface of a small globular protein (0.57). The fractions 
formed by polar but uncharged atoms (0.24) and polar 
and charged atoms (0.19) are consequently somewhat 
higher than those in the interfaces of permanent 
oligomers (0.22 and 0.13, respectively). The same eleva- 
tion in the fraction of arginine that is observed in the 
interfaces of permanent oligomers is also observed in the 
interfaces of transitory oligomers (0.10 of the amino 
acids in the interfaces) and the same decreases in aspar- 
tate and glutamate, but the transient interfaces also show 
a significant elevation in tyrosine (0.09), a hydrophilic 
hydrophobe, and significant decreases in valine (0.04) 
and leucine (0.05), both hydrophobic hydrophobes, rela- 
tive to the composition of permanent interfaces (0.05, 
0.07, and 0.11, respectively). An extreme example of the 
elevation in the fraction of charged side chains in tran- 
sitory interfaces is the interface between human HLA 
class I histocompatibility antigen Cw3 and killer cell 
immunoglobulin-like receptor 2DL2 in which there are 
five arginines, three aspartates, three glutamates, and 
two lysines.
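The comparison made in this paragraph can be tabulated in a few lines. What follows is only a minimal sketch in Python that uses the fractions quoted above; the names of the dictionaries and of the function are assumptions introduced for illustration and do not come from the study cited.

```python
# Fractions of the accessible surface area of an interface contributed by
# each class of atom, exactly as quoted in the text.
TRANSITORY = {"nonpolar": 0.56, "polar uncharged": 0.24, "polar charged": 0.19}
PERMANENT = {"nonpolar": 0.65, "polar uncharged": 0.22, "polar charged": 0.13}


def differences(transitory, permanent):
    """Return (transitory - permanent) for each class of atom, rounded to 0.01."""
    return {kind: round(transitory[kind] - permanent[kind], 2) for kind in transitory}


if __name__ == "__main__":
    # A negative value means the class is less common in transitory interfaces.
    for kind, delta in differences(TRANSITORY, PERMANENT).items():
        print(f"{kind:16s} {delta:+.2f}")
```

The signs of the differences restate the conclusion of the paragraph: nonpolar atoms are less common, and polar atoms more common, in transitory interfaces than in permanent ones.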

The elevation in the polarity of transitory interfaces 
is in keeping with the fact that these interfaces are not 
permanent features; and consequently, the complemen- 
tary faces on the two proteins must be exposed to the 
aqueous phase through much of their lives. Within the 
transitory, heterologous interface between α-thrombin
and thrombomodulin, however, most of the side chains
are hydrocarbon. In this instance, the problem of the solubility
of the separated proteins is solved by surrounding
the hydrophobic patch forming the face on α-thrombin
with a high density of positively charged side chains and
the hydrophobic patch on thrombomodulin with a high
density of negatively charged side chains. Only five of
these charged side chains actually participate in the
interface itself.

A common strategy for heterologous association is 
the specific binding of a disordered, structureless seg- 
ment of polypeptide within one protein by a structured 
binding site on another. Upon the binding of the struc- 
tureless partner to the structured partner, the structure- 
less segment of polypeptide assumes a structure 
complementary to the structured site. In this type of
heterologous association, it is only the amino acid 
sequence of the disordered segment and not its confor- 
mation that is recognized by the structured binding site. 
Consequently, the structureless partner can be mim- 
icked by a synthetic peptide of the proper sequence. For 
example, the association between troponin I and tro- 
ponin C can be blocked by a synthetic peptide with a 
sequence only 12 aa in length from the center of troponin I.
In such a situation, each of the members of a set
of overlapping synthetic peptides comprising the com- 
plete sequence of the protein providing the structureless 
partner for the interface can be assayed for its ability to 
inhibit the heterologous association of the intact protein 
in order to identify the segment that participates in the 
association.
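The design of such a set of overlapping peptides can be sketched as follows. This is a hedged illustration in Python, not the procedure used in the studies cited: the function name, the offset of 4 aa between successive peptides, and the sequence in the example are all assumptions; only the peptide length of 12 aa echoes the example of troponin I above.

```python
def overlapping_peptides(sequence, length=12, offset=4):
    """Return (start, peptide) pairs that together cover the whole sequence.

    start is the 1-based position of the first amino acid of each peptide.
    The offset between successive peptides is a hypothetical choice; it must
    not exceed the peptide length or gaps would appear in the coverage.
    """
    if length <= 0 or offset <= 0 or offset > length:
        raise ValueError("need 0 < offset <= length")
    if len(sequence) <= length:
        return [(1, sequence)]
    last_start = len(sequence) - length
    starts = list(range(0, last_start + 1, offset))
    # Add a final peptide so that the carboxy terminus is always covered.
    if starts[-1] != last_start:
        starts.append(last_start)
    return [(s + 1, sequence[s:s + length]) for s in starts]


if __name__ == "__main__":
    # Arbitrary sequence in one-letter code, used only to show the output format.
    for start, peptide in overlapping_peptides("ACDEFGHIKLMNPQRSTVWYACDEF"):
        print(start, peptide)
```

Each peptide in the list would then be assayed separately for its ability to inhibit the association of the intact proteins.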

Importin-β associates with importin-α by binding
the structureless, highly charged (50% arginine, lysine,
aspartate, and glutamate) amino-terminal segment
(40 aa) of importin-α. The amino-terminal 876 aa of
importin-β form 19 internally repeating, modular HEAT
domains (46 aa), each consisting of a hairpin of two α helices.
In the crystallographic molecular model, the 38
α helices of these 19 internally repeating, modular
domains wrap in a spiral around a synthetic peptide of
44 aa with the amino acid sequence of the amino-terminal
segment of importin-α. The formation of the complex
induces the carboxy-terminal 28 amino acids of the otherwise
structureless synthetic peptide to form an α helix
aligned with the 38 α helices from importin-β, in a
structure resembling a thick section from the trunk of a
palm. In the natural heterologous association between
importin-α and importin-β, the amino-terminal segment
of importin-α, represented by the synthetic peptide in
the crystallographic molecular model, is inserted into the
middle of the spiral formed by importin-β to form a
strong complex between the two proteins.

There are a large number of examples of such het- 
erologous associations mediated by structureless 
sequences of amino acids. The structureless segment can 
be located anywhere in the amino acid sequence of a 
protein. Annexin II associates with protein p11 through
its own amino-terminal 12 amino acids, which are bound by
protein p11. Tumor necrosis factor receptor-associated
factor 2 is a globular
trimer, each subunit of which has a binding site for a
sequence of six amino acids in the structureless carboxy-terminal
portion of the CD40 tumor necrosis factor
receptor. PDZ domains can bind to a structureless
sequence located either at the carboxy terminus of
another protein or in a portion of polypeptide in the interior
of its sequence of amino acids that loops out from its
surface. Nuclear localization signals are short structureless
sequences (5-20 aa) containing several lysines
and arginines on the surfaces of proteins destined for
the nucleus of a cell. Such structureless segments are recognized
by being bound to a structured domain of the
nuclear import factor karyopherin α, composed of 10
internally repeated armadillo modular domains. The
Eps15 homology domains are modular domains that
bind short segments of polypeptide containing the
sequence -Asn-Pro-Phe- and in this way produce heterologous
associations between a protein containing an
Eps15 homology domain and a protein containing a
structureless segment of this sequence within its folded
polypeptide.
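Because it is the amino acid sequence of a structureless segment that is recognized, candidates for such segments can be located by a simple search of the sequence. The following minimal sketch in Python finds every occurrence of -Asn-Pro-Phe- (NPF in one-letter code); the function name and the test sequence are assumptions made for illustration, and the presence of the motif does not by itself show that a segment is structureless or that it is bound by an Eps15 homology domain.

```python
import re


def find_motif(sequence, motif="NPF"):
    """Return the 1-based positions at which the motif begins in the sequence.

    A lookahead is used so that overlapping occurrences are also reported.
    """
    pattern = re.compile("(?=" + re.escape(motif) + ")")
    return [match.start() + 1 for match in pattern.finditer(sequence)]


if __name__ == "__main__":
    # Hypothetical sequence in one-letter code, not taken from any real protein.
    print(find_motif("MNPFAANPFK"))
```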

Historically, homooligomers with globular sub- 
units, all arranged around rotational axes of symmetry, 
accounted for most of the proteins initially purified and 
studied. There are several reasons for this fact. Most of 
the proteins present at high concentrations in the cyto- 
plasm and most of the proteins that have enzymatic 
activity, and hence for which there are obvious assays, 
are globular homooligomers. Globular homooligomers 
are also compact, sturdy, and resistant to degradation 
by endopeptidases. Consequently, globular, homo- 
oligomeric enzymes were the easiest proteins to purify. 

Proteins the sole function of which is to participate 
in heterologous associations are more difficult to purify. 
Such proteins are often assembled by those associations 
into large, heterogeneous polymeric matrices, and until 
recently, identifying the partners involved in a particular 
heterologous association has been difficult. Proteins the 
enzymatic activities of which are regulated by transient 
heterologous associations with other proteins or the 
substrates of which are other proteins are difficult to 
assay because several components must be mixed 
together in the proper ratio. Proteins that control the 
enzymatic activities of the classical enzymes are present 
in much lower concentrations than those enzymes. 
Nevertheless, it is proteins engaging in heterologous 
associations that form large macromolecular structures 
within the cell and between cells, that control the metab- 
olism carried out by the classical enzymes, and that reg- 
ulate the expression of genes. Recently, proteins involved 
in such functions have been purified, identified, and 
expressed in amounts high enough to be studied func- 
tionally and structurally. 

These new proteins (Table 9-4) are new only 
because they are present normally in low concentrations, 
are difficult to assay, or for some other reason are diffi- 
cult to purify. In addition to their novelty, however, they 
all seem to share the property of participating in heterol- 
ogous associations, either among their unrelated sub- 
units or with other proteins. Most of these new proteins 
are peculiar to eukaryotic cells and the tissues of multicellular
organisms. Now that complete genomes are
available for a number of eukaryotes, it has become clear
that most of the proteins encoded by the genes in those 
genomes are new proteins. The examples listed in Table 
9-4 illustrate various properties of these new proteins. 

One of the most unfortunate features of these new 
proteins is the chaos of their nomenclature. Often the 
same protein from two different species of organisms will 
have completely unrelated names. Often the proteins are 
designated by a number that either derives from the ini- 
tial genetic screen or from the initial, invariably inaccu- 
rate estimate of the length of their constituent 
polypeptide by electrophoresis on gels cast in solutions 
of dodecyl sulfate. Often the heterologous subunits of 
one of these proteins will each have its own peculiar 
name and the complex between them has another, unre- 
lated name. One has the suspicion that such confusion is 
intended to discourage individuals outside the narrow 
field of investigators interested in one or the other of 
these proteins from learning about them or even realiz- 
ing that they are normal, unremarkable proteins. 

Most of the new proteins contain internally repeating
domains or widely distributed modular domains
or both of these types of domains. The sole function of
many of the modular domains, such as the SH2 domain
of proto-oncogene tyrosine-protein kinase ABL1, is to
form heterologous associations with other proteins.

Many of these proteins, such as laminin α1β1γ1,
integrin α2β1, and guanine nucleotide binding protein
G(s), are heterooligomers. The β1 and γ1 subunits of
laminin are homologous, but the α1 subunit is unrelated
to the others. The subunits of the integrin are unrelated
to each other, as are those of the guanine nucleotide
binding protein. The heterologous associations between
the subunits of integrin α2β1 are permanent, as are those
between the subunits of laminin, but the heterologous
associations between the α, β, and γ subunits of guanine
nucleotide binding protein G(s) are transitory, and the
subunits dissociate and reassociate during its normal operation.

Almost all of these new proteins are participants in
extensive, intricate networks of heterologous associations
among many proteins. Each of these networks
is responsible for a global function such as the production
of the extracellular matrix or the control of the growth
and multiplication of the cell. Proteins such as SHC
transforming protein 1 form heterologous associations
with many different partners and act as hubs in these
networks. Proteins such as gelsolin associate with only
one or two proteins at the dead ends in a network, while
proteins such as α-actinin 1 and guanine nucleotide
binding protein G(s) link one protein to the next protein
within the spokes radiating from the hubs. Some of these
proteins can connect one network to another network.
Integrin α2β1 connects the network of heterologous
associations forming the extracellular matrix to the networks
for the cytoskeleton and for cellular regulation
through protein kinases. Nucleoporin Nup214 connects
the network of heterologous associations forming the
nuclear pore to networks for nuclear import and for cellular
regulation.

Many of these proteins are responsible for shifts in 
the steady state of the cell such as changes in metabolism 
or the initiation of growth. Consequently, they must 
detect changes and respond to change by altering the 
heterologous associations in which they participate. 
Proteins such as SHC transforming protein 1 and proto- 
oncogene tyrosine-protein kinase ABL1 form heterolo- 
gous associations with some proteins only when 
particular tyrosines on the surfaces of those proteins 
have been phosphorylated. In this way, they recognize 
changes produced by intracellular signalling. The het- 
erologous associations in which cyclin participates are 
permanent, but their status is transitory, changing sys- 
tematically as the cyclin is rapidly degraded and then 
more is synthesized. Protein kinases form specific, tran- 
sitory heterologous associations with their substrate pro- 
teins in order to phosphorylate specific serines, 
threonines, or tyrosines on their surfaces. 

Some of the heterologous associations, such as 
those between α-actinin and actin or between laminin
and nidogen, are exclusive. Others, such as those 
between ankyrin and various proteins embedded in the 
plasma membrane serving as sites of attachment for the 
cytoskeleton or those between the SH2 modular domains 
on proto-oncogene tyrosine-protein kinase ABL1 or
SHC transforming protein 1 and an array of proteins con- 
taining phosphorylated tyrosines signalling changes in 
the regulatory status of the cell, are promiscuous. 
Transcription initiation factor TFIID recognizes TATA 
boxes that precede many different genes and then initi- 
ates the assembly of the large complex of different pro- 
teins responsible for initiating transcription. 

Proteins involved in these networks of interactions 
responsible for particular functions are often identified 
and assigned a role on the basis of the heterologous asso- 
ciations themselves. There are several ways to detect a 
heterologous association between two proteins. A com- 
plex between two native proteins can be detected by 
their coelectrophoresis. A complex between two proteins
can be immunoprecipitated with immunoglobulins
specific for one of the two, and the fact that the other 
coprecipitates demonstrates the existence of the com- 
plex. Glutathione transferase can be fused to a protein 
during its expression, and any proteins that participate in 
heterologous associations with that protein can be iden- 
tified after isolation of the complex by affinity adsorption 
with a solid phase to which glutathione has been 
attached.

It is also possible to screen a library by phage display
for a cDNA or a gene encoding a protein that participates
in a heterologous interaction with a protein of
interest. Fragments of cDNA or genomic DNA are
inserted at a particular position in the gene encoding the
coat protein pIII of the f1 or M13 bacteriophage. A population
of E. coli is then infected with these bacteriophage.


Table 9-4: Examples of New Proteins

E-cadherin (728 aa)
    modular domains and internal repeats: cadherin modular (5), serine-rich modular (1)
    function: cellular adhesion
    heterologous associations: integrin αEβ7, β-catenin, other molecules of E-cadherin

integrin α2β1 (1152 aa, 778 aa)
    stoichiometry of subunits: αβ
    modular domains and internal repeats: von Willebrand factor type A modular (2), cysteine-rich repeat (4), FG-GAP repeat (7)
    function: attaches cell to extracellular matrix
    heterologous associations: collagen, laminin, chondroadherin, interstitial collagenase, filamin, α-actinin, skelemin, integrin cytoplasmic associated protein, receptor 1 for activated protein kinase C, paxillin, focal adhesion kinase, integrin-linked kinase, calnexin

laminin α1β1γ1 (3058 aa, 1765 aa, 1576 aa)
    stoichiometry of subunits: αβγ
    modular domains and internal repeats: laminin G modular (5), EGF modular (41), laminin modular (2), laminin amino-terminal modular (3)
    function: extracellular matrix
    heterologous associations: nidogen, integrin α2β1, thrombospondin

vitronectin (459 aa)
    modular domains and internal repeats: hemopexin modular (2)
    function: protein in serum and extracellular matrix
    heterologous associations: integrin αVβ1, proteoglycan, plasminogen activator inhibitor type 1

ankyrin 1 (1880 aa)
    modular domains and internal repeats: ankyrin repeat (23), death modular (1)
    function: cytoskeleton
    heterologous associations: anion exchanger, spectrin, Na+,K+-exchanging ATPase, Na+ channel, neuroglian, CD44 antigen

α-actinin 1 (892 aa)
    modular domains and internal repeats: calponin modular (2), spectrin modular (4), calcium-binding EF-hand modular (2)
    function: cytoskeleton
    heterologous associations: actin, vinculin, titin

gelsolin (755 aa)
    modular domains and internal repeats: gelsolin repeat (6)
    function: sculpts actin
    heterologous associations: actin, caspase-3

synaptotagmin I (422 aa)
    modular domains and internal repeats: C2 modular (2)
    function: controls traffic of synaptic vesicles
    heterologous associations: neurexin, syntaxin, clathrin assembly protein

nucleolin (706 aa)
    modular domains and internal repeats: Asp/Glu-rich repeat (3), RNA-binding modular (4), nucleolin repeat (8)
    function: nucleolar component
    heterologous associations: chromatin, preribosomal particles, insulin receptor substrate I, nucleophosmin

nucleoporin Nup214 (2090 aa)
    modular domains and internal repeats: Pro/Ser/Thr-rich region (1), coiled coil (2), nucleoporin FG repeats (40)
    function: component of nuclear pore
    heterologous associations: other nucleoporins, mitogen-activated protein kinase, nuclear RNA export factor 1, CRM1 protein, CREB-binding protein

HLA type I histocompatibility antigen A-2 (341 aa, 99 aa)
    stoichiometry of subunits: αβ
    modular domains and internal repeats: immunoglobulin modular (4)
    function: mediates immune response
    heterologous associations: T-cell receptor

cyclin A2 (432 aa)
    function: control of cell cycle
    heterologous associations: cyclin-dependent kinase 2, cell division control protein 2 homologue, protein CDC20, β3 endonexin

SHC transforming protein 1 (583 aa)
    modular domains and internal repeats: phosphotyrosine interaction domain (1), SH2 modular (1)
    function: intracellular signalling
    heterologous associations: activated receptors for growth factors, proteins phosphorylated at tyrosine, Grb2 protein, mPAL protein, phosphotyrosine phosphatase-PEST, Gads protein

proto-oncogene protein-tyrosine kinase ABL1 (1130 aa)
    modular domains and internal repeats: eukaryotic protein kinase modular (1), SH3 modular (1), SH2 modular (1), Pro-rich domain (1)
    function: intracellular signalling, protein tyrosine kinase
    heterologous associations: protein substrates, proteins phosphorylated at tyrosine, EphB2 protein tyrosine kinase, Wiskott-Aldrich syndrome protein family member, actin, Abl interactor protein 2b, proteins with proline-rich segments

myosin light chain kinase (1914 aa)
    modular domains and internal repeats: protein kinase modular (1), fibronectin type III modular (1), immunoglobulin C2 modular (1), myosin light chain kinase repeats, type I (5) and type II (6)
    function: intracellular signalling, protein serine/threonine kinase
    heterologous associations: calmodulin, myosin regulatory light chain 2

CD45 protein-tyrosine phosphatase (1281 aa)
    modular domains and internal repeats: fibronectin type III modular (2), tyrosine-protein-phosphatase modular (2)
    function: intracellular signalling
    heterologous associations: proteins phosphorylated at tyrosine, semaphorin 4D, protein CD2, protein p56lck

guanine nucleotide-binding protein G(s) (394 aa, 340 aa, 75 aa)
    stoichiometry of subunits: αβγ
    modular domains and internal repeats: G-protein β WD-40 repeat (7)
    function: intracellular signalling
    heterologous associations: adenylate cyclase, β-adrenergic receptor

transcription initiation factor TFIID (339 aa)
    modular domains and internal repeats: polyglutamine domain (1), transcription factor TFIID repeat (2)
    heterologous associations: TATA box on DNA, transcription initiation factor TFIIA, transcription initiation factor TFIIB, negative cofactor 1, negative cofactor 2, general transcription factor II-I

As listed in the SwissProt data base (www.expasy.ch). Numbers in parentheses are the numbers of each type of domain in the protein.

Each of the resulting bacteriophage that carries an insert
has pIII coat proteins on its surface, all of which display
the segment of amino acids encoded by that insert. 
Because the point of insertion was chosen to be within an 
exposed loop in the coat protein, the segment of amino 
acids displayed is accessible to the solution and capable 
of interacting with the protein of interest. The protein of 
interest is covalently attached to a solid phase to produce 
an affinity adsorbent, and bacteriophage that carry seg- 
ments of amino acids with which it associates can be 
purified by successive rounds of affinity adsorption.
The bacteriophage purified by the affinity adsorbent can 
be plated, plaques can then be individually replicated, 
and the particular inserts that they carry can be 
sequenced. The method, however, that has been used 
most widely to catalogue heterologous associations 
between large sets of proteins has been the yeast two-hybrid
assay (Figure 9-40).

At various points on the chromosomal DNA of 
S. cerevisiae, in the vicinities of the genes encoding 
enzymes involved in the metabolism of galactose, there 
are sequences of 17 base pairs that are recognized by the 
regulatory protein GAL4, which activates transcription of 
these genes when the cell is grown in the presence of 
galactose or an oligosaccharide containing galactose.
These upstream activating sequences for galactose 
metabolism can be anywhere from 100 to 400 base pairs 
away from the sites at which transcription of the gene is 
initiated, and yet binding of regulatory protein GAL4 still
activates transcription. 

Protein GAL4 contains a domain that binds to the 
upstream activating sequence in the DNA and a domain 
that acts at the site of initiation. These two domains are
connected flexibly so that either the short or the long dis- 
tances between the binding site on the DNA and the site 
of initiation can be spanned. In the yeast two-hybrid assay,
the domain of regulatory
protein GAL4 that binds to the upstream activating
sequence in the DNA is fused to protein X or a portion of
protein X. If the domain of regulatory protein GAL4 that
activates transcription is fused to protein Y or a portion
of protein Y that normally forms a heterologous association
with protein X, the heterologous association
between protein X and protein Y or the portions of these 
two proteins will usually be sufficient to position the acti- 
vating domain effectively when the DNA-binding 
domain associates with an activating sequence on the 
DNA. The activation is effective because the require- 
ments of the complex are so flexible that the only prop- 
erty required is a physical connection between the two 
domains of regulatory protein GAL4. Even a complex in 
which two other proteins form a noncovalent bridge that 
then connects protein X and protein Y is sufficient for 
activation.

Figure 9-40: Yeast two-hybrid system for detecting heterologous
associations. (A) Regulatory protein GAL4 from S. cerevisiae
binds to a palindromic sequence of 17 base pairs (the consensus
sequence for which is shown) through its DNA-binding
domain and then activates the transcription of a gene, the initiation
site for which is from 100 to 400 base pairs away, through the
action of its activating domain, which is thought to be flexibly tethered
to the DNA-binding domain. When a portion of the GAL1
gene, which is normally activated by regulatory protein GAL4, is
fused to the lacZ gene from E. coli, which encodes β-galactosidase,
the activation of the GAL1 gene by regulatory protein GAL4 causes
high levels of β-galactosidase to be produced by the yeast cell (as
indicated by the arrow). (B) Protein X, the partners of which in normally
occurring heterologous associations are being sought, is
fused to the DNA-binding domain of regulatory protein GAL4.
When this fusion protein is expressed alone in a cell that is lacking
regulatory protein GAL4, the GAL1 gene is not activated and
β-galactosidase does not accumulate. (C) Protein Y, which participates
in a normal heterologous association with protein X, is fused
to the activating domain of regulatory protein GAL4. When it is
coexpressed with the fusion protein between the DNA-binding
domain and protein X, protein X and protein Y associate with each
other, the activating domain of regulatory protein GAL4 is reassociated
with its DNA-binding domain, the GAL1 gene is activated,
and the cell fills with β-galactosidase.

When the gene for β-galactosidase from E. coli is inserted into
the DNA of S. cerevisiae under the control of a site for the
initiation of transcription normally activated by regulatory
protein GAL4, the normal version of regulatory
protein GAL4 is replaced by the two pieces linked to
protein X and protein Y, respectively, and the cell is
grown on galactose, β-galactosidase, a protein for which
there is an assay producing a bright blue color, will accumulate.
Any colonies of cells containing two hybrid GAL4
domains, the respective proteins X and Y of which associate
with each other, become blue when the assay is performed,
but any colonies containing hybrid GAL4
domains the proteins X and Y of which do not associate
do not turn blue.
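The logic of the assay can be summarized as a truth table. The sketch below, in Python, is a deliberately simplified model and not a description of any particular experimental protocol: the function names and arguments are hypothetical, and the model ignores complications such as false positives, indirect bridging by other proteins, and fusions that fail to fold.

```python
def reporter_activated(has_binding_fusion, has_activating_fusion, x_binds_y):
    """Return True if the GAL1-lacZ reporter is expected to be transcribed.

    Transcription requires the fusion of the DNA-binding domain to protein X,
    the fusion of the activating domain to protein Y, and an association
    between protein X and protein Y that brings the two domains together.
    """
    return has_binding_fusion and has_activating_fusion and x_binds_y


def colony_color(has_binding_fusion, has_activating_fusion, x_binds_y):
    """Return the expected result of the assay for β-galactosidase."""
    if reporter_activated(has_binding_fusion, has_activating_fusion, x_binds_y):
        return "blue"
    return "not blue"


if __name__ == "__main__":
    for binding in (True, False):
        for activating in (True, False):
            for associate in (True, False):
                print(binding, activating, associate,
                      colony_color(binding, activating, associate))
```

Only one of the eight combinations produces a blue colony, which is why a blue colony is taken as evidence for an association between the two proteins.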

The yeast two-hybrid assay is performed by choosing
one protein or a portion of one protein and fusing its
complementary DNA in phase with the DNA encoding
the domain of regulatory protein GAL4 that binds to the
DNA. This acts as the bait. It is then possible to fuse DNA
encoding the domain responsible for activation to fragments
of genomic DNA at random. If a protein that associates
heterologously with the bait happens to be
encoded by the DNA in one of these fragments, it will be
caught, the colony containing that fragment will turn
blue, and the DNA in the fragment can be sequenced to
identify the protein it encodes.

This assay has been automated and applied to discover
large numbers of heterologous associations. For
example, in one of these large screenings, 957 heterologous
associations involving 1004 different proteins were
identified. Usually, however, the fishing is for the partners
of a particular protein of interest to the investigator.
For example, when cDNA for human cyclin A (Table 9-4)
was used as the bait and fragments of the yeast genome
as the fish, three positive colonies were identified. One
of the proteins identified in this way was cyclin-dependent
kinase inhibitor 1, a protein already known to form
complexes containing cyclin A, but the other two, protein
CDC20 and β3 endonexin, represented novel associations.
Once candidates for heterologous association have
been identified with the yeast two-hybrid system, the
validity of the associations must be established in more
extensive studies of the two isolated proteins, as was done
for the associations between cyclin A and β3 endonexin
and between cyclin A and protein CDC20.

To establish the strength of the heterologous asso- 
ciation between two proteins, its dissociation constant 
can be measured. As with any measurement of a dissoci- 
ation constant, the molar concentration of the complex 
is followed as a function of the molar concentrations of 
the two unassociated proteins (Problem 5-7). The complex
between the proto-oncogene protein c-fos and transcription
factor AP-1 could be identified by the
quenching of a fluorescent functional group on proto-oncogene
protein c-fos by a fluorescent functional group
on transcription factor AP-1, and a dissociation constant
of 20 nM could be calculated from the changes in fluorescence
as a function of the concentrations of the two
proteins.
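For a simple association X + Y = XY, the dissociation constant is Kd = [X][Y]/[XY], and the concentration of complex expected at given total concentrations of the two proteins follows from the conservation of mass. The sketch below, in Python, solves the resulting quadratic equation; it assumes a 1:1 complex and no other equilibria, and the function name and the total concentrations in the example are assumptions chosen for illustration. Only the value of 20 nM comes from the text.

```python
import math


def complex_concentration(x_total, y_total, kd):
    """Return [XY] at equilibrium for X + Y = XY with Kd = [X][Y]/[XY].

    All three arguments must be in the same units of molar concentration.
    Substituting [X] = x_total - [XY] and [Y] = y_total - [XY] into the
    definition of Kd gives a quadratic equation in [XY]; the smaller root
    is the physically meaningful one.
    """
    b = x_total + y_total + kd
    return (b - math.sqrt(b * b - 4.0 * x_total * y_total)) / 2.0


if __name__ == "__main__":
    kd = 20e-9  # 20 nM, the dissociation constant quoted in the text
    x_total = 20e-9  # hypothetical fixed total concentration of one protein
    for y_total in (5e-9, 20e-9, 80e-9, 320e-9):  # hypothetical titration
        bound = complex_concentration(x_total, y_total, kd) / x_total
        print(f"{y_total * 1e9:6.0f} nM  fraction of X bound = {bound:.3f}")
```

In an actual measurement the calculation runs in the other direction: the observed signal, here the quenching of fluorescence, is fitted to this relationship to obtain the dissociation constant.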

The heterologous interfaces between two proteins
are often probed by cross-linking or by site-directed
mutation. To make sense of either of these
types of experiments, a crystallographic molecular
model of at least one of the participants must be available.
In the case of site-directed mutation, changes are
made at particular sites on the surface of the model, and
the effects of these mutations on the strength of the association
between the two proteins are assessed. It has
also been possible to identify neighbors across the interface
by discovering mutations in one of the proteins that
restrengthen an association that had been weakened by
mutations in the other protein.

Such experiments often identify a cluster of amino
acids on the surface of the crystallographic molecular 
model of the protein, and this cluster is then assumed to 
represent the face participating in the interface holding the 
two proteins together. The heterologous interface between 
human somatotropin and human somatotropin receptor 
was probed by site-directed mutation of the somatotropin.
Sixty-six side chains, located consecutively
in three segments of the overall amino acid sequence of 
somatotropin, were mutated one by one to alanine, and 
the effect of each of these mutations on the association 
between somatotropin and its receptor was quantified by 
measuring the dissociation constant of the complex. 
Fourteen of these mutations produced what were judged 
to be significant increases in the dissociation constant, and 
those 14 side chains were found to form a cluster on the 
surface of the crystallographic molecular model of soma- 
totropin, which was available at that time. In the crystallo- 
graphic molecular model of the complex between human 
somatotropin and its receptor that became available sub- 
sequently, seven of those 14 side chains were found to be 
located within one of the interfaces.

The difficulty in evaluating such sets of site- 
directed mutations is that often the change in the 
strength of the interaction produced by each individual 
mutation is not large, so a distinction between a
mutation within the interface and one outside the interface
is difficult to make.
involved in an interface is mutated, changes as large as 
500-fold in the dissociation constant have been 
observed, so it is difficult to evaluate changes of less
than 10-fold. For example, those 14 mutations judged to 
have a significant effect on the association of soma- 
totropin with its receptor increased the dissociation con- 
stant by factors of only 4-20, while 14 of the mutations 
judged to be insignificant nevertheless increased the dis- 
sociation constant by factors of 2-3. The distinction 
appears to be arbitrary. Another problem is that during 
the formation of an interface of any kind, a conforma- 
tional change may be required to occur in a portion of 
one or the other of the proteins in order for the proper fit 
between the faces to be achieved. Any mutation that hin- 
ders this conformational change will disrupt the associa- 
tion even if it is not in a side chain that ends up within the 
interface. In the case of somatotropin, four of the muta- 
tions to side chains that are not incorporated into the 
interface, but which nevertheless did increase the disso- 
ciation constant between somatotropin and its receptor 
by factors of 4-15, are to side chains that are located in a 
segment of random meander in free somatotropin that 
becomes ordered upon its association with the receptor. 
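
The scale of these effects is easier to judge when each change in the dissociation constant is expressed as a change in the standard free energy of association, ΔΔG° = RT ln(Kd,mutant/Kd,wild type). The following minimal sketch in Python does this for the factors quoted above. The temperature of 25 °C and the function name are assumptions made for illustration; the temperature of the measurements is not stated here.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1
T = 298.15       # assumed temperature (25 °C); not given in the text


def ddg_kj_per_mol(fold_change, temperature=T):
    """Free energy change (kJ/mol) equivalent to a fold change in Kd."""
    return R * temperature * math.log(fold_change) / 1000.0


# Factors quoted in the text for the somatotropin mutations and for a
# critically involved amino acid (500-fold).
for fold in (2, 3, 4, 20, 500):
    print(f"{fold:>4}-fold increase in Kd -> {ddg_kj_per_mol(fold):5.1f} kJ/mol")
```

Under this assumption, the factors of 2-3 and 4-20 correspond to roughly 1.7-2.7 and 3.4-7.4 kJ mol⁻¹, respectively, compared with about 15 kJ mol⁻¹ for a 500-fold change, which illustrates how close together the "significant" and "insignificant" classes lie.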


Suggested Reading 
Chapter 10 


Chemical Probes of Structure 


Although there are systematic programs to crystallize as 
many of the available proteins as possible, crystallo- 
graphic molecular models have been obtained so far for 
only a minority of the proteins that have been purified. 
This is not only a problem of time. There are a number of 
technical problems associated with the crystallographic 
method, and these have proved baffling in many 
instances. Often a purified protein has not been crystal- 
lized. Often the crystals of protein deteriorate too rapidly 
in the beam of X-rays. Often the space group is too com- 
plex to permit a ready solution. Often the crystals of a 
particular protein are microscopically disordered. Often 
usable isomorphous replacements have not been 
obtained. 

There are indications that many of these problems 
will be solved eventually. It seems that the extensive 
purifications required for proteins that are naturally 
present in low concentrations produce unnoticed 
alterations leading to undetected heterogeneities in the 
final preparation that interfere with crystallization. 
When such a protein is produced at high concentra- 
tions in an appropriate expression system so that only a 
few steps of purification are required, attempts to crys- 
tallize that protein become significantly more success- 
ful. Crystals of membrane-spanning proteins, a class of 
proteins that are quite difficult to crystallize, have been 
obtained more and more frequently and used success- 
fully for high-resolution crystallography. The develop- 
ment of charge-coupled detectors, which permit a 
complete data set to be gathered in a short period of 
time, has decreased the required length of exposure to 
the beam. Changing the conditions of crystallization’ or 
the species from which the protein has been purified’ 
can sometimes give crystals that have a different space 
group or are more ordered. As more reagents con- 
taining heavy metals become available, the odds 
against finding a suitable set of isomorphous replace- 
ments decrease. Yet it still seems possible that the 
majority of the proteins that have been or will be puri- 
fied may never yield high-resolution maps of electron 
density. 

In the absence of a map of electron density, the 
molecular structure of a protein is studied by a diverse 
collection of techniques. These approaches can be con- 
veniently divided into three classes: the use of chemical 
probes, the use of immunochemical probes, and the use 
of physical measurements. 


Covalent Modification 


Several types of amino acids in a molecule of protein are 
susceptible to covalent modification. As most of the reac- 
tive functional groups in a protein are nucleophiles, the 
reagents used to modify proteins are usually elec- 
trophiles. Serine and threonine contain nucleophilic 
oxygens, but they are indistinguishable from the oxygen of 
the solvent water and cannot be easily modified. The amino acids 
possessing nucleophilic sites that can be conveniently 
modified are cysteines, methionines, lysines, histidines, 
tyrosines, glutamates, aspartates, arginines, and trypto- 
phans. The electrophilic reagents that are used to modify 
these amino acids couple the electrophile covalently to 
them. In the process, the ability of the amino acid to act 
as an acid, a base, or a donor or acceptor of a hydrogen 
bond is usually lost because an atom of the electrophile, 
usually a carbon, forms a covalent bond to the conjugate 
base of the central heteroatom. The proton that previ- 
ously occupied the conjugate acid of the conjugate base, 
which is the smallest possible atom, is replaced with the 
whole molecular structure of the electrophile, and this 
also increases the size of the side chain dramatically. 
After its modification, the amino acid is no longer able to 
participate in any particular role it might have had in the 
function of the protein and is no longer able to fit into the 
same space. Accordingly, the function or the structure of 
the protein or both is usually disrupted. 

Chemical modifications of amino acids in a protein 
are used for many different purposes. Most of the uses 
are designed imaginatively to answer a particular ques- 
tion about a particular protein, so it is impossible to give 
an exhaustive list of the reasons for covalently modifying 
amino acids in proteins. A few examples, however, will 
indicate why such experiments are so common. 

The most common purpose for using covalent 
modification is to demonstrate that a particular type of 
amino acid is involved in the function of the protein. 
For example, fibrinogen, upon activation, polymerizes 
to form long helical polymers that produce a clot. The 
initial polymerization is noncovalent, but the polymer is 
then strengthened by posttranslational cross-links. 
When the lysines of fibrinogen were modified by amidi- 
nation, the initial noncovalent polymer could form nor- 
mally, but the covalent cross-links could not.” This 
evidence was the basis for the prediction that the 
posttranslational cross-links were amides between 
glutamates and lysines. The most common use of covalent 
modification to study the function of a protein is the 
observation of the inactivation of an enzyme by cova- 
lent modification of amino acids in its active site. For 
example, the chemical modification of Lysine 116 in 
spinach ferredoxin-NADP⁺ reductase inactivates the 
enzyme.’ 

There are, however, many other purposes for cova- 
lently modifying proteins. Covalent modification can be 
used to dissociate the subunits of a protein. For exam- 
ple, succinylation of its lysines caused the hemerythrin of 
Golfingia gouldii to dissociate into its eight identical 
subunits.” Covalent modification can also be used to 
change the electrophoretic mobility of a protein by con- 
verting, for example, positively charged lysines into neg- 
atively charged carboxylates.° When such a modification 
is performed reversibly, the protein will travel with a dif- 
ferent electrophoretic mobility before the modification 
has been reversed than after it has been reversed. 
Covalent modification of a protein can be used to pre- 
vent endopeptidolytic enzymes, for example, trypsin, 
from digesting that protein at particular amino acids, for 
example, arginine.’ Covalent modifications are also used 
to introduce foreign functional groups into proteins. 
For example, functional groups that absorb visible light® 
or have strong fluorescence’ may be introduced so that 
their spectral properties can be used in physical studies. 

When a protein is modified covalently, either the 
modification is performed under conditions that pro- 
duce a high yield of the desired product, and this is used 
in further experiments, or the chemical reaction itself 
between the protein and the reagent is monitored, and 
its kinetics are used to make arguments about the prop- 
erties of a particular amino acid in the native protein. For 
example, the dependence of the rate of the modification 
of a particular amino acid on the pH of the solution can 
be used to estimate the $\mathrm{p}K_{\mathrm{a}}$ of that amino acid in the 
native protein.'”!! Differences in rate constants for the 
reaction of amino acids of a particular type with the same 
electrophile can be used to assess differences in their 
accessibility to the solution in the native structure of the 
protein.” 

In all of these experiments, the possibility that the 
covalent modification itself disrupts the global confor- 
mation of the protein must always be kept in mind. If this 
happens, effects of the modification on the function of 
the protein might be attributed to local changes around 
the specific amino acid that has been modified when 
they actually result from the disruption of the entire 
structure of the protein. The modified protein should 
always be examined to rule out this possibility. For exam- 
ple, it could be shown by following electrophoretic 
mobility, sedimentation rate in the ultracentrifuge, and 
optical rotation that exhaustive amidination of the 
lysines in either serum albumin or immunoglobulin G 
had no measurable effect on the structures of these 
proteins. 


In designing an experiment that involves the cova- 
lent modification of a protein, the usual desire is that the 
reagent chosen react with only one type of amino acid. 
Because cysteines, methionines, lysines, histidines, and 
tyrosines are similarly reactive nucleophiles, this is not a 
simple task. The issue of specificity is best addressed by 
examining the reaction of a simple alkylating agent, such 
as iodoacetamide, with the nucleophiles present in a 
protein. 

Iodoacetamide has been shown to alkylate cysteines, 
lysines, histidines, and methionines. The four 
reactions are 

[Reaction schemes 10-1 through 10-4: alkylation by 
iodoacetamide of cysteine (10-1), lysine (10-2), 
histidine (10-3), and methionine (10-4).] 

When a protein is exposed to iodoacetamide, all four of 
these amino acids disappear from amino acid analyses of 
the reaction mixtures, at rates that depend on the pH, 
and the carboxymethyl products appear in concert. 
The first three reactions require that the amino acid be in 
the form of its conjugate base. 

The rate of the reaction between lysine and iodoac- 
etamide can be described by 


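$$\frac{d[\mathrm{Lys}]_{\mathrm{TOT}}}{dt} = -k_{\mathrm{Lys}}\,[\text{iodoacetamide}]\,[\mathrm{RH_2N{:}}] \tag{10-5}$$

where $[\mathrm{Lys}]_{\mathrm{TOT}}$ is the total concentration of unmodified 
lysine (both protonated, $\mathrm{RNH_3^+}$, and unprotonated, 
$\mathrm{RH_2N{:}}$) at a particular time, $t$. Substitution of the 
appropriate terms into Equation 10-5 leads to 

$$\frac{d[\mathrm{Lys}]_{\mathrm{TOT}}}{dt} = -\left(\frac{k_{\mathrm{Lys}}\,K_{\mathrm{aLys}}}{K_{\mathrm{aLys}} + [\mathrm{H^+}]}\right)[\text{iodoacetamide}]\,[\mathrm{Lys}]_{\mathrm{TOT}} \tag{10-6}$$

where $K_{\mathrm{aLys}}$ is the acid dissociation constant for lysine. 
If the concentration of iodoacetamide is so large that 
it remains constant throughout the reaction and the 
pH does not change, Equation 10-6 describes a 
pseudo-first-order reaction. The pseudo-first-order rate 
constant, $k'_{\mathrm{Lys}}$, governing the disappearance of lysine with 
time is 

$$k'_{\mathrm{Lys}} = \left(\frac{k_{\mathrm{Lys}}\,K_{\mathrm{aLys}}}{K_{\mathrm{aLys}} + [\mathrm{H^+}]}\right)[\text{iodoacetamide}] \tag{10-7}$$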
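Equation 10-6 can then be written in terms of this rate 
constant, rearranged, and integrated from t = 0 to t = t 

$$\int_{[\mathrm{Lys}]_{0,\mathrm{TOT}}}^{[\mathrm{Lys}]_{\mathrm{TOT}}} \frac{d[\mathrm{Lys}]_{\mathrm{TOT}}}{[\mathrm{Lys}]_{\mathrm{TOT}}} = -k'_{\mathrm{Lys}} \int_{0}^{t} dt \tag{10-10}$$

where $[\mathrm{Lys}]_{0,\mathrm{TOT}}$ is the initial concentration of unmodified 
lysine, to give 

$$\ln[\mathrm{Lys}]_{\mathrm{TOT}}\,\Big|_{[\mathrm{Lys}]_{0,\mathrm{TOT}}}^{[\mathrm{Lys}]_{\mathrm{TOT}}} = -k'_{\mathrm{Lys}}\,t\,\Big|_{0}^{t} \tag{10-11}$$

It follows that 

$$\ln[\mathrm{Lys}]_{\mathrm{TOT}} = \ln[\mathrm{Lys}]_{0,\mathrm{TOT}} - k'_{\mathrm{Lys}}\,t \tag{10-12}$$

and 

$$[\mathrm{Lys}]_{\mathrm{TOT}} = [\mathrm{Lys}]_{0,\mathrm{TOT}}\,\exp(-k'_{\mathrm{Lys}}\,t) \tag{10-13}$$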


As in all first-order reactions, the rate constant $k'_{\mathrm{Lys}}$ is 
estimated either by fitting the disappearance of 
unmodified lysine to Equation 10-13 by nonlinear least-squares 
analysis or from the slope of the line obtained when 
$\ln[\mathrm{Lys}]_{\mathrm{TOT}}$ is plotted against time. 
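
As an illustration of the second approach, the following sketch in Python estimates the pseudo-first-order rate constant from the slope of the logarithm of the concentration against time (the logarithmic form of Equation 10-13). The data are hypothetical and noise-free, generated from an arbitrary rate constant of 0.05 min⁻¹; the function name and all numerical values are assumptions for illustration, not measurements from the text.

```python
import math


def first_order_fit(times, concentrations):
    """Least-squares slope and intercept of ln(concentration) against time."""
    logs = [math.log(c) for c in concentrations]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return slope, intercept


# Hypothetical data: unmodified lysine decaying according to Equation 10-13.
k_true = 0.05              # min^-1, illustrative value only
lys0 = 1.0                 # initial concentration, arbitrary units
times = [0, 10, 20, 40, 60]  # min
lys = [lys0 * math.exp(-k_true * t) for t in times]

slope, intercept = first_order_fit(times, lys)
k_est = -slope                   # the slope of the line is -k'
lys0_est = math.exp(intercept)   # the intercept is ln of the initial concentration
print(k_est, lys0_est)
```

With real measurements the points scatter about the line, and the nonlinear fit to Equation 10-13 weights the early, more precisely measured points more appropriately than the logarithmic plot does.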

The other two of the first three reactions (Reactions 
10-1 through 10-3) have a formally equivalent mechanism, 
and the pseudo-first-order rate constants of each 
of them, $k'_{\mathrm{Cys}}$ and $k'_{\mathrm{His}}$, are of the same form as $k'_{\mathrm{Lys}}$ 
(Equation 10-7) with the appropriate rate constants, $k_{\mathrm{Cys}}$ 
or $k_{\mathrm{His}}$, and acid dissociation constants, $K_{\mathrm{aCys}}$ or $K_{\mathrm{aHis}}$, 
substituted for $k_{\mathrm{Lys}}$ and $K_{\mathrm{aLys}}$, respectively. The variation in 
each of these rate constants can be presented graphically 
(Figure 10-1). 

At values of pH greater than the $\mathrm{p}K_{\mathrm{a}}$ of lysine 
($[\mathrm{H^+}] < K_{\mathrm{aLys}}$), almost all of the primary amine is the 
conjugate base, the rate constant $k'_{\mathrm{Lys}}$ is equal to 
$k_{\mathrm{Lys}}$[iodoacetamide] (Equation 10-7), and the rate of the 
reaction of lysine with iodoacetamide is independent of pH. 
At values of pH below $\mathrm{p}K_{\mathrm{aLys}}$ ($[\mathrm{H^+}] > K_{\mathrm{aLys}}$), the 
concentration of the unprotonated conjugate base of lysine, 
and hence the rate of its reaction with iodoacetamide 
(Equation 10-7), decreases by a factor of 10 (1 logarithmic 
unit) for each decrease of 1 unit in pH (Figure 2-6). 
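
The two limits of Equation 10-7 can be checked numerically. The sketch below, in Python, evaluates the equation for lysine with the $\mathrm{p}K_{\mathrm{a}}$ of 10.5 used in the text; the rate constant of the conjugate base and the concentration of iodoacetamide are both set to 1, an arbitrary choice, so the function returns only the fraction of the maximum rate. The function name is an assumption for illustration.

```python
def k_prime(k_base, pKa, pH, reagent=1.0):
    """Pseudo-first-order rate constant in the form of Equation 10-7."""
    Ka = 10.0 ** (-pKa)   # acid dissociation constant
    H = 10.0 ** (-pH)     # concentration of protons
    return k_base * Ka / (Ka + H) * reagent


PKA_LYS = 10.5  # value used for lysine in the text

for pH in (6.0, 7.0, 8.0, 10.5, 13.0):
    print(f"pH {pH:4.1f}: k'/k = {k_prime(1.0, PKA_LYS, pH):.3g}")

# Well below the pKa, each unit of pH changes the rate by a factor near 10.
ratio = k_prime(1.0, PKA_LYS, 7.0) / k_prime(1.0, PKA_LYS, 6.0)
print(ratio)
```

At a pH equal to the $\mathrm{p}K_{\mathrm{a}}$ the function returns exactly half of the maximum value, which is the basis for estimating the $\mathrm{p}K_{\mathrm{a}}$ of an amino acid from the dependence of its rate of modification on pH.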

The rate of the reaction of the unprotonated conjugate 
base of histidine with iodoacetamide, which is governed 
solely by $k_{\mathrm{His}}$ (Reaction 10-3), is correlated to the 
rate of the reaction of the unprotonated conjugate base 
of lysine with iodoacetamide, $k_{\mathrm{Lys}}$ (Reaction 10-2), 
through the Brønsted relationship: 

$$\log\frac{k_{\mathrm{His}}}{k_{\mathrm{Lys}}} = -\beta\,\log\frac{K_{\mathrm{aHis}}}{K_{\mathrm{aLys}}} \tag{10-14}$$

Figure 10-1: Variation with pH of the logarithm of the 
pseudo-first-order rate constants, $k'$, for the reaction between a fully 
accessible and unperturbed methionine, cysteine, histidine, or lysine in 
a protein with iodoacetamide. The first-order rate constants are on 
a relative scale with the rate constant of cysteinate anion 
arbitrarily assigned a value of 1.0. The individual lines were drawn 
according to the respective forms of Equation 10-7 with values of the $\mathrm{p}K_{\mathrm{a}}$ 
of 6.6 for histidine, 8.7 for cysteine, and 10.5 for lysine. The relative 
vertical positions of the lines were fixed by assuming that 
methionine reacts 10 times more slowly than cysteine at pH 5.5, histidine 
reacts 30 times more slowly than cysteine at pH 5.5, methionine 
reacts at the same rate at pH 5.5 and 8.5, lysine reacts 2.5 times 
more quickly than methionine at pH 8.5, and histidine reacts 2 
times more slowly than methionine at pH 8.5. The rate of the 
reaction with cysteine below pH 4 is assumed to be invariant with 
pH, as is the rate of the reaction with methionine, but with a rate 
constant less than that of methionine. 


In the case illustrated, below pH 8, histidine reacts more 
rapidly than lysine, while above pH 8, lysine reacts more 
rapidly than histidine. Such an inversion in reactivity 
occurs because β is usually between 1.0 and 0. If β were 
0, the unprotonated conjugate bases of both histidine 
and lysine would react at equal rates with iodoacetamide 
($k_{\mathrm{His}} = k_{\mathrm{Lys}}$), and the curves for lysine and histidine 
would coincide at values of pH above the $\mathrm{p}K_{\mathrm{a}}$ of 
lysine. If β were equal to 1.0 (Equation 10-14), the rate 
constants for the modification of both lysine, $k'_{\mathrm{Lys}}$, and 
histidine, $k'_{\mathrm{His}}$ (Equation 10-7), would be equal at low pH 
and the lines for histidine and lysine in Figure 10-1 
would coincide at values of pH less than $\mathrm{p}K_{\mathrm{aHis}}$. In any 
case, at values of pH below the $\mathrm{p}K_{\mathrm{a}}$ of histidine, the 
pseudo-first-order rate constant, $k'_{\mathrm{His}}$, for its reaction 
with iodoacetamide decreases by a factor of 10 for each 
decrease of 1 unit in pH. At low pH, therefore, the rates 
of the reactions of both lysine and histidine with 
iodoacetamide decrease in concert and remain in constant 
ratio to each other. 

The rate constant, $k_{\mathrm{Cys}}$, for the reaction of the 
unprotonated conjugate base of cysteine with iodoacetamide 
is significantly greater than that of unprotonated 
lysine. Because sulfur is an element from the third 
row and nitrogen is an element from the second row of 
the periodic table, cysteine is more nucleophilic than 
lysine, even though the $\mathrm{p}K_{\mathrm{a}}$ (8.7) associated with its lone 
pair of electrons is less than the $\mathrm{p}K_{\mathrm{a}}$ (10.5) associated 
with the lone pair of electrons on lysine.* With the 
appropriate substitutions, Equation 10-7 governs the behavior 
of the rate of the reaction of cysteine with iodoacetamide 
as a function of pH at values of pH above and below the 
$\mathrm{p}K_{\mathrm{a}}$ of cysteine. Cysteine (Reaction 10-1), however, 
unlike lysine (Reaction 10-2) and histidine (Reaction 
10-3), retains two nucleophilic lone pairs of electrons 
after its protonation and can react with iodoacetamide as 
the conjugate acid in a reaction analogous to that of 
methionine (Reaction 10-4). 

The reaction of methionine with iodoacetamide is 
invariant with pH because its acid dissociation constant 
($\mathrm{p}K_{\mathrm{aMet}} = -9$) is below accessible ranges of pH. The rate 
constant for the reaction of protonated cysteine with 
iodoacetamide should be lower than the rate constant, 
$k_{\mathrm{Met}}$, for the reaction of methionine because of 
hyperconjugation. At low pH, the rate of the reaction of 
cysteine with iodoacetamide should level off at a value lower 
than $k_{\mathrm{Met}}$ and also should become invariant with pH 
because the concentration of neutral cysteine is 
invariant with pH. 

The individual behavior of each of the reactions 
determines the specificity of iodoacetamide. At the 
lowest values of pH, methionine is the most reactive 
amino acid. As the pH is increased, into the range where 
the concentration of the thiolate anion becomes suffi- 
ciently large, cysteine becomes the most reactive amino 
acid at all higher values of pH. As the pH is increased, 
histidine becomes as reactive as methionine because the 
lone pair of electrons of its neutral base ($\mathrm{p}K_{\mathrm{aHis}} = 6.6$) is so 
much more basic than those of methionine ($\mathrm{p}K_{\mathrm{aMet}} = -9$). 
At even higher values of pH, lysine becomes more reac- 
tive than histidine. All of these consequences determine 
which amino acid reacts most rapidly with iodoac- 
etamide at a particular pH. 
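
The way in which the most reactive amino acid changes with pH can be sketched numerically in Python. The model below uses the forms of Equation 10-7 with the values of $\mathrm{p}K_{\mathrm{a}}$ and three of the anchor points given in the caption of Figure 10-1 (methionine relative to cysteine at pH 5.5, and lysine and histidine relative to methionine at pH 8.5). Two simplifications are assumptions of this sketch: the slow reaction of protonated cysteine is neglected, and the anchor for histidine at pH 5.5 is not used, because in this single-$\mathrm{p}K_{\mathrm{a}}$ model it does not appear to be compatible with the anchor at pH 8.5. All names are hypothetical.

```python
def fraction_base(pKa, pH):
    """Fraction of an acid-base present as its conjugate base at a given pH."""
    return 1.0 / (1.0 + 10.0 ** (pKa - pH))


# Values of pKa from the caption of Figure 10-1.
PKA = {"His": 6.6, "Cys": 8.7, "Lys": 10.5}

# Relative rate constants of the conjugate bases (cysteinate anion = 1.0).
K_BASE = {"Cys": 1.0}
# Methionine: invariant with pH, 10 times slower than cysteine at pH 5.5.
K_MET = K_BASE["Cys"] * fraction_base(PKA["Cys"], 5.5) / 10.0
# Lysine: 2.5 times faster than methionine at pH 8.5.
K_BASE["Lys"] = 2.5 * K_MET / fraction_base(PKA["Lys"], 8.5)
# Histidine: 2 times slower than methionine at pH 8.5.
K_BASE["His"] = 0.5 * K_MET / fraction_base(PKA["His"], 8.5)


def relative_rates(pH):
    """Relative pseudo-first-order rate constants of the four amino acids."""
    rates = {res: K_BASE[res] * fraction_base(PKA[res], pH) for res in PKA}
    rates["Met"] = K_MET  # protonated cysteine is neglected in this sketch
    return rates


def most_reactive(pH):
    rates = relative_rates(pH)
    return max(rates, key=rates.get)


for pH in (3.0, 7.0, 9.0, 11.0):
    print(pH, most_reactive(pH))
```

Within these assumptions, methionine is the most reactive amino acid at pH 3, cysteine is the most reactive at pH 7 and above, and the order of histidine and lysine inverts between pH 7 and pH 9.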

The reagent used to modify the amino acids of a 
protein may itself also be affected by alterations in pH. 
Methyl acetimidate is an imidoester that modifies lysines 
with high specificity (Figure 10-2).'° The specificity 
results in part from the fact that while the product of the 
reaction with lysine is a stable amidine, the analogous 
products with cysteine, methionine, glutamate, aspar- 
tate, tyrosine, and histidine are unstable derivatives of 
acetate and decompose as quickly as they are produced. 
The effect of pH on the rate of the reactions between 
amines and imidoesters has been explained mechanisti- 
cally (Figure 10-2)'* with the assumption that the reac- 
tive form of the lysine is the free base and the reactive 
form of the imidoester is the cationic conjugate acid. For 
methyl acetimidate, $\mathrm{p}K_{\mathrm{aAI}} = 7.5$, and the concentration 
of cationic imidoester should be decreasing as the pH is 
raised above pH 7, while that of the free base of lysine 
should be increasing. 

* The Brønsted relationship does not apply when the central atoms 
of the two acid-bases differ. 

Figure 10-2: Mechanism for the reaction between the free base of 
lysine and the cationic conjugate acid of methyl acetimidate. The 
products of the modification can be either the amidine of one 
lysine, the acetamide of one lysine, or the amidine of two lysines. 
The latter reaction produces cross-links within the protein but 
rarely occurs. 

At the higher values of pH where K,ry has little 
effect, the rate of the reaction between lysine and methyl 
acetimidate is governed by the equation 


$$\frac{d[\mathrm{Lys}]_{\mathrm{TOT}}}{dt} = -k_{\mathrm{AI}}\,[\mathrm{R'{=}NH_2^+}]\,[\mathrm{RH_2N{:}}] \tag{10-15}$$

from which it follows that 

$$\frac{d[\mathrm{Lys}]_{\mathrm{TOT}}}{dt} = -\frac{k_{\mathrm{AI}}\,K_{\mathrm{aLys}}\,[\mathrm{H^+}]}{(K_{\mathrm{aAI}} + [\mathrm{H^+}])\,(K_{\mathrm{aLys}} + [\mathrm{H^+}])}\,[\mathrm{AI}]_{\mathrm{TOT}}\,[\mathrm{Lys}]_{\mathrm{TOT}} \tag{10-16}$$

where $[\mathrm{AI}]_{\mathrm{TOT}}$ is the total concentration of methyl 
acetimidate, both conjugate acid, $\mathrm{R'{=}NH_2^+}$, and conjugate 
base, $\mathrm{R'{=}NH}$. 

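If $[\mathrm{AI}]_{\mathrm{TOT}}$ were high and constant throughout the 
course of the reaction and the pH did not change, 
Equation 10-16 would describe a pseudo-first-order 
reaction. The pseudo-first-order rate constant, $k'_{\mathrm{AI}}$, 
governing the disappearance of lysine with time would be 

$$k'_{\mathrm{AI}} = \frac{k_{\mathrm{AI}}\,K_{\mathrm{aLys}}\,[\mathrm{H^+}]}{(K_{\mathrm{aAI}} + [\mathrm{H^+}])\,(K_{\mathrm{aLys}} + [\mathrm{H^+}])}\,[\mathrm{AI}]_{\mathrm{TOT}} \tag{10-17}$$

When $\mathrm{p}K_{\mathrm{aAI}} < \mathrm{p}K_{\mathrm{aLys}} < \mathrm{pH}$ 

$$k'_{\mathrm{AI}} = \frac{k_{\mathrm{AI}}\,[\mathrm{H^+}]}{K_{\mathrm{aAI}}}\,[\mathrm{AI}]_{\mathrm{TOT}} \tag{10-18}$$

and $k'_{\mathrm{AI}}$ would decrease by a factor of 10 for each increase 
of 1 unit in pH. Between the two acid dissociation 
constants, when $\mathrm{p}K_{\mathrm{aAI}} < \mathrm{pH} < \mathrm{p}K_{\mathrm{aLys}}$, Equation 10-17 predicts 
that 

$$k'_{\mathrm{AI}} = \frac{k_{\mathrm{AI}}\,K_{\mathrm{aLys}}}{K_{\mathrm{aAI}}}\,[\mathrm{AI}]_{\mathrm{TOT}} \tag{10-19}$$

and the rate of the reaction should be almost invariant 
with pH, and when $\mathrm{pH} < \mathrm{p}K_{\mathrm{aAI}} < \mathrm{p}K_{\mathrm{aLys}}$ 

$$k'_{\mathrm{AI}} = \frac{k_{\mathrm{AI}}\,K_{\mathrm{aLys}}}{[\mathrm{H^+}]}\,[\mathrm{AI}]_{\mathrm{TOT}} \tag{10-20}$$

and $k'_{\mathrm{AI}}$ should decrease by a factor of 10 for every 
decrease of 1 unit in pH. While the reaction of amines 
with ethyl benzimidate is more complicated than this 
simple picture, in the case of the reaction of methyl 
acetimidate with the lysines in unfolded aldolase, the 
plateau between $\mathrm{p}K_{\mathrm{aAI}}$ and $\mathrm{p}K_{\mathrm{aLys}}$ is observed as predicted 
by Equation 10-19, and the decrease in the rate of 
amidination does not occur until below pH 8.0. 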
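
The three regimes of Equation 10-17 can be seen by evaluating its pH-dependent factor directly. The sketch below, in Python, computes $k'_{\mathrm{AI}}/(k_{\mathrm{AI}}[\mathrm{AI}]_{\mathrm{TOT}})$ with the $\mathrm{p}K_{\mathrm{a}}$ of 7.5 quoted for methyl acetimidate and the $\mathrm{p}K_{\mathrm{a}}$ of 10.5 used for lysine; the function name is an assumption, and the simple two-ionization model is the only mechanism considered.

```python
def amidination_factor(pH, pKa_AI=7.5, pKa_Lys=10.5):
    """pH-dependent factor of Equation 10-17: k'_AI / (k_AI [AI]_TOT)."""
    H = 10.0 ** (-pH)
    Ka_AI = 10.0 ** (-pKa_AI)    # imidoester conjugate acid
    Ka_Lys = 10.0 ** (-pKa_Lys)  # lysine ammonium cation
    return Ka_Lys * H / ((Ka_AI + H) * (Ka_Lys + H))


# Limiting value on the plateau, from Equation 10-19: K_aLys / K_aAI.
plateau = 10.0 ** (7.5 - 10.5)

for pH in (4.0, 5.0, 8.0, 9.0, 10.0, 12.0, 13.0):
    print(f"pH {pH:4.1f}: factor / plateau = {amidination_factor(pH) / plateau:.3g}")
```

Because the two values of $\mathrm{p}K_{\mathrm{a}}$ are only 3 units apart, the calculated maximum at pH 9 falls slightly short of the limiting value of Equation 10-19, which is why the rate on the plateau is described as almost, rather than strictly, invariant with pH.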

The fact that the modification of a protein is usually 
performed in aqueous solution limits the reagents that 
can be used. On the one hand, problems of solubility of 
the electrophile often arise, thereby restricting its useful 
range of concentrations. On the other hand, because 
water is itself a nucleophile, decomposition of the 
reagent through hydrolysis often occurs. Methyl 
acetimidate, unlike iodoacetamide, reacts quite readily with 
water. Between pH 6.8 and 8.4, the rate constant for its 
hydrolysis at 20 °C is 0.02 min⁻¹. 

When the reagent chosen for a particular modifica- 
tion decomposes rapidly, measurements of its rate of 
reaction with the protein are complicated by this decom- 
position. In the case of methyl acetimidate, the situation 
can be represented by the kinetic mechanism 


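$$\begin{aligned} \text{lysine} + \text{methyl acetimidate} &\xrightarrow{\;k_1\;} \text{amidine} \\ \text{methyl acetimidate} &\xrightarrow{\;k_2\;} \text{products of hydrolysis} \end{aligned} \tag{10-21}$$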


In this situation, where hydrolysis is occurring coinci- 
dentally with modification, it can be shown” that 


$$f_{\mathrm{amidine}} = 1 - \exp\left(-\frac{k_1\,[\mathrm{AI}]_{0,\mathrm{TOT}}}{k_2}\right) \tag{10-22}$$

where $[\mathrm{AI}]_{0,\mathrm{TOT}}$ is the initial total molar concentration of 
methyl acetimidate and $f_{\mathrm{amidine}}$ is the fraction of the lysine 
that has been modified when all of the methyl acetimidate 
has been consumed either by reaction with lysine or 
by hydrolysis. From Equation 10-22 it follows that 

$$\ln\left(\frac{1}{1 - f_{\mathrm{amidine}}}\right) = \frac{k_1}{k_2}\,[\mathrm{AI}]_{0,\mathrm{TOT}} \tag{10-23}$$

In effect, this relationship allows the reaction to be stud- 
ied more leisurely. A series of mixtures containing the 
protein are prepared with increasing concentrations of 
methyl acetimidate at constant temperature and pH. The 
reaction is allowed to reach completion. The various 
fractions of the lysine modified are assessed. From these 
results and Equation 10-23, estimates of the ratio $k_1/k_2$ 
can be determined. If the rate constant $k_2$ has been 
measured in a separate experiment under identical 
conditions, $k_1$ can be obtained directly. 
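
The procedure just described can be sketched in Python. The fractions of lysine modified are simulated here from Equation 10-22 with the rate constant for hydrolysis of 0.02 min⁻¹ quoted above and a hypothetical value of 2.0 M⁻¹ min⁻¹ for $k_1$; the concentrations of reagent, the value of $k_1$, and the function names are all assumptions for illustration. The ratio $k_1/k_2$ is then recovered as the slope of the line through the origin required by Equation 10-23.

```python
import math


def fraction_amidinated(k1, k2, ai0):
    """Fraction of lysine modified at completion (Equation 10-22)."""
    return 1.0 - math.exp(-k1 * ai0 / k2)


def ratio_k1_k2(ai0_values, fractions):
    """Slope through the origin of ln(1/(1 - f)) against [AI]0 (Equation 10-23)."""
    y = [math.log(1.0 / (1.0 - f)) for f in fractions]
    numerator = sum(a * v for a, v in zip(ai0_values, y))
    denominator = sum(a * a for a in ai0_values)
    return numerator / denominator


k2 = 0.02        # min^-1, hydrolysis of methyl acetimidate (from the text)
k1_true = 2.0    # M^-1 min^-1, hypothetical value for illustration
ai0 = [0.005, 0.01, 0.02, 0.05]  # M, hypothetical initial concentrations

fractions = [fraction_amidinated(k1_true, k2, a) for a in ai0]
ratio_est = ratio_k1_k2(ai0, fractions)
k1_est = ratio_est * k2
print(ratio_est, k1_est)
```

No measurement of time enters the analysis; only the final extent of modification at each initial concentration of reagent is needed.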

Alkyl imidoesters, such as methyl acetimidate, 
react specifically and in high yield with lysine in proteins. 
In the case of myoglobin, for example, the two major 
products of the reaction with methyl acetimidate were 
proteins amidinated either at every primary amine 
except Lysine 77 or at every primary amine except the 
amino terminus. An amidine ($\mathrm{p}K_{\mathrm{a}}$ = 12.5) is positively 
charged at pH 7, and as a result, no change in the charge 
of the protein occurs during the amidination of its 
lysines. The lysines, however, are no longer nucleophilic, 
and the size of the side chain has increased significantly. 


Another way to direct the modification exclusively 
to lysines is to take advantage of the fact that primary 
amines such as lysine are the only functional groups on 
a protein that react with aldehydes to form imines 
(Figure 10-3). The conjugate acid of the resulting imine 
can then be reduced with sodium borohydride or sodium 
cyanoborohydride to produce the secondary amine. 
Both formaldehyde” and pyridoxal 5’-phosphate” have 
been used as the aldehyde. The former is a more reactive 
aldehyde and produces much higher yields of alkylated 
lysine; the latter is more selective and under the proper 
conditions will modify only the most accessible and 
nucleophilic lysines in a protein. 

Isothiocyanates are also specific for the primary 
amino groups of lysines, as well as the amino terminus 
(Figure 3-1), of a protein. The products of the 
modification are N,N′-dialkylthioureas. Presumably, 
isothiocyanates are specific for lysine because the products 
they would form with the other nucleophilic amino acids are 
unstable under the reaction conditions. A similar reaction 
occurs with isocyanates: 

[Reaction 10-25: reaction of an isocyanate with lysine to 
form an N,N′-dialkylurea.] 

The products are N,N′-dialkylureas. Alkyl and aryl 
isocyanates react with cysteine and tyrosine as well but 
produce products that can be hydrolyzed back to the 
unmodified amino acids under alkaline conditions, to 
leave only the lysines modified. 

Figure 10-3: Reaction of lysine with an aldehyde to form an 
iminium cation, which can be reduced to the secondary amine with 
sodium borohydride (NaBH₄) or the less reactive sodium 
cyanoborohydride (NaCNBH₃). If tritiated sodium borohydride 
(NaB³H₄) or tritiated sodium cyanoborohydride (NaCNB³H₃) is 
used, tritium is incorporated into one of the alkyl carbons of the 
secondary amine. 

Figure 10-4: Acylation of lysine with any one of a number of 
acylating agents in which the acyl carbon is activated by attaching a 
good leaving group. In the tetravalent intermediate, the leaving 
group is expelled, in preference to the nitrogen of the lysine, to 
produce the amide of the lysine. The activating groups that are used to 
produce the derivative of the carboxylic acid that is the acylating 
agent are the carboxylic acid itself to form the anhydride, 
N-hydroxysuccinimide to form the N-hydroxysuccinimide ester, 
azide anion to form the acyl azide, ethyl carbonic acid to form the 
acyl ethyl carbonate, imidazole to form the acyl imidazole, or a 
thiol to form the thioester. 

A large collection of acylating agents react with 
lysine and also acylate other nucleophilic amino acids. 
The general reaction performed by these reagents is 
acyl transfer (Figure 10-4). As in synthetic organic 
chemistry, the appropriate reagent is chosen on the 
basis of its electrophilicity. The properties of the leaving 
group X determine both the electrophilicity and the rate 
at which the reagent will modify the lysines in a protein. 
Because the leaving group departs from a Lewis acid, 
the tetravalent intermediate, the tendency for the leav- 
ing group to depart from a proton will reflect its ten- 
dency to depart from the tetravalent intermediate 
(Figure 10-4). Therefore, the larger the acid dissociation 
constant $K_{\mathrm{aLG}}$ of the conjugate acid of the leaving group 
X, the more reactive will be the reagent. If the leaving 
group is the carboxylic acid itself, the reagent is 
an anhydride such as trifluoroacetic anhydride 
($\mathrm{p}K_{\mathrm{aLG}}$ = 0.2) or acetic anhydride ($\mathrm{p}K_{\mathrm{aLG}}$ = 4.8). Other 
leaving groups on acyl derivatives that have been used 
in the modification of lysine are azide ($\mathrm{p}K_{\mathrm{aLG}}$ = 4.7), 
N-hydroxysuccinimide ($\mathrm{p}K_{\mathrm{aLG}}$ = 6.0), imidazole 
($\mathrm{p}K_{\mathrm{aLG}}$ = 7.0), ethyl carbonate ($\mathrm{p}K_{\mathrm{aLG}}$ = 7), and 
ethanethiol ($\mathrm{p}K_{\mathrm{aLG}}$ = 10.5). 
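
The rule stated above, that a larger acid dissociation constant for the conjugate acid of the leaving group implies a more reactive acylating agent, amounts to ranking the reagents by the $\mathrm{p}K_{\mathrm{a}}$ of the leaving group. The short Python sketch below applies that rule to the values quoted in this paragraph. It is a qualitative ordering only, under the assumption that no other property of the reagent matters; the dictionary and function names are hypothetical.

```python
# pKa values of the conjugate acids of the leaving groups, as quoted in the text.
LEAVING_GROUP_PKA = {
    "trifluoroacetate": 0.2,
    "azide": 4.7,
    "acetate": 4.8,
    "N-hydroxysuccinimide": 6.0,
    "imidazole": 7.0,
    "ethyl carbonate": 7.0,
    "ethanethiol": 10.5,
}


def rank_by_expected_reactivity(pka_table):
    """Order leaving groups from most to least reactive acylating agent,
    using only the rule that a lower pKa means a more reactive reagent."""
    return sorted(pka_table, key=pka_table.get)


ranking = rank_by_expected_reactivity(LEAVING_GROUP_PKA)
for name in ranking:
    print(f"{name:22s} pKa = {LEAVING_GROUP_PKA[name]}")
```

On this basis trifluoroacetic anhydride would be expected to be the most reactive of the reagents listed and a thioester of ethanethiol the least reactive.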

All of these acylating agents react as readily with 
cysteine, tyrosine, and histidine as they do with lysine to 
form the respective S-, O-, or N-acyl derivatives. Unlike 
the S-, O-, or N-alkyl derivatives formed during the reac- 
tion with an alkylating reagent such as iodoacetamide 
(Reactions 10-1 through 10-4), these acyl derivatives of 
cysteine, tyrosine, and histidine are unstable and decom- 
pose spontaneously or can be decomposed intentionally 
under conditions that leave the lysines in the modified 
form. For example, O-acetyltyrosine can be hydrolyzed 
back to tyrosine by treatment with hydroxylamine.” 
Often an acylating agent the structure of which causes 
the undesired derivatives to be particularly unstable can 
be chosen. For example, the ethoxycarbonyl group is 
added to tyrosine and histidine as well as lysine when 
one uses the carbonic acid anhydride, diethyl pyrocar- 
bonate: 

[Reaction 10-26: ethoxycarbonylation of a nucleophilic 
amino acid by diethyl pyrocarbonate.] 


The ethoxycarbonyl group, however, can be removed 
from the histidine and tyrosine by treatment of the mod- 
ified protein with hydroxylamine.” 

Cyclic anhydrides such as succinic anhydride 
are frequently used to modify the lysines in a protein. 
They replace the positively charged primary ammonium 
cation of lysine with a negatively charged carboxylate. 
The acylation at lysine can be reversed to regenerate the 
lysine when maleic anhydride (2,3-dehydrosuccinic 
anhydride), citraconic anhydride, or 
3,4,5,6-tetrahydrophthalic anhydride is used. 

Fluorosulfonic acids are general electrophilic 
reagents that modify lysine, by N-sulfonation, and tyro- 
sine, by O-sulfonation. The paradigm for these reagents 
is 5-(dimethylamino)naphthalene-1-sulfonyl fluoride 
(dansyl fluoride), which is a fluorescent reagent for the 
covalent modification of proteins. 

Both 2,4-dinitrofluorobenzene (FDNB) and 
2,4,6-trinitrobenzenesulfonate (TNBS) (structures 10-28) 
modify lysine by nucleophilic aromatic substitution. 
The former compound reacts with every nucleophilic 
amino acid. The latter can be confined to react with 
only cysteine and lysine by the proper choice of pH, 
but the derivative formed with cysteine is unstable, so in 
the end only lysine is modified. In situations in which 
particular lysines in a protein are more nucleophilic 
than the others, modification of the protein even by as 
promiscuous a reagent as 2,4,6-trinitrobenzenesul- 
fonate can be confined to those few sites. For example, 
at pH 8.0, only Lysine 332 of the α subunit of 
ribulose-bisphosphate carboxylase from Spinacia oleracea reacts 
significantly with this otherwise nonspecific arylating 
reagent. 

As noted previously, alkylating agents such as 
iodoacetamide and other alkyl halides are electrophiles 
that react with every nucleophilic amino acid to yield 
stable products. Informed or uninformed manipulation 
of the pH can affect the distribution of alkylated prod- 
ucts. At low pH, alkylating agents can be used to produce 
stable derivatives of methionine specifically. For 
example, at slightly acidic values of pH, benzyl bromide 
alkylates methionine in fumarase quite selectively. 

The reagent that has shown the greatest selectivity 
for histidine is diethyl pyrocarbonate (Equation 
10-26). Usually this selectivity is obtained by running 
the reaction at a pH slightly below the $\mathrm{p}K_{\mathrm{a}}$ for histidine, 
where the greatest discrimination in favor of histidine, 
relative to lysine (Equation 10-26), should be manifested 
(Figure 10-1). Histidine is also susceptible to photooxi- 
dation in the presence of dyes such as methylene blue or 
rose bengal.” Under carefully controlled conditions such 
photooxidation can be confined to histidine,*””* but usu- 
ally several other amino acids are destroyed simultane- 
ously.’ 

One of the most readily modified amino acids in a 
protein is cysteine. At slightly alkaline pH, in the vicinity 
of its $\mathrm{p}K_{\mathrm{a}}$ (Figure 10-1), cysteine is preferentially 
alkylated by alkyl halides such as iodoacetamide and 
iodoacetate (Equation 3-17), but one must remain aware of the 
fact that other nucleophilic amino acids can also be 
modified. N-Ethylmaleimide (NEM) is another reagent 
often used to modify cysteine: 

[Reaction 10-30: addition of cysteine to N-ethylmaleimide 
to form S-(N-ethylsuccinimidyl)cysteine.] 


This reaction is an example of nucleophilic addition to an 
&ß-unsaturated acyl compound. 2-Vinylpyridine”’ is 
selective for modification of cysteine in a similar reac- 
tion. The specificity of these alkylating agents for cys- 
teine depends both upon the fact that sulfur is more 
nucleophilic than either nitrogen or oxygen and upon 
the use during the reaction of a pH just below the pKa of 
cysteine (Table 2-2) so that modification of lysine is sup- 
pressed (Figure 10-1). The possibility of alkylation at 
other nucleophiles such as lysine’! must always be con- 
sidered. For example, the inactivation of spinach ferre- 
doxin-NADP* reductase by N-ethylmaleimide results 
from alkylation of a lysine rather than a cysteine.* 
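
The effect of pH on this discrimination can be sketched numerically. The short example below computes the fraction of each side chain present as the unprotonated nucleophile from the Henderson-Hasselbalch relationship and compares cysteine with lysine. The values of pKa are illustrative assumptions only (they are not taken from Table 2-2), and the ratio ignores the intrinsic difference in nucleophilicity between sulfur and nitrogen, so it understates the real preference for cysteine.

```python
def fraction_deprotonated(pH: float, pKa: float) -> float:
    """Fraction of an acid present as its conjugate base at a given pH."""
    return 1.0 / (1.0 + 10.0 ** (pKa - pH))


# Illustrative values only; assumptions for this sketch, not data from the text.
PKA_CYSTEINE = 8.5
PKA_LYSINE = 10.5


def thiolate_to_amine_ratio(pH: float) -> float:
    """Ratio of the fraction of thiolate to the fraction of neutral amine."""
    return (fraction_deprotonated(pH, PKA_CYSTEINE)
            / fraction_deprotonated(pH, PKA_LYSINE))


if __name__ == "__main__":
    for pH in (7.0, 8.0, 9.0, 10.0):
        print(f"pH {pH:4.1f}: thiolate/amine ratio = "
              f"{thiolate_to_amine_ratio(pH):8.1f}")
```

Under these assumed values the ratio falls steadily as the pH is raised toward the pKa of lysine, which is the reason for working just below the pKa of cysteine.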

Organic mercurials, such as p-chloromercuriben- 
zoate (PCMB),”’ are usually specific for cysteine: 
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The sulfur-mercury bond, because it is a bond between 
a soft metal and a soft Lewis base, is significantly cova- 
lent and is particularly stable, but it is usually not stable 
enough to survive subsequent digestion of the protein 
and chromatography of the peptides. 
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Cysteine can be converted to S-(2-aminoethyl) 


cysteine by modification with aziridine (ethyleneimine): 
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The modified side chain is isosteric with the side chain of 
a lysine and has become a site for tryptic digestion.” 
5,5’-Dithiobis(2-nitrobenzoate) (10-2) participates readily 
in disulfide interchange with a cysteine (Figure 3-20). 
The reagent contains a disulfide 
that is particularly electrophilic“ because the nitro- 
thiobenzoate dianion is such a good leaving group 
(pKa < 5). This causes the equilibrium to lie in favor of 
mixed disulfides between the cysteines on the protein 
and 5-thio-2-nitrobenzoate. Unfortunately, these mixed 
disulfides are also electrophilic. Consequently, the pH 
of the solution should be well-buffered to prevent fur- 
ther reaction of the mixed disulfide with nucleophiles 
such as hydroxide ion. The nitrothiobenzoate dianion 
released during the disulfide interchange between cys- 
teine and 5,5’-dithiobis(2-nitrobenzoate) (10-2) is 
brightly colored (ε₄₁₂ = 13,600 M⁻¹ cm⁻¹), and its 
absorbance can be used to follow the reaction. The situ- 
ation is complicated, however, by the possibility of side 
reactions with other nucleophiles releasing extrastoi- 
chiometric nitrothiobenzoate that increases the 
absorbance of the solution and by the reaction of the 
nitrothiobenzoate itself with oxygen that decreases the 
absorbance of the solution. Well-buffered solutions 
should be used in the absence of atmospheric oxygen, 
and the absorbance should be measured continuously. 
The ease with which this reaction can be followed has 
led to its wide application to proteins. 
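
The conversion of an absorbance at 412 nm into a concentration of reacted cysteine follows directly from the Beer-Lambert law and the extinction coefficient quoted above. The following is a minimal sketch; the function names, the default path length of 1 cm, and the optional blank correction are assumptions made for illustration, and the calculation does not correct for the side reactions just described.

```python
EPSILON_412 = 13_600.0  # M^-1 cm^-1, value quoted in the text for the dianion


def thiol_concentration(absorbance: float,
                        path_length_cm: float = 1.0,
                        blank: float = 0.0) -> float:
    """Molar concentration of released nitrothiobenzoate (Beer-Lambert law)."""
    if path_length_cm <= 0:
        raise ValueError("path length must be positive")
    return (absorbance - blank) / (EPSILON_412 * path_length_cm)


def cysteines_per_protein(absorbance: float,
                          protein_molar: float,
                          path_length_cm: float = 1.0,
                          blank: float = 0.0) -> float:
    """Moles of cysteine that reacted for every mole of protein."""
    if protein_molar <= 0:
        raise ValueError("protein concentration must be positive")
    return thiol_concentration(absorbance, path_length_cm, blank) / protein_molar


if __name__ == "__main__":
    # Hypothetical reading: A412 = 0.272 for a 10 micromolar protein solution.
    print(thiol_concentration(0.272))
    print(cysteines_per_protein(0.272, 1.0e-5))
```

Because the absorbance can drift upward or downward for the reasons given above, a single end-point reading is less reliable than a continuous trace extrapolated to the time of mixing.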

Tyrosine is frequently alkylated, acylated, arylated, 
or sulfonylated inadvertently during modification reac- 
tions designed to be restricted to lysines. It can be mod- 
ified specifically, however, by taking advantage of its 
elevated susceptibility to electrophilic aromatic substi- 
tution. As a p-alkylphenol, it is activated toward substi- 
tution, which is directed to its ortho positions by the 
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electron-releasing hydroxyl. A simple example of this 
susceptibility is the facile iodination of tyrosine: 


tyrosine + I₂ ⇌ 3-iodotyrosine + HI 

(10-33) 


Iodide ion is used as the source of the iodine, and it is oxidized 
to I₂, IOH, or ICl either chemically”® or enzymatically. 
Histidine is also iodinated under similar 
conditions but not so readily as tyrosine.”! A particularly 
advantageous method for activating I⁻ chemically uses 
N-chlorosulfamylphenyl groups covalently incorporated 
into a solid phase by the N-chlorosulfamylation of polystyrene 
beads. The covalently attached N-chlorosulfamylphenyl 
groups produce ICl from I⁻ in a reaction 
identical to that performed by N-chlorobenzenesulfonamide 
in free solution: 
I⁻ + ArSO₂NCl⁻ + H⁺ ⇌ ICl + ArSO₂NH⁻ (Ar = phenyl) 

(10-34) 


The ICl iodinates tyrosines in the protein, and the attachment 
of the chlorinating agent to a solid phase inhibits its 
ability to chlorinate the protein. The most advantageous 
enzymatic method uses lactoperoxidase. This enzyme 
converts I⁻ and H₂O₂ to OI⁻ and H₂O. Either the OI⁻ remains 
bound to the iron of the heme on the enzyme, where it can 
iodinate a tyrosine on the surface of another protein that 
has been bound by the lactoperoxidase, or it is released as 
HOI, which can iodinate a tyrosine on the surface of 
another protein free in the solution.” Iodination is usually 
used to introduce the radioactive isotope of iodine, 
¹²⁵I, into the protein to make it radioactive. 

Diazonium salts also participate in the electro- 
philic aromatic substitution of tyrosine. 5-Diazonium- 
1-hydrotetrazole” is a diazonium salt producing a 
product with tyrosine that absorbs strongly at 550 nm:“ 


tyrosine + 5-diazonium-1-hydrotetrazole → ortho-azo adduct of tyrosine + H⁺ 

(10-35) 


It reacts readily with histidine as well, but at low pH his- 
tidine will be mainly protonated, and the imidazolium 
cation is inert to electrophilic aromatic substitution. 

Tetranitromethane is the reagent used most fre- 
quently to modify tyrosine: 


C(NO₂)₄ 

10-3 


The nitration that produces the o-nitrotyrosine proceeds 
by a free radical mechanism.” The o-nitrotyrosine pro- 
duced absorbs strongly at 428 nm as the nitrophenolate 
anion. It can be reduced to o-aminotyrosine with 
dithionite. The hydroxyl of o-aminotyrosine has a 
uniquely low pKa (4.8), and this fact can be exploited to 
direct further modification to this location in the pro- 
tein.’ Unlike O-acylation and O-alkylation, which 
require the tyrosine to be anionic to react as a nucle- 
ophile, the reaction with tetranitromethane proceeds 
with the neutral tyrosine. 

Tryptophan is susceptible to electrophilic aromatic 
substitution because of its similarity to aniline. For 
example, tritium can be incorporated specifically into 
tryptophan in a protein under strongly acidic condi- 
tions: 


tryptophan + ³H⁺ ⇌ [³H]tryptophan + H⁺ 

(10-36) 


Sulfenyl halides such as 2,4-dinitrobenzenesulfenyl 
chloride participate in a formal electrophilic aromatic 
substitution at carbon 2 of tryptophan:”® 


tryptophan + 2,4-dinitrobenzenesulfenyl chloride → 2-[(2,4-dinitrophenyl)sulfenyl]tryptophan + HCl 

(10-37) 


The most peculiar position in tryptophan is the 
π bond between carbons 2 and 3. This bond displays the 
properties of an olefin during bromination with mild 
brominating reagents (Equation 3-1) by participating in 
addition rather than substitution. Under mild conditions 
a relatively inert brominating agent, 2-[(2-nitrophenyl) 
sulfenyl]-3-methyl-3’-bromoindolenine (BNPS-skatole), 
oxidized the tryptophan in micrococcal nuclease to the 
oxindole,” presumably through an intermediate halohydrin: 

tryptophan → halohydrin → oxindole + HBr 

(10-38) 


Only methionine was oxidized at the same time, and it 
could be regenerated readily by reduction. The use of 
addition reactions to the olefin in tryptophan to incorpo- 
rate nucleophiles other than water might be feasible. The 
bromination, however, often results in cleavage of the 
polypeptide at the tryptophan (Equation 3-1). 

Arginine is modified specifically by vicinal diones: 


R₁C(=O)C(=O)R₂ 

10-4 


Reagents normally used are diphenylethanedione (R₁ = 
R₂ = phenyl), p-nitrophenylethanedione (R₁ = C₆H₄NO₂; 
R₂ = H), 4-(oxoacetyl)phenoxyacetic acid (R₁ = 
C₆H₄OCH₂COOH; R₂ = H), and 1,2-cyclohexanedione (R₁ 
= CH₂CH₂CH₂CH₂ = R₂). In all cases the initial product is 
the cyclic adduct 10-5”°® that then dehydrates: 


guanidinium of arginine + 10-4 ⇌ cyclic diol 10-5 → dehydrated adduct + H₂O 

(10-39) 
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When the reagent is diphenylethanedione, this dehy- 
drated adduct is the final product;” when the reagent is 
p-nitrophenylethanedione° or 4-(oxoacetyl)phenoxyacetic 
acid, the hydrogen (R₂) permits a Cannizzaro 
rearrangement that produces iminoimidazolidone 10-6; 

[Structures 10-6 and 10-7: the iminoimidazolidones] 


and when the reagent is cyclohexanedione, a similar 
rearrangement produces iminoimidazolidone 10-7. 

The modification of arginine by vicinal diones can 
also yield products that incorporate additional molecules 
of the dione. 2,3-Butanedione (10-4, R₁ = R₂ = CH₃) 
under appropriate conditions self-condenses to dimers 
and trimers that both react with arginine to yield poorly 
characterized, heterogeneous mixtures of products containing 
about 3 mol of dione for every mole of arginine. 
Phenyl glyoxal (10-4, R₁ = phenyl, R₂ = H) reacts with arginine 
to produce a product containing 2 mol of dione for 
every mole of arginine.’® 

The earliest modifications of arginine, with either 
diphenylethanedione™ or 1,2-cyclohexanedione,” were 
performed at alkaline pH (0.2 M NaOH), and the prod- 
ucts were quite stable. The conditions, however, were too 
harsh to avoid destruction of the polypeptide. It was sub- 
sequently noted that the addition of borate during the 
reaction of a protein with 2,3-butanedione accelerated 
the rate of the reaction at neutral pH and rendered the 
modification irreversible as long as the borate was pres- 
ent.” The initial product of the reaction of 1,2-cyclo- 
hexanedione and arginine, presumably diol 10-5, could 
also be stabilized significantly by the addition of borate.’ 
Borate is known to add to vicinal diols, such as sugars, to 
form cyclic borate diesters: 


[Structure 10-8: cyclic borate diester of a vicinal diol] 


The addition of borate to cause the reaction with diones 
to be irreversible under mild conditions has permitted 
the isolation of modified peptides from proteins modi- 
fied by 1,2-cyclohexanedione. 

Glutamates and aspartates are modified with 
carbodiimides (Figure 10-5).°° The carbodiimides used 
can be either hydrophobic, such as dicyclohexylcarbodiimide 
(DCCD; R₁ = R₂ = cyclohexyl), or 
hydrophilic and water-soluble, such as N-ethyl- 
N’-[3-(dimethylamino)propyl]carbodiimide [EDC; R₁ = 
C₂H₅, R₂ = (CH₃)₂NHC₃H₆]. The initial product of the 
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Figure 10-5: Outcomes of the reactions of carbodiimides with aspartate or glutamate. The O-acylurea is formed by the direct addition of the 
carboxylate anion to the protonated carbodiimide. In a rigid, isolated, aprotic environment, the O-acylurea might be the final product, but 
there are two other possible outcomes. If a nucleophile (usually an amine) has been added, or if there is an adjacent nucleophile in the protein 
(usually a lysine), the O-acylurea is an activated carboxylic acid derivative capable of acylating that nucleophile in an acyl exchange reaction 
① to give the N,N’-dialkylurea as the leaving group and the acyl derivative (usually the amide) of the glutamate or aspartate with either 
the added nucleophile or the lysine in the protein. If there is no accessible nucleophile, the O-acylurea can rearrange, by intramolecular acyl 
exchange ②, to the N-acylurea. Pathway ① is initiated by the attack of the extraneous nucleophile on the activated carboxyl group, and pathway 
② is initiated by intramolecular attack of the unprotonated urea nitrogen on the acyl carbon. 


reaction is an O-acylurea™ in which the acyl carbon of 
the original glutamate or aspartate has been activated by 
forming an acyl derivative, the leaving group of which 
is an excellent one because it is the oxygen of an 
N,N’-dialkylurea (pKa = 1). 

Four fates await this O-acylurea. First, if it is 
buried in a nonnucleophilic environment within the 
protein and sterically constrained, it will remain as the 
O-acylurea until the protein is unfolded, at which point 
it will usually hydrolyze back to the unmodified gluta- 
mate or aspartate. Second, if it is somewhat buried in a 
polar environment, but not sterically constrained, the 
O-acylurea will rearrange to the N-acylurea, which is 
stable (pathway ② in Figure 10-5). Dicyclohexylcarbodiimide 
is often incorporated into a protein in this 
way. It usually reacts with buried carboxylic acids 
because it is so hydrophobic, and the buried O-acylurea 
usually survives long enough to rearrange as the protein 
sits around after the investigator believes the reaction 
has finished; but the reaction rarely proceeds in high 
yield. Third, if there is a nucleophilic amino acid in the 


protein, for example a lysine, in the vicinity of the 
O-acylurea, an intramolecular adduct, for example an 
amide, between that amino acid and the glutamate or 
aspartate will form (pathway ① in Figure 10-5). 
Fourth, if an external amine, such as the methyl ester of 
glycine, has been added in high concentration to the 
solution, it can react with the O-acylurea as it is formed, 
if it is sterically accessible, and produce the amide 
between the external amine and the glutamate or aspar- 
tate.” In this way, a defined covalent modification of 
the carboxylate can be made. If the external nucleophile 
is ammonia, glutamates and aspartates are converted to 
glutamines and asparagines, respectively.” 

The practical outcome of each of these four fates is 
unique. In the first, the native protein is modified by the 
carbodiimide at glutamate or aspartate but loses the 
modification upon unfolding. In the second, a stable 
derivative between the protein and the carbodiimide is 
formed. In the third, the glutamate or aspartate is stably 
modified by being intramolecularly cross-linked,” but 
neither the carbodiimide nor an external amine is incor- 
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ethyl carbonic anhydride 


Figure 10-6: Formation of an ethyl carbonic anhydride (Figure 10-4) at a glutamate or aspartate by N-(ethoxycarbonyl)-2-ethoxy-1,2-dihydroquinoline. 
The 2-ethoxy group leaves as the alcohol when the reagent reverts in water to the aromatic N-(ethoxycarbonyl)iminium cation. 
The iminium cation reacts with the carboxylate of a glutamate or an aspartate. The adduct that is formed undergoes an intramolecular, cyclic 
rearrangement involving an acyl exchange at one end with quinoline as the leaving group and the expulsion of an ester from its adduct with 
an iminium cation at the other end. 


porated into the protein. In the fourth, an external amine 
but not the carbodiimide is incorporated. Often, regard- 
less of the intentions of the investigator, a combination 
of all of these outcomes occurs, and the complex mixture 
of products that results defies any attempt at quantifica- 
tion. 

Another reagent used to activate the carboxylates of 
glutamates and aspartates is N-(ethoxycarbonyl)- 
2-ethoxy-1,2-dihydroquinoline (EEDQ). It activates the 
carboxylate (Figure 10-6) by forming a mixed ethyl carbonic 
anhydride (pKa = 7), which is an acylating agent 
(Figure 10-4) that is somewhat less reactive than an 
O-acylurea (pKa = 1) but capable of the same types of 
intramolecular or intermolecular reactions with nucleophiles 
such as a lysine on the same protein or another 
protein’! or an amine that has been added to the solution. 

N-Ethyl-5-phenylisoxazolium-3’-sulfonate (Wood- 
ward’s reagent K)” activates glutamates and aspartates’® 
by forming an enol ester: 
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In this intermediate enol ester, the acyl carbon of the 
aspartyl or glutamyl side chain is activated sufficiently to 
react readily with nucleophiles elsewhere on the protein 
or with nucleophiles such as amines that have been pur- 
posely added to the solution, as does the O-acylurea that 
is the intermediate in the reactions of carbodiimides 
(Figure 10-5). The acyl group that has been activated 
with N-ethyl-5-phenylisoxazolium-3’-sulfonate is also 
reactive enough to be reduced with borohydride ion 
(BH₄⁻) to convert the side chain of the aspartate or glutamate 
to the respective alcohol,” just as an ester can be 
reduced to the corresponding alcohol by AlH₄⁻ ion. 

Compounds that serve as precursors to nitrenes or 
carbenes through photolytic reactions are reagents that 
display even less specificity than alkylating agents in the 
modification of the amino acids in a protein. The fact 
that they are generated photolytically permits an added 
level of control over the reaction. The reagent can be 
equilibrated with the protein and then activated. 

Aryl azides, such as phenyl azides or nitrophenyl 
azides, are the usual precursors for nitrenes. A nitroaryl 
azide produces a nitroaryl nitrene upon photolysis: 
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A convenient, widely used reagent for attaching a nitro- 
phenyl azide to other compounds (the R in Reaction 
10-41) by nucleophilic aromatic substitution is 4-azido- 
2-nitrofluorobenzene. Aliphatic azides have also been 
observed to insert photolytically into proteins.” 

A nitrene is a nitrogen the four valence orbitals of 
which are occupied by only six valence electrons. 
Therefore, it is dramatically electron-deficient and elec- 
trophilic. In a singlet nitrene, three of these orbitals are 
occupied by pairs of electrons and one orbital is vacant. 
In theory, a singlet nitrene, because of its vacant orbital, 
has a higher preference for insertion into nitrogen-hydro- 
gen or oxygen-hydrogen bonds than carbon-hydrogen 
bonds because atoms of oxygen or nitrogen attached to 
carbon are electron-rich. In a triplet nitrene, two of the 
orbitals on nitrogen are each occupied by only one 
unpaired electron and the other two are occupied by two 
pairs of electrons. Consequently, a triplet nitrene is a 
diradical. Theoretically, triplet nitrenes should be able to 
modify proteins by hydrogen abstraction followed by 
rebound of the two adjacent monoradicals.” Because 
hydrogen is usually more easily abstracted from carbon 
than from oxygen or nitrogen, triplet nitrenes should 
abstract hydrogen more readily from carbon-hydrogen 
bonds than either nitrogen-hydrogen or oxygen-hydro- 
gen bonds. 

When light is absorbed by an aryl azide, which itself 
is a singlet, the excited state is initially a singlet excited 
state that must produce a singlet nitrene because N₂ is a 
singlet molecule. If the singlet excited state lasts long 
enough, it can convert to a triplet excited state by intersystem 
crossing. The triplet excited state produces a 
triplet nitrene and singlet N₂. The yield of triplet excited 
state can be increased by adding a triplet sensitizer.” 
Singlet nitrene itself can turn into triplet nitrene if it sur- 
vives long enough. In the absence of a sensitizer, only 
about 10% of the nitrene produced by photolysis of 
phenyl azide is triplet.” 

Although it is widely believed that aryl nitrenes, such 
as the phenyl nitrenes or the 3-nitro-4-(alkylamino)phenyl 
nitrenes usually employed in the modification of proteins, 
should insert into carbon-hydrogen bonds, a reaction that 
would require significant yields of the triplet state, the 
chemistry of such nitrenes belies this belief. In ideal situ- 
ations, such as the intramolecular insertion in the vapor 
phase of an aryl nitrene into a tertiary carbon-hydrogen 
bond four carbons away, a reasonable yield of the N-al- 
kylaniline (50%)” is obtained. When, however, phenyl 
nitrene is generated in cyclohexane by photolysis, no 
insertion (<30%)”® into the solvent is observed, and most 
of the reaction proceeds with either dimerization of the 
nitrene itself or the production of aniline by two succes- 
sive hydrogen abstractions by the triplet. Phenyl nitrene 
generated by photolysis under the same conditions in 
hydroxylic solvents such as methanol or propanol inserts 
into those solvents in high yield (80%).”® The products of 
the photolytic reactions of 4-substituted phenyl nitrenes 
with water, methyl alcohol, or diethylamine as solvent are 
the lactams 10-9 (60-90% yield), the 2-methoxy- 
3-hydroazepines 10-10 (40-80% yield), or the 2-diethylamino- 
3-hydroazepines 10-11 (90-100% yield), 
respectively:*” 
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The products of the photolytic reaction of 4-methyl- 
amino-3-nitrophenyl nitrene with methanol as solvent 
or 1% diethylamine in methanol are aniline 10-12 (40% 
yield) and aniline 10-13 (40% yield) or aniline 10-12 
(30% yield) and aniline 10-14 (70% yield), respec- 
tively:” 


[Structures 10-12, 10-13, and 10-14: the substituted anilines] 
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These products can be explained as the results of 
nucleophilic addition of solvent or solute to the two 
electrophilic species engaged in the following equilibrium: 


Consequently, the majority of the products from the 
reaction of an aryl nitrene with a protein should result 
from reaction with nucleophilic functional groups. 

The identities of the amino acids modified by aryl 
nitrenes are consistent with these general considerations. 
The modification of rabbit glyceraldehyde-3-phosphate 
dehydrogenase (phosphorylating) by a 
p-amino-m-nitrophenyl nitrene produces the sulfenamide 
of Cysteine 149 (10-15)*" 


(10-42) 
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which, however, could have arisen from either the triplet 
or the singlet. The photolytic modification of rat 
phosphoenolpyruvate carboxykinase (GTP) with 
8-azidoguanosine 5’-triphosphate produces an 
intramolecular disulfide between two cysteines in the 
protein,” presumably arising from the nucleophilic dis- 
placement of the 8-aminoguanosine from the initially 
formed sulfenamide (10-16) by an adjacent cysteine. In 
addition to cysteine, tyrosine and lysine have usually 
been identified as the reactants with aryl nitrenes, but a 
leucine, two alanines, and a phenylalanine have also 
been reported to be modified. Singlet aryl nitrenes 
can insert intramolecularly by electrophilic aromatic 
substitution into phenyl rings,” and this reaction would 
explain the modification of phenylalanine. 

Carbenes, like nitrenes, have only six valence elec- 
trons on one atom, but they are distributed around a 
carbon instead of nitrogen. The carbenes generally used 
for the modification of proteins are on secondary car- 
bons. They can be singlets or triplets, and relatively 
stable examples of both of these electronic states for car- 
benes have been synthesized.®®® Again, the singlet is 
the first product and has a significant preference for 
insertion into nucleophilic locations such as nitrogen- 
hydrogen or oxygen-hydrogen bonds” and, in an aque- 
ous solution of protein, probably reacts before much 
triplet is formed. The carbene of adamantane, formed by 
photolysis of adamantyldiazirine, is entirely (>99.9%) 
singlet and reacts as a strong electrophile to form ylides 
with the nitrogen of pyridine (Equation 10-43) and the 


sulfur of thiophene:*® 
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(10-43) 
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The singlet adamantyl carbene can also be protonated 
on its lone pair of electrons to produce the carbocation 
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that reacts readily with nucleophilic heteroatoms in 
reactions analogous to SN1 substitutions. 

Most aliphatic carbenes are prone to intramolecu- 
lar rearrangements that are more rapid than their inter- 
molecular reaction with other molecules in solution.” 
Reagents must be designed to avoid such rearrangement. 
The compounds that have proven to be the most efficient 
and uncomplicated precursors of carbenes unsusceptible 
to rearrangement are derivatives of 
1-trifluoromethyl-1-phenyldiazirines:*” 
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The carbene is generated by photolysis. Prior to the 
advent of these reagents, α-diazoketones, α-diazoacetyl 
esters, and ethyldiazomalonyl esters were used as pre- 
cursors of carbenes:”° 
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The first application of a carbene as a reagent for 
the modification of a protein used a diazoacetyl ester of 
Serine 195 in α-chymotrypsin to increase the chances of 
insertion into other amino acids in the protein. 
Modifications of a cystine,” a serine other than Serine 
195, an alanine,” and a tyrosine” were observed. 
Carbenes have usually been found to display a prefer- 
ence for insertion into oxygen-hydrogen and nitro- 
gen-hydrogen bonds in proteins. They have been 
observed to modify lysines,” tyrosines,” tryptophans,” 
glutamic acids,” and aspartic acids,” but incorporation 
has also been noted into valine” and glycine.” 

Two remarkable illustrations of the preference of 
carbenes for nitrogen-hydrogen and oxygen-hydrogen 
bonds can be found in the use of these reagents to 
modify amino acids found in polypeptides that in the cell 
span membranes of phospholipid. In both instances, 
several precursors for the carbene were used that should 
have placed it at different respective locations along the 
polypeptide crossing the phospholipid bilayer; yet in 
each of the two experiments, the several different car- 
benes reacted respectively with the same amino acid, in 
one case a tryptophan” and in the other case a glutamic 
acid.” Other amino acids, with readily abstracted hydro- 
gens at tertiary carbons, must have been more accessible 
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to at least some of these carbenes than the respective 
tryptophan or glutamic acid into which they eventually 
inserted. These results demonstrate that carbenes, like 
nitrenes, are not so promiscuous in their choices of reac- 
tants as they are often thought to be. 

The products of the reactions between nitrenes and 
carbenes and proteins have rarely been characterized. In 
part, this is due to the low yields encountered in most of 
these reactions, presumably because of the tendency of 
the singlet carbene or singlet nitrene and its rearranged 
products to insert into water,” the inescapable solvent. 

There are other compounds that can be photoacti- 
vated to form reactive species that modify proteins. For 
example, a 5-bromouracil was inserted in place of a 
thymine within the DNA sequence recognized by general 
control protein GCN4. When the complex between the 
protein and the modified DNA was irradiated with ultraviolet 
light (λmax = 254 nm), Alanine 238 of the protein 
was covalently modified in low yield.” Vanadate 
anion binds specifically to proteins such as rabbit 
myosin” and isocitrate lyase from Escherichia coli.” 
Upon photolysis of these complexes, serines in the sites 
at which the vanadate had bound were photooxidized to 
products that could be converted by reduction with 
Na[³H]BH₄ to [³H]serine.”” The incorporated tritium 
tagged the specific serines modified. 

Oxidative cleavage is used to modify the 
polyamide backbone of a protein. In the presence of 
reducing agents such as ascorbate or dithiothreitol, complexes 
between Fe²⁺ or Cu⁺ and chelators such as tetracycline, 
N,N,N’,N’-tetracarboxymethyl-1,2-diaminoethane 
(EDTA), phenanthroline, or the protein itself convert O₂ 
or H₂O₂ into reactive species capable of cleaving peptide 
bonds.”*'”! The products of this cleavage that have been 
identified so far suggest that several different reactions 
can bring it about, so no unique mechanism seems to 
predominate.'’” When the chelated metal is attached in 
some way to the protein, the cleavages that occur are 
confined to a few peptide bonds rather than being widely 
distributed along the polypeptide,’”'™ an observation 
suggesting that the reactive species responsible for the 
cleavage either remain bound to the metallic cation or 
cannot diffuse very far without being discharged. 

The chelating ligand surrounding the metallic 
cation is usually attached purposefully to a defined loca- 
tion on the surface of the protein to be modified. The 
activated products of the reaction cleave a peptide bond 
immediately adjacent to the place where the cation ends 
up. For example, the polypeptide in the vicinity of the site 
with which tetracycline associates on the tetracycline 
repressor was identified by the oxidative cleavage of the 
protein produced by the complex between tetracycline 
and Fe²⁺. In this case, the major targets for cleavage 
were the peptide bonds between positions 103 and 104 
and positions 104 and 105. Lower yields of cleavage were 
observed between positions 55 and 56 and positions 135 
and 136 and even lower yields between positions 143 and 
144 and positions 146 and 147. If the protein itself has a 
site at which a structural metallic cation is bound, oxida- 
tive cleavage induced by the appropriate cation can map 
the polypeptide surrounding that site.”* One advantage 
of oxidative cleavage of the polypeptide is that it can be 
readily detected by submitting complexes between 
dodecyl sulfate and the protein and its fragments to elec- 
trophoresis. Mass spectrometry of the resulting frag- 
ments can be used to locate the exact points of 
cleavage.” 

Site-directed mutation produces the covalent 
modification of a protein by converting one particular 
amino acid in its sequence into another of the 20 amino 
acids. It is also possible to delete amino acids from the 
sequence of a polypeptide or insert extra amino acids at 
a particular location with similar techniques. A common 
goal of both chemical modification and site-directed 
mutation is to correlate the structure of the protein with 
its function. An assessment of the effects of either type of 
modification on the normal function of the protein pro- 
vides information about the role of the modified amino 
acid in that function. In order for either approach to 
place that side chain in a structural context, a crystallo- 
graphic molecular model of the protein is required to 
define the location of the modified amino acid. 

A strategic distinction, however, exists between 
covalent modification with chemical reagents and cova- 
lent modification by site-directed mutation. In the 
former procedure, the protein of interest is modified, and 
the effects of the modification on the function of the pro- 
tein are assessed before the outcome of the reaction is 
defined by digesting the protein and then identifying the 
modified peptides either by mass spectrometry or by 
chromatographic separation and sequencing. In the 
latter procedure, any amino acid in the sequence of the 
protein can be chosen, and this particular modification is 
then performed before the results are assessed by the 
effect of the modification on some property of the pro- 
tein. In the former procedure, the selectivity of the chem- 
ical modification is determined by the accessibility and 
the inherent nucleophilicity of the side chains that are 
potential targets, properties controlled by the protein 
and not by the investigator, so the results often contain 
unexpected information about the relationship between 
structure and function. In the latter, modification can be 
performed at any site selected by the investigator with 
absolute specificity, but the choice of which amino acid 
to modify and which of the 19 mutations to perform at 
that site relies on his intuition. Because the intuition of 
the investigator is usually fallible, informative experi- 
ments using site-directed mutation usually require that a 
crystallographic molecular model be already available to 
assist in the choice of the site to be modified. 

It is also possible to decide to focus one’s attention 
on the nucleophilicity of a particular side chain in the 
sequence of a protein on the basis of the location of that 
side chain in a crystallographic molecular model, its 
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position relative to specific features in the sequence, or 
its identification as an important structural or functional 
site from other experiments, just as one would choose a 
particular amino acid in the sequence of a protein for 
site-directed mutation. To assess its role in the function 
of the protein, the rate of the reaction of this one partic- 
ular nucleophilic side chain with one or more appropri- 
ately chosen electrophilic reagents can then be followed 
to probe its accessibility, changes in accessibility
under different circumstances, or its intrinsic
nucleophilicity. The rate or yield of the modification at this
particular side chain can be monitored by repetitively 
purifying a short peptide containing this amino acid 
from digests of the protein modified under different
conditions or for various intervals of time.

An effective way to perform the requisite purifica- 
tions of the products of these reactions rapidly is with an 
immunoadsorbent. Immunoglobulins raised against a 
synthetic peptide with the same amino- or carboxy-terminal
sequence as the peptide containing the amino
acid being modified are used to produce an affinity 
adsorbent able to capture out of a digest of the protein 
only the particular peptide containing the modified 
target. Because the immunoadsorbent recognizes
sequences that do not contain the target of the modifica- 
tion, both modified and unmodified versions of the pep- 
tide are isolated simultaneously, and the fraction of the 
side chain chosen for investigation that has been
modified in each sample is immediately apparent.

During the covalent modification of a protein, even 
with as nonspecific a reagent as iodoacetamide, the
various types of amino acids do not react as homoge- 
neous populations. This is most readily discerned when 
the amount of incorporation into a particular type of 
amino acid is plotted as a function of the duration of the 
reaction. If the concentration of reagent remains
constant, the natural logarithm of the amount of unmodified
amino acid should decrease as a linear function of time 
because the reaction is pseudo-first-order (Equation 
10-12). This is usually not observed, and the disappear- 
ance of a particular type of amino acid, such as histidine, 
lysine, or cysteine, as the modification proceeds usually 
displays inhomogeneous kinetics (Figure 10-7). This
can be ascribed to the fact that each histidine, each 
lysine, or each cysteine in the sequence of the protein is 
in a different environment in the folded polypeptide and 
reacts at a unique rate with the reagent. For example, the 
three histidines of α-lactalbumin can be modified by
iodoacetamide, but each reacts at a significantly different 
rate, so that the disappearance of total histidine
displays inhomogeneous kinetics.
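
The origin of the curvature can be illustrated numerically. The following is a minimal sketch, not taken from the experiments cited above: it assumes three residues of one type that react independently, each with its own hypothetical bimolecular rate constant, at a constant concentration of reagent. The rate constants, the concentration, and the function names are all assumptions chosen for illustration.

```python
import math

def fraction_unmodified(t, rate_constants, reagent_conc):
    """Fraction of one type of amino acid still unmodified at time t.

    Each residue reacts independently and pseudo-first-order, with an
    observed rate constant k * [reagent]; the total is their average.
    """
    terms = [math.exp(-k * reagent_conc * t) for k in rate_constants]
    return sum(terms) / len(terms)

def semilog_slopes(times, rate_constants, reagent_conc):
    """Slopes of ln(fraction unmodified) between successive time points."""
    logs = [math.log(fraction_unmodified(t, rate_constants, reagent_conc))
            for t in times]
    return [(logs[i + 1] - logs[i]) / (times[i + 1] - times[i])
            for i in range(len(times) - 1)]

# Hypothetical values (assumptions, not data from the text).
reagent = 0.05                      # M, held constant
times = [0, 20, 40, 80, 160, 320]   # min
heterogeneous = semilog_slopes(times, [1.0, 0.1, 0.01], reagent)
homogeneous = semilog_slopes(times, [0.1, 0.1, 0.1], reagent)
```

For the homogeneous population every slope equals -k[reagent], so the semilogarithmic plot is a straight line. For the heterogeneous population the slopes become progressively less negative as the fast residues are consumed, which is the kind of curvature described as inhomogeneous kinetics.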

Figure 10-7: Reaction of iodoacetamide with the lysines of
glucose-6-phosphate isomerase. Glucose-6-phosphate isomerase
(0.12 mM) was mixed with iodoacetamide (49 mM) at pH 8.5 and
40 °C. At the noted times a sample was removed from the solution,
the reaction was quenched with 0.3 M 2-mercaptoethanol, and the
sample was subjected to total amino acid analysis. The amount of
unalkylated lysine (percentage of total) is plotted on a logarithmic
scale as a function of time. Reprinted with permission from ref 15.
Copyright 1970 Journal of Biological Chemistry.

It has already been noted that the environment surrounding
a particular amino acid in a protein shifts its
apparent acid dissociation constant from the value it
would have in an unfolded polypeptide (Table 2-2). Such
shifts of the acid dissociation constants from their intrinsic
values move each of the inflections of the profiles of
log k against pH (Figure 10-1) horizontally to coincide
with the altered value of the respective pKa and simultaneously
move the plateau at high pH vertically in
response to the Brønsted relationship (Equation 10-14).
A unique shift occurs for each amino acid in the protein. 
An example of such an effect of environment on the 
nucleophilicity of acid-bases is provided by a pair of
cysteine side chains, Cysteine 31 and Cysteine 32, in seminal
ribonuclease. These two adjacent cysteines react more
rapidly with 5,5’-dithiobis(2-nitrobenzoate) at pH 7 than 
do model compounds such as cysteine itself or cysteinyl- 
cysteine. The synthetic peptide MCCRKM, which incor- 
porates the sequence of seminal ribonuclease around 
these two cysteines, however, has the same enhanced 
reactivity. It was shown that this enhanced reactivity is 
due to the fact that the cysteines are adjacent to an argi- 
nine and a lysine. These cationic amino acids lower the 
values of the pKa for the two
cysteines. Although the nucleophilicities of the thiolate
anions also decrease accordingly, the Brønsted coefficient
β is small (<0.2). Therefore, the increase in reactivity
at pH 7 results from a significant increase in the
concentration of the respective thiolate anions, which
each have almost the same intrinsic nucleophilicity
as a cysteine the pKa of which has not been lowered.
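
The balance between these two opposing effects can be made concrete with a short calculation. This is a sketch under stated assumptions: the pKa values (8.5 for the reference thiol, 7.5 for the perturbed one) are hypothetical and are not the measured values for seminal ribonuclease, and the function names are illustrative. The intrinsic rate constant of the thiolate is scaled by the Brønsted relationship, and the fraction present as thiolate follows from the pKa and the pH.

```python
def thiolate_fraction(pH, pKa):
    """Fraction of a thiol present as the thiolate anion at a given pH."""
    return 1.0 / (1.0 + 10.0 ** (pKa - pH))

def relative_rate(pH, pKa, pKa_ref, beta):
    """Observed rate of a thiol relative to a reference thiol at the same pH.

    The intrinsic rate constant of the thiolate is assumed to follow a
    Bronsted relationship: log k changes by beta * (pKa - pKa_ref).
    """
    intrinsic = 10.0 ** (beta * (pKa - pKa_ref))
    populated = thiolate_fraction(pH, pKa) / thiolate_fraction(pH, pKa_ref)
    return intrinsic * populated

# Hypothetical pKa values (assumptions): reference 8.5, perturbed 7.5.
small_beta = relative_rate(7.0, 7.5, 8.5, beta=0.2)   # about 5-fold faster
large_beta = relative_rate(7.0, 7.5, 8.5, beta=1.0)   # slower than reference
```

With a small Brønsted coefficient the gain in the concentration of thiolate outweighs the loss in intrinsic nucleophilicity, so the perturbed cysteine reacts faster at pH 7. Were the coefficient close to 1, the two effects would nearly cancel or reverse.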

Although experiments involving covalent modification
are usually designed to answer a specific question
about a particular protein, there are several common 
themes. Covalent modification is often performed to 
identify particular amino acids within a binding site for a 
particular ligand such as a substrate for an enzyme, or 
the agonist or antagonist of a receptor, or for a segment 
of DNA. An amino acid involved in the function of a
protein can be identified when its covalent modification
inhibits that function. The acid dissociation constant for 
a particular amino acid in a protein can be determined 
from the rate of its modification as a function of the pH. 
The accessibility of particular amino acids to the solvent 
can be inferred from the rates of their covalent modifica- 
tion. The amino acids incorporated into a transient het- 
erologous interface can be identified by changes in the 
rates of their modification upon formation of the com- 
plex between the two proteins. The close proximity of 
two amino acids in a protein can be revealed by their 
covalent cross-linking. 

The most common use of covalent modification is 
to identify amino acids in a site on a protein for binding 
a specific ligand. One way this can be done is to measure 
the change in the accessibility of a particular amino acid 
upon the binding of the ligand to the protein. For example,
the greater than 4-fold decrease in the rate constant
for the reaction between acetic anhydride and Lysine
501 in ovine Na⁺/K⁺-exchanging ATPase when MgATP is
bound to the enzyme suggested that this lysine participates
in the specific interactions between the protein
and MgATP when it is bound. This suggestion was
later verified by a crystallographic molecular model of
the complex between ATP and a closely related
enzyme. Lysines involved in the interface forming the
complex between DNA topoisomerase of vaccinia virus
and a double helix of DNA were identified by noting
significant decreases in the yield of their modification by
citraconic anhydride upon formation of the complex.
In this latter experiment, advantage was taken of the 
reversibility of the acylation of lysines by citraconic 
anhydride. The products of the modification were 
unfolded and modified at the unacylated lysines with 
N-hydroxysuccinimide acetate (Figure 10-4), the citra- 
conyl groups were removed, and the locations of the pre- 
viously citraconylated lysines were identified by 
digesting the protein at the deprotected lysines with lysyl 
endopeptidase and examining the pattern of fragments 
produced. 

Another way that an amino acid in a binding site on 
a protein is identified is to incorporate a reactive func- 
tional group into the ligand itself. For example, it was 
demonstrated that the amino-terminal threonine produced
upon the normal posttranslational cleavage of
γ-glutamyltransferase from E. coli between Glutamine
390 and Threonine 391 is in the active site of the enzyme
because the threonine was covalently modified by
2-amino-4-(fluorophosphono)butanoic acid, an electrophilic
mimic of the γ-glutamyl group in glutathione, a
substrate of the enzyme.

Covalent modification often leads to the inactivation
of a protein and identifies candidates for functionally
important amino acids. The modification of Lysine
85, Histidine 88, and Histidine 161 in bovine
cytochrome b561 by diethyl pyrocarbonate leads to the
inactivation of the fast electron transfer performed by
this protein, a fact suggesting that these two histidines
and the lysine participate in this function.

The pKa for a particular amino acid is often estimated
from the rate of its reaction with an electrophile as
a function of pH. The reaction of acetic anhydride with
particular lysines in a protein has been used to monitor
their individual acid dissociation constants and
nucleophilicities. Because acetic anhydride is rapidly
hydrolyzed, the yield of its incorporation at a set pH into
a particular lysine in the protein, relative to the yield of its
incorporation into an added standard amine, after the
reaction has reached completion provides a direct
measurement of the relative bimolecular rate constant for its
reaction with that particular lysine at that pH. The situation
is formally equivalent to Equation 10-21, with one of the
rate constants being the rate constant for reaction of the
acetic anhydride with the standard amine. If the absolute rate
constant for the reaction between acetic anhydride and the
standard amine is known, the absolute bimolecular rate
constant for the reaction between the lysine of interest
and acetic anhydride at that pH can be calculated from
the relative rate constant in Equation 10-23. The
behavior of this absolute rate constant as a function of pH
(Figure 10-1) provides an estimate of the pKa of the lysine.
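
The two steps of this analysis can be sketched in code. The sketch rests on assumptions that are not stated in the text: the relative rate constant is computed from the general relationship for two nucleophiles competing for a reagent that is eventually exhausted, ln(1 - y_lysine)/ln(1 - y_standard), which reduces to the simple ratio of yields when both yields are small; and the dependence on pH is taken to be that of a single titrating group reacting only as its free base. The function names, the synthetic data, and the grid search are illustrative choices, not the procedure of the cited work.

```python
import math

def relative_rate_constant(yield_lysine, yield_standard):
    """k_lysine / k_standard from fractional yields of competitive labeling."""
    return math.log(1.0 - yield_lysine) / math.log(1.0 - yield_standard)

def free_base_fraction(pH, pKa):
    """Fraction of an amine present as the unprotonated free base."""
    return 1.0 / (1.0 + 10.0 ** (pKa - pH))

def fit_pKa(pH_values, k_obs_values, lo=6.0, hi=12.0, step=0.01):
    """Least-squares fit of k_obs = k_base * free_base_fraction(pH, pKa).

    The pKa is scanned on a grid; k_base is solved in closed form at
    each trial pKa because the model is linear in k_base.
    """
    best = None
    for i in range(int(round((hi - lo) / step)) + 1):
        pKa = lo + i * step
        f = [free_base_fraction(p, pKa) for p in pH_values]
        k_base = (sum(fi * ki for fi, ki in zip(f, k_obs_values))
                  / sum(fi * fi for fi in f))
        sse = sum((ki - k_base * fi) ** 2 for fi, ki in zip(f, k_obs_values))
        if best is None or sse < best[0]:
            best = (sse, pKa, k_base)
    return best[1], best[2]

# Synthetic, noise-free data generated from assumed values.
pHs = [8.0, 9.0, 10.0, 10.5, 11.0]
k_obs = [400.0 * free_base_fraction(p, 10.4) for p in pHs]
pKa_fit, k_base_fit = fit_pKa(pHs, k_obs)
ratio = relative_rate_constant(0.02, 0.04)
```

With real measurements the absolute rate constant at each pH would be obtained by multiplying the relative rate constant by the known rate constant for the standard amine before fitting.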

It is also possible to measure a pKa directly. For
example, the pKa of Cysteine 25 in papain was determined
to be 8.5 at 25 °C and an ionic strength of 0.5 M
by following the rate of its reaction with chloroacetamide
as a function of pH, the pKa of Cysteine 115 in
UDP-N-acetylglucosamine 1-carboxyvinyltransferase from
Enterobacter cloacae was determined to be 8.3 by following
the rate of its reaction with iodoacetamide as a
function of pH, and the pKa of Lysine 166 in
ribulose-bisphosphate carboxylase from Rhodospirillum rubrum
was determined to be 7.9 by following the rate of its reaction
with 2,4,6-trinitrobenzenesulfonate.

The reactivity of particular amino acids in the 
sequence of a protein can provide an indication of their 
accessibility to the aqueous phase. For example, a com- 
parison between the observed rate constant for the reac- 
tion of a particular lysine in a protein with acetic 
anhydride and the rate constant calculated from its 
observed pKa provides an estimate of the accessibility of
that lysine to the solvent. Knowledge of the Brønsted
coefficient β (0.48) and the absolute bimolecular rate
constant for the free base of an unhindered lysine
(2700 M⁻¹ s⁻¹) of normal pKa (10.8) with acetic anhydride
at 10 °C permits the bimolecular rate constant expected
for the modification of the free base of a fully accessible
lysine with a particular pKa to be calculated. For example,
the pKa of Lysine 501 in ovine Na⁺/K⁺-exchanging ATPase
was found to be 10.4, so the rate constant of the reaction
of its free base with acetic anhydride at 10 °C should have
been 1700 M⁻¹ s⁻¹. The fact that the rate constant was
only 400 M⁻¹ s⁻¹ indicated that Lysine 501 was not fully
exposed on the surface of the protein. In most instances,
the apparent accessibility of a particular amino acid is
determined by steric effects, engendered by neighboring
amino acids in the folded polypeptide, or by a decrease or
increase in its nucleophilicity brought about by its
participation in intramolecular hydrogen bonds.
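
The expected rate constant quoted for Lysine 501 follows directly from the Brønsted relationship and the numbers given above. The short sketch below reproduces the arithmetic; the function name and the description of the final ratio as an apparent accessibility are illustrative assumptions, while the numerical inputs are those stated in the text.

```python
def expected_rate_constant(pKa, k_ref=2700.0, pKa_ref=10.8, beta=0.48):
    """Rate constant (M^-1 s^-1) expected for the free base of a fully
    accessible lysine of the given pKa, from the Bronsted relationship
    log(k / k_ref) = beta * (pKa - pKa_ref)."""
    return k_ref * 10.0 ** (beta * (pKa - pKa_ref))

k_expected = expected_rate_constant(10.4)   # roughly 1700 M^-1 s^-1
k_observed = 400.0                          # M^-1 s^-1, from the text
apparent_accessibility = k_observed / k_expected
```

The observed rate constant is a little less than one-quarter of the value expected for a fully exposed lysine of the same pKa.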

Tetranitromethane, which is large and quite polar
(10-3), reacts with the un-ionized, neutral form of tyrosine.
At neutral pH, all of the tyrosines in a protein should
be un-ionized, and their modification by tetranitromethane
should reflect only their accessibility.
There are eight tyrosines in human carbonate dehydratase B,
and only three of them, Tyrosine 20, Tyrosine
88, and Tyrosine 114, react with tetranitromethane.
Subsequent to this assessment, the protein was studied
crystallographically, and in the map of electron density
only Tyrosine 20, Tyrosine 88, Tyrosine 114, and Tyrosine
129 were found to be “located on the surface of the molecule.”
Aspartate 194 in bovine chymotrypsinogen A
could not be modified in the native protein with ethyl
glycinate and N-ethyl-N′-[3-(dimethylamino)propyl]carbodiimide
even under conditions where 13 of its 15 carboxylates
were modified completely. Subsequently it
was observed that Aspartate 194 is “buried” in the interior
of the crystallographic molecular model of chymotrypsinogen.
When fructose-bisphosphate aldolase
was modified with methyl acetimidate at high concentrations,
only 20 of its 30 lysines were modified. The 10
unmodified lysines reacted readily when the protein was
unfolded, and it could be shown that these were 10
unique lysines in the sequence of the protein, presumably
made unreactive by their surroundings in the folded
polypeptide.

Site-directed mutation can also be used to monitor 
the accessibility of particular locations in the amino acid 
sequence of the protein. An α helix passes across the surface
of the crystallographic molecular model of λ repressor.
In this α helix Isoleucine 84 and Methionine 87 are
on the face of the α helix directed toward the interior of
the protein while Tyrosine 85, Glutamate 86, Tyrosine 88,
and Glutamate 89 are on the surface of the α helix that is
accessible to the solution. After this observation had
been made crystallographically, it was shown that only
isoleucine at position 84 and either methionine or
isoleucine at position 87, of all of the 20 amino acids,
produces a functional protein, but 10-14 of the 20 amino
acids can be substituted at the other four positions and
still produce a functional protein.

Another covalent modification that has been used 
to assess the accessibility of particular amino acids in a 
folded polypeptide is endopeptidolytic cleavage. For an 
endopeptidase to cleave a peptide bond, the polypeptide 
at that location must be able to enter its active site. This 
has usually been assumed to require that the susceptible 
peptide bond be located on a somewhat flexible loop, on 
the outside surface of the protein, well exposed to the 
solvent. In the case of chymotrypsinogen A, the
endopeptidolytic cleavages of the folded polypeptide
that remove the amino acids between Leucine 13 and
Isoleucine 16 and between Tyrosine 146 and Alanine 149
to produce α-chymotrypsin occur within two such
loops. In the case of deoxyribonuclease I, however, a
less easily explained endopeptidolytic cleavage of the
folded polypeptide has been observed. Under the proper
set of conditions, chymotrypsin cleaves deoxyribonuclease I
completely and exclusively at the peptide bond on
the carboxy-terminal side of Tryptophan 178. In the
refined crystallographic molecular model of
deoxyribonuclease I, Tryptophan 178 is found in the middle of
an α helix that is a rigid feature of the structure. This
α helix traverses the outer surface of the protein, but
Tryptophan 178 is on the side of the α helix pointed
toward the interior and itself is inaccessible. There are,
however, no more accessible sites in the protein at which
chymotrypsin could cleave, and it may be the case that in
solution the α helix containing Tryptophan 178 is in
equilibrium with a disordered loop.

Changes in the accessibility of amino acids on the 
surface of a protein brought about by its participation in 
an association with another protein can be monitored 
as changes in the yields of their covalent modification. 
For example, the yields of the reductive methylations of
Lysines 50, 61, 68, 113, 284, and 291 with [¹⁴C]formaldehyde
and NaCNBH₃ decreased 2-4-fold upon conversion
of monomeric actin to helical filaments of actin.
Covalent modification can also introduce bulky groups
that sterically inhibit an association between two proteins.
For example, when Histidine 40 of actin is modified
with diethyl pyrocarbonate or when Lysine 61 of actin
is modified with fluorescein isothiocyanate, the modified
actin is no longer able to form helical polymers. All
of these observations were used as evidence in favor of
the molecular model of the helical polymer of actin
(Figure 9-1B), in which all of these amino acids ended
up in the interfaces between the monomers. If the model
is correct, the formation of the interfaces in the
homopolymer should sterically hinder the reductive
methylation of the lysines, and the addition of bulky
functional groups to these histidines or these lysines on
the actin monomers (Equations 10-24 and 10-26) should
sterically hinder their polymerization.

A reagent carrying a tethered iron chelate
was covalently attached at random to lysines on the surface
of sigma factor rpoD isolated from DNA-directed
RNA polymerase of E. coli. When the normally occurring
complex was formed between this modified protein and
DNA-directed RNA polymerase, the tethered iron was
able to catalyze cleavage at specific peptide bonds in the
DNA-directed RNA polymerase when ascorbate and H₂O₂
were added. These sites of cleavage identified locations
on the surface of the RNA polymerase within or adjacent
to the interface between it and sigma factor rpoD. This
interface has also been probed by footprinting.

Footprinting is the identification of those peptide 
bonds on the surface of a protein that are protected from 
random nonspecific cleavage when that protein forms a 
complex with another protein. The peptide bonds pro- 
tected are assumed to be within the footprint of the other 
protein upon the surface of the protein being examined. 
That footprint is the portion of the surface of the protein 
being examined that falls within the heterologous interface.
DNA-directed RNA polymerase from E. coli
stripped of its sigma factor rpoD is cleaved at 83 different
peptide bonds on its surface when it is exposed to the
iron chelate of N,N,N′,N′-tetracarboxymethyl-1,2-diaminoethane
in the presence of ascorbate and H₂O₂.
When sigma factor rpoD is reassociated with the
DNA-directed RNA polymerase, the cleavage at seven of these
sites is prevented. It was assumed that these seven
peptide bonds are within the footprint of sigma factor
rpoD upon DNA-directed RNA polymerase.
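
The logic of assigning a footprint amounts to comparing the yield of cleavage at each site in the free protein with the yield at the same site in the complex. The following is a minimal sketch of that comparison. The residue numbers, the yields, and the threshold of 50% protection are hypothetical and do not come from the experiment described above.

```python
def protected_sites(yield_free, yield_complex, min_protection=0.5):
    """Return the sites at which cleavage is suppressed in the complex.

    yield_free and yield_complex map the position of a cleaved peptide
    bond to its relative yield of cleavage; a site missing from
    yield_complex is treated as not cleaved at all. Every yield in
    yield_free is assumed to be greater than zero.
    """
    protected = []
    for site, y_free in sorted(yield_free.items()):
        y_bound = yield_complex.get(site, 0.0)
        protection = 1.0 - y_bound / y_free
        if protection >= min_protection:
            protected.append(site)
    return protected

# Hypothetical yields of cleavage (assumptions for illustration only).
free = {102: 1.0, 250: 0.8, 463: 1.2, 900: 0.5}
bound = {102: 0.95, 250: 0.1, 463: 1.1}
footprint = protected_sites(free, bound)
```

In practice the choice of threshold is a judgment about experimental noise, and a site protected in the complex may lie adjacent to the interface rather than within it.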

Covalent cross-linking uses covalent modification 
to assess the proximity of particular amino acids. The
functional groups at the two ends of a bifunctional
cross-linking reagent are usually electrophiles commonly used
in monofunctional reagents for the modification of proteins.
They can be identical to each other (8-1 and 8-2),
or they can be two electrophiles with different specificities
(8-3). They can be connected by a chain of atoms
stably bonded or a chain of atoms containing a bond that
can be cleaved as desired (8-3) to permit later separation
and identification of the cross-linked species.

Intramolecular cross-linking can be used to determine
juxtapositions in a single folded polypeptide. The
simplest examples of such cross-links are naturally occurring
cystines, which automatically provide evidence for
the juxtaposition of two segments of polypeptide, but
there are unnatural, chemical methods for forming 
cross-links. The bifunctional reagent 2-(p-nitrophenyl)- 
3-(3-carboxy-4-nitrophenyl)thio-1-propene (10-18) can 
undergo a series of reversible addition-eliminations to 
form bridges between two nucleophilic amino acids 
(Figure 10-8), either lysines or cysteines. The reaction
is reversible as long as the nitrophenyl group is present to 
stabilize the carbanion but can be made irreversible by 
reducing the nitro group with dithionite. Therefore, the 
reagent can be permitted to step around the protein until 
the most stable cross-link is formed, and this cross-link 
can then be locked in by reduction. In this way, two pairs
of intramolecular cross-links on bovine pancreatic
ribonuclease, one between Lysine 7 and Lysine 37 and
the other between Lysine 31 and Lysine 41, could be
formed in high yield when only 2 molar equivalents of
the reagent were added initially to the protein. The β carbons
of these two pairs of lysines are 1.3 and 1.1 nm
apart, respectively, and each partner in a pair is on the
same side of the crystallographic molecular model.

Figure 10-8: Mechanism by which 2-(p-nitrophenyl)-3-(3-carboxy-4-nitrophenyl)thio-1-propene (10-18) cross-links adjacent lysines. The
olefin on the p-nitrostyrene is activated by the electron-withdrawing capacity of the nitro group and participates in a sequence of reversible,
nucleophilic addition-eliminations. Because the adduct is symmetric, two nucleophiles are cross-linked reversibly. In the first step of the
reaction, the nitrothiobenzoate (Structure 10-2) is the preferred leaving group from the asymmetric carbanion, but when the carbanion is
then formed between two lysines, either can be the leaving group and the reagent can be passed from lysine to lysine over the surface of the
protein. The nitrobenzyl carbanion is not that basic, so its protonation is reversible and this allows the cycles of addition and elimination to
proceed. When the reaction is quenched with acid and the nitro group is reduced to the amine, the aminobenzyl proton is no longer acidic
and the reagent is fixed in place.

The reagent bromopyruvate is bifunctional by virtue
of its alkyl bromide, which is an alkylating agent, and its
carbonyl, which can reversibly form an imine with a lysine
that can be reduced to the permanent secondary amine
with NaCNBH₃. Bromopyruvate is able to form an imine
with Lysine 144 of 2-dehydro-3-deoxy-6-phosphogluconate
aldolase and then alkylate Glutamate 56. This
observation established the proximity of these two amino
acids in the folded polypeptide before the crystallographic
molecular model became available.

The participants in heterologous interfaces have 
also been probed by cross-linking. During the contrac- 
tion of muscle, a complex must form between a subunit 
of myosin in its helical polymer and actin in its helical 
polymer (Figure 9-1B). This complex between actin and 
myosin from rabbit muscle was cross-linked with 
1-ethyl-3-[3-(dimethylamino)propyl]carbodiimide, a
reagent that couples carboxylates to lysines on the sur- 
faces of proteins (Figure 10-5). The amino-terminal pep- 
tide produced by cleavage of actin by hydroxylamine 
between Asparagine 12 and Glycine 13 and the carboxy- 
terminal peptide produced by cleavage of actin by 
cyanogen bromide at Methionine 354 were both found to 
be cross-linked to myosin in the covalently cross-linked 
complex. It was also found that amino acids within the
segment of actin between Histidine 40 and Lysine 113 
were covalently attached to myosin when the complex 
between it and actin was cross-linked by N-(ethoxycar- 
bonyl)-2-ethoxy-1,2-dihydroquinoline, a reagent that 
also can couple carboxylates to lysines (Figure 10-6). On 
the basis of these results, amino acids within these three 
segments of actin are thought to be within the heterologous
interface it forms with myosin.

Thiols, either in the form of cysteines in the 
sequence of the protein or introduced as covalent modi- 
fications, can be sites for cross-linking. It is possible to 
insert cysteines at specific positions in the amino acid 
sequence of a protein by site-directed mutation and then 
attempt to form cystines between them. The actual
formation of a cystine demonstrates that the two cys- 
teines participating in it were adjacent to each other in 
the tertiary or quaternary structure of the protein. 
Proteins can also be modified by 2-iminothiolane to convert
lysines to thiols.
These thiols are then oxidized to mixed disulfides to
cross-link various lysines on the proteins. The 21 folded 
polypeptides within the 30S subunit of the ribosome 
have been cross-linked to each other in this way. The var- 
ious products of the intramolecular cross-linking were 
identified by two-dimensional gel electrophoresis. 
During the electrophoresis, the disulfides linking pairs of 
these polypeptides were reduced by disulfide inter- 
change to unlink them from each other between the first 
and the second dimension. 

With 2-iminothiolane, as well as with dimethyl
suberimidate, dimethyl adipimidate,
N,N′-1,4-phenylenedimaleimide, tetranitromethane,
tartryl di(ε-aminocaproyl azide), and dimethyl
3,3′-dithiobis(propionimidate), 26 pairs of covalently
cross-linked polypeptides could be unambiguously
identified among the products from the reactions of the
30S subunit of the ribosome from E. coli. After these
results were reported, a crystallographic molecular model
of the 30S subunit became available. Five of the
cross-linked pairs involved polypeptides S1 and S21,
which were not identified in the maps of electron density
of the 30S subunit. Of the remaining 21 pairs, nine have
significant portions of their folded structure touching 
each other so intimately that their cross-linking would be 
expected and four may be near enough to each other in 
the crystallographic molecular model to be intramolecu- 
larly cross-linked by a long linker (Equation 10-46), but 
three have only a few positions in their sequences close 
enough to be cross-linked and five are not even near each 
other in the crystallographic molecular model. These last 
five cross-linked products must have been the result of 
intermolecular cross-linking, and some of the others may 
be as well. 

It is also possible to covalently cross-link thiols that 
have been introduced into a protein with 2-imino- 
thiolane. For example, such inserted thiols can be 
cross-linked with
4,6-di(bromomethyl)-3,7-dimethyl-1,5-diazabicyclo[3.3.0]octadiene-2,7-dione
(dibromobimane):

[Structure 10-19: dibromobimane]


This reagent makes the final cross-link strongly fluores- 
cent so that peptides containing the cross-linked lysines, 
still joined together, can be purified to identify their
positions in the sequence of the protein.

In most experiments involving covalent modifica- 
tion of a protein with an electrophilic reagent, the effect 
of the modification on the normal function of that protein
is first monitored. When a reagent that has an interesting
or desirable effect is discovered, the position or
positions in the amino acid sequence of the protein at 
which the modification has occurred must be identified 
in order to correlate this functional effect with the struc- 
ture of the protein. This identification is usually made by 
digesting the modified protein either chemically or enzy- 
matically, isolating the peptides that have been modi- 
fied, and analyzing them by direct sequencing or by mass 
spectrometry. The precise position of the modification 
in the amino acid sequence is defined by the appearance 
of the modified amino acid itself in one of the cycles of 
sequencing, by the disappearance of the amino acid nor- 
mally found at that cycle, or from the masses of the frag- 
ments produced by bombardment of the vaporized 
peptide with helium in a tandem mass spectrometer 
(Figure 3-8). 

Two inactivated products from the alkylation of 
bovine pancreatic ribonuclease with iodoacetate could 
be separated from each other in their native state before 
they were digested. Each had incorporated one car- 
boxymethyl group. From one of the products, a tryptic 
peptide (Histidine 105 to Valine 124) containing 1-car- 
boxymethylhistidine at position 119 was isolated; from 
the other product, a tryptic peptide (Glutamine 11 to 
Lysine 31) containing 3-carboxymethylhistidine at
position 12 was isolated.

In the chromatogram of the tryptic digest of
ferredoxin-NADP⁺ reductase inactivated with
N-ethyl[2,3-¹⁴C]maleimide (Equation 10-30), there was one major
radioactive peak that had the amino acid sequence
SVSLCVXR, comprising positions 110-117 in the sequence
of the protein. In the sequence of the unmodified protein,
the amino acid at position X is a lysine. Because lysine was
not observed in that cycle of Edman degradation, because
trypsin failed to cleave between the lysine and the arginine
as it usually does, and because amino acid analysis
of the peptide following its hydrolysis in acid produced a
peak at the position of N-succinyllysine on the
chromatogram (Figure 1-3), the modification could be
assigned to Lysine 116.

When γ-glutamyltransferase from E. coli that had
been covalently modified with 2-amino-4-(fluorophos- 
phono)butanoic acid was digested with lysyl endopepti- 
dase (Figure 3-2) and the resulting peptides were 
separated by reverse-phase adsorption chromatography, 
only one previously unobserved peak of absorbance 
appeared on the chromatogram. A mass spectrum of the 
peptide responsible for that peak identified it as the pep- 
tide TTHYSVDDK (positions 391-399 in the sequence of 
the protein) into which 1 mole of the phosphonylating 
agent had been incorporated. A mass spectroscopic 
determination of the sequence of that peptide (Figure 
3-8) identified Threonine 391 as the site of
phosphonylation.

When bovine cytochrome b561 that had been modified
with diethyl pyrocarbonate was submitted to
matrix-assisted-laser-desorption ionization in a time-of-flight
mass spectrometer, it was observed that the modification
had increased the mass of the protein by the equivalent
of three ethylpyrocarbonyl groups (3 × 72 Da ≈ 28,253 Da
− 28,033 Da). When a tryptic digest of the protein was
submitted to mass spectrometry, 33 new peptides
appeared in the spectrum,* the masses of all but two of
which could be explained as the result of modification
only at Lysine 85, Histidine 88, and Histidine 161.
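
The number of groups incorporated follows from the observed shift in mass divided by the mass added by each modification. The sketch below carries out that arithmetic with the masses quoted above; the function name and the reporting of a residual are illustrative assumptions.

```python
def count_adducts(mass_modified, mass_unmodified, adduct_mass):
    """Estimate the number of adducts from a shift in molecular mass.

    Returns the nearest whole number of adducts and the residual (Da)
    left unexplained by that number.
    """
    shift = mass_modified - mass_unmodified
    n = round(shift / adduct_mass)
    residual = shift - n * adduct_mass
    return n, residual

# Masses from the text: 28,253 Da modified, 28,033 Da unmodified,
# 72 Da added per modification.
n_groups, residual = count_adducts(28253.0, 28033.0, 72.0)
```

The observed shift of 220 Da corresponds most closely to three groups (216 Da); the residual of 4 Da is small relative to the mass of the intact protein.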

The advantage of identifying a site of modification 
chemically is that several different methods of analysis 
can be applied to the isolated product; the advantages of 
mass spectrometry are its rapidity and its sensitivity.

* Of the 31 tryptic peptides, 24 were derivatives of the tryptic
peptide from Threonine 83 to Arginine 111 containing Lysine 85 and
Histidine 88, and seven were derivatives of the tryptic peptide from
Tyrosine 157 to Lysine 191 containing Histidine 161. The various
derivatives resulted from incomplete yields of modification, low
yields of modification at other histidines in the peptides, and
incomplete tryptic digestion.

Problem 10-1: Mercaptoethanol undergoes the following
dissociation:

$$\mathrm{HOCH_2CH_2SH} \rightleftharpoons \mathrm{HOCH_2CH_2S^-} + \mathrm{H^+} \qquad \mathrm{p}K_\mathrm{a} = 9.50$$

Suppose the total amount of mercaptoethanol in solution
is equal to $[\mathrm{SH}]_\mathrm{TOT}$, a quantity you know since you
added that much. Suppose, also, that there is a chemical
reaction that occurs between only the anion,
$\mathrm{HOCH_2CH_2S^-}$, and an electrophile, X, and the rate of this
reaction is

$$\text{rate} = k\,[\mathrm{HOCH_2CH_2S^-}][\mathrm{X}]$$

As the pH changes, $[\mathrm{HOCH_2CH_2S^-}]$ changes although
$[\mathrm{SH}]_\mathrm{TOT}$ is always the same.

(A) Show that $\text{rate} = k\{f([\mathrm{H^+}])\}[\mathrm{SH}]_\mathrm{TOT}[\mathrm{X}]$.

(B) Give an explicit equation for $f([\mathrm{H^+}])$.

(C) Plot $\log k_\mathrm{obs}$ against pH, where $k_\mathrm{obs} = k\{f([\mathrm{H^+}])\}$.
Indicate on the plot where pH = $\mathrm{p}K_\mathrm{a}$.

(D) At what pH does the rate of the reaction become
zero?

(E) By what factor does the rate decrease for each
decrease in pH of 1.00 when pH < $\mathrm{p}K_\mathrm{a}$?


Problem 10-2: Give all of the products and a mechanism 
for each of the following reactions. 


[Reaction schemes for the parts of this problem: the
structural drawings are not recoverable.]

Problem 10-3: Diisopropyl fluorophosphate inhibits 
serine endopeptidases by specific phosphorylation of the 
serine in the active site. Papain is an endopeptidase that 
does not have a serine in its active site. Nevertheless, it 
reacts with diisopropyl fluorophosphate with the result 
that 1 mol of phosphate is bound for every mole of 
enzyme but without loss of enzymatic activity. The
reaction between papain and the reagent was carried out
with radioactive diisopropyl [³²P]fluorophosphate, the
modified protein was digested with chymotrypsin, and
the segment of enzyme containing the radioactive label
was isolated. It corresponded to amino acids 112-123 in
the sequence of papain: -QVQPYNQGALLY-. What is the 
most likely site of alkylphosphorylation of papain? 


Problem 10-4: 


(A) Draw the structure of the peptide RDVLMKE in 
the ionization state in which it would exist at 
pH 1.4. Indicate all lone pairs. 


The peptide was modified with iodoacetamide at pH 1.4, 
40 °C, for 20 h. Digestion with carboxypeptidase yielded 
the full complement of glutamic acid from the resulting 
peptide. Edman degradation yielded the full comple- 
ment of arginine. It is possible to estimate the number 
of charges a peptide bears at a certain pH from its 
behavior on electrophoresis. This was done for the ini- 
tial peptide and the product from its reaction with 
iodoacetamide. 


                      charge
pH      original peptide      alkylated product
6.5            0                     +1
2.1           +3                     +4


(B) Write a stoichiometric mechanism for the reac- 
tion that occurred between iodoacetamide and 
one of the side chains on this peptide. 


(C) How would the product of the reaction with 
iodoacetamide move on chromatography by 
cation exchange relative to the unreacted pep- 
tide? 


Problem 10-5: Write the complete reaction that occurs 
between a cysteine in a protein and
5,5′-dithiobis(2-nitrobenzoate). Write out the resonance forms of the
nitrothiobenzoate dianion. What properties of this dian- 
ion do the resonance forms explain? 


Problem 10-6: A peptide has the sequence SVEKCYEKP. 


(A) How many charges does the peptide bear at 
pH 1.9? At pH 5.6? 


The peptide was reacted with trimethyloxonium
tetrafluoroborate, (CH₃)₃O⁺ BF₄⁻, in aqueous solution at
pH 6.0. Three methyl groups were covalently attached to the
peptide. When the methylated and unmethylated peptides were
examined by electrophoresis, the following result was
observed.


[Electrophoretogram: methylated and control peptides run
side by side at pH 1.9 and at pH 5.6, with the origin
marked. The positions of the bands are not recoverable.]


(B) What nucleophiles on the peptide have reacted
with the trimethyloxonium cation? Write a mechanism
for this reaction. Why is the trimethyloxonium
cation so reactive?
Chapter 11


Immunochemical Probes of Structure 


Immunoglobulins are proteins found, among other 
locations, in the blood serum of birds and mammals. 
Immunoglobulins are also called antibodies. In an 
animal, the function of an immunoglobulin is to recog- 
nize a foreign macromolecule, the antigen, by binding to 
it tightly. An antigen is any foreign macromolecule that 
elicits, upon its introduction into an animal, the produc- 
tion of immunoglobulins capable of binding to it with 
high affinity. Within the animal, when an antigen has 
been recognized by being bound to the immunoglobulin, 
it is usually destroyed. An important point which should 
be kept in mind is that the primary biological purpose for 
a particular immunoglobulin is to distinguish one partic- 
ular foreign, undesirable macromolecule from the 
myriad of necessary macromolecules indigenous to the 
animal. Whenever an immunoglobulin makes a mistake 
by recognizing and binding not only to its antigen but 
also to one or more of the macromolecules normally 
present in the animal, these indigenous macromolecules 
are also destroyed in autoimmune processes detrimental 
to the animal. Therefore, the immune system has evolved 
to produce immunoglobulins that are highly specific in 
their recognition of molecular structure. Almost any 
macromolecule can serve as an antigen, but proteins are 
the most common antigens. 

Because no predictions can be made as to what for- 
eign antigens will have to be recognized and destroyed 
during the life of the animal, the immune system must be 
prepared to make immunoglobulins capable of binding 
with high specificity to any foreign molecule when it is 
presented to the animal in an antigenic form. An extreme 
example of the ability of the immune system to produce 
immunoglobulins able to bind any foreign molecule is 
the production of immunoglobulins that bind C₆₀
fullerene, a form of elemental carbon that is not
encountered in any natural setting and that does not resemble
any natural antigen. 

The serum from any mammal or bird contains a 
wide variety of immunoglobulins, each with its own dis- 
tinct amino acid sequence and each present in its own 
distinct concentration. They are the immunoglobulins 
that have been produced in response to all of the foreign 
antigens encountered by that particular individual 
during its peculiar lifetime. This mixture of immunoglob- 
ulins is present in the serum at a total concentration of 
10-20 mg mL⁻¹.

The immune system of an animal can be stimulated
to produce new immunoglobulins recognizing a particular
protein of interest by injecting that protein into the
animal. Because the repertoire of immunoglobulins that 
an animal is capable of producing contains none that 
would recognize any of its own proteins, or they would 
be destroyed, the protein injected has to be from another 
species and different enough from any related indige- 
nous protein to be recognized as foreign. Because the 
oligosaccharides found on the proteins of animals are so 
similar and because most species contain every possible 
sequence of these common oligosaccharides as a result 
of microheterogeneity, an immunoglobulin specific for 
an oligosaccharide on the protein from an animal will 
rarely be produced. There are no such problems, however,
in producing immunoglobulins in animals that recognize
bacterial oligosaccharides. Because the immune
system has evolved to recognize and destroy foreign
organisms such as viruses and bacteria, the surfaces of
which are large aggregates of many subunits or many
different proteins, small proteins sometimes have to be
covalently cross-linked to make them antigenic. If the
immunization is successful, immunoglobulins that bind 
tightly to the protein that was injected appear at high 
concentration in the serum of the animal within two 
months. 

The paradigm of the various types of immunoglobulins
found in serum is immunoglobulin G (Figures 7-13 and
11-1). An immunoglobulin G is composed of two identical
heavy α polypeptides (nₐₐ = 440-450) and two identical
light β polypeptides (nₐₐ = 210-220). Each heavy
α polypeptide is folded into four internally repeating,
superposable domains designated VH, CH1, CH2, and CH3 in
the order in which they occur in the sequence of the
protein. Each light β polypeptide is folded into two
internally repeating, superposable domains, VL and CL.
Each of these six different domains, each approximately
110 amino acids in length and each present in two copies
in the intact immunoglobulin G, is superposable in its
folded form on each of the other five (Figure 7-13). The
VH and VL domains associate with each other, the CL and
CH1 domains associate with each other, and these
associations produce an αβ heterodimer of one heavy
α subunit and one light β subunit. Two of these
αβ heterodimers are associated through their CH2 and CH3
domains to form the entire immunoglobulin.
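
A rough molar mass for immunoglobulin G can be estimated from the lengths of its polypeptides. The following Python sketch assumes a mean residue mass of 110 Da, a common rule of thumb that is not taken from the text, uses the midpoints of the quoted ranges of length, and ignores the attached oligosaccharides. The names of the variables and functions are illustrative only.

```python
MEAN_RESIDUE_MASS_DA = 110.0  # assumed rule-of-thumb value, not from the text

HEAVY_LENGTH = 445  # midpoint of the quoted range, 440-450 amino acids
LIGHT_LENGTH = 215  # midpoint of the quoted range, 210-220 amino acids


def igg_residue_count(heavy: int, light: int) -> int:
    """Total amino acids in two heavy and two light polypeptides."""
    return 2 * heavy + 2 * light


def molar_concentration(mg_per_ml: float, molar_mass_da: float) -> float:
    """Convert mg/mL (numerically equal to g/L) to mol/L."""
    return mg_per_ml / molar_mass_da


residues = igg_residue_count(HEAVY_LENGTH, LIGHT_LENGTH)
molar_mass = residues * MEAN_RESIDUE_MASS_DA

print(residues)                               # total amino acids
print(molar_mass)                             # approximate molar mass, Da
print(molar_concentration(10.0, molar_mass))  # mol/L at 10 mg/mL
```

Under these assumptions the polypeptide alone comes to roughly 145 kDa, so a concentration of 10-20 mg mL⁻¹, the range quoted for serum, would be on the order of 0.1 mM.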

An immunoglobulin G can be cut into three pieces. 
When the intact native protein is treated with papain,
cleavage occurs to the amino-terminal side of the
cystines connecting the two heavy α polypeptides within
the open, structureless segments (Figure 11-1), and two 
identical Fab fragments and one Fc fragment are pro- 
duced. The designation Fab arises from the fact that this 
fragment contains the site that binds the antigen. The 
designation Fc originally referred to the fact that this 
fragment could be crystallized. It is now more informa- 
tive and consistent to consider this the constant frag- 
ment. It is because it is constant that it can crystallize. 
Each of these fragments is a well-behaved, independent, 
soluble, globular protein. Each contains four of the orig- 
inal 12 internally repeating, superposable domains. 
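
The bookkeeping of domains among the three fragments can be checked with a short Python sketch. The plain-text labels VH, CH1, CH2, CH3, VL, and CL stand for the six kinds of domain, the hinges are ignored, and the function names are assumptions made for this illustration.

```python
from collections import Counter

HEAVY_DOMAINS = ("VH", "CH1", "CH2", "CH3")  # order along the heavy polypeptide
LIGHT_DOMAINS = ("VL", "CL")                 # order along the light polypeptide


def intact_immunoglobulin_g() -> Counter:
    """Two heavy and two light polypeptides."""
    return Counter(HEAVY_DOMAINS * 2 + LIGHT_DOMAINS * 2)


def fab_fragment() -> Counter:
    """One light polypeptide plus the first two domains of one heavy polypeptide."""
    return Counter(LIGHT_DOMAINS + HEAVY_DOMAINS[:2])


def fc_fragment() -> Counter:
    """The last two domains of both heavy polypeptides."""
    return Counter(HEAVY_DOMAINS[2:] * 2)


intact = intact_immunoglobulin_g()
pieces = fab_fragment() + fab_fragment() + fc_fragment()

print(sum(intact.values()))        # domains in the intact molecule
print(sum(fab_fragment().values()))  # domains in one Fab fragment
print(pieces == intact)            # two Fab plus one Fc account for every domain
```

Each fragment carries four domains, and the two Fab fragments and the Fc fragment together account for all 12 domains of the intact molecule.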

The advantage of using Fab fragments in experi- 
ments is that they are univalent. Each Fab fragment con- 
tains only one binding site for the antigen. An intact 
molecule of immunoglobulin G is necessarily bivalent
because, as an (αβ)₂ homodimer, it must have two
identical binding sites for antigen (Figure 11-1). The
fact that two antigens can be bound by intact
immunoglobulin G complicates some experiments. A bivalent
analogue of the Fab fragment can be produced by digesting
intact immunoglobulin G with pepsin. The pepsin cleaves to
the carboxy-terminal side of the cystines between the
two heavy α subunits and produces a fragment, (Fab′)₂,
containing two Fab fragments joined by two or more
cystines. The advantage of an (Fab′)₂ fragment is that
when it is reduced with dithiothreitol or
2-mercaptoethanol, it dissociates into two monovalent
Fab′ fragments, the CH1 domains of which are slightly
longer than those of an Fab fragment. A permanently
bivalent fragment that is the same size as an
Fab fragment can be made by fusing the cDNAs encoding the
VL and VH domains of a particular immunoglobulin. If the
two cDNAs are connected to each other by a segment of DNA
encoding a segment of flexible polypeptide too short to
permit the intramolecular association of the domains in
the expressed protein, they associate intermolecularly to
form an antiparallel (VL-VH)₂ homodimer with two
identical sites for binding antigen.


Figure 11-1: Structure of a molecule of
immunoglobulin G. (A) Skeletal drawing of the
polypeptide backbone of the crystallographic molecular
model of immunoglobulin G that is presented in
stereo in Figure 7-13. This drawing was produced
with MolScript. (B) Diagrammatic representation
of the molecule based on the internally repeating
domains observed in its amino acid sequences.
(Adapted with permission from ref 5. Copyright 1969
National Academy of Sciences.) The complete molecule
is composed of 12 superposable domains, all
homologous in amino acid sequence, each about 110
amino acids in length. In the center of each domain is
a cystine (S-S) formed between two structurally
adjacent cysteines, 60 amino acids apart in the amino
acid sequence. A light β polypeptide and a heavy
α polypeptide are linked by a cystine at the carboxy
terminus of the light β polypeptide, and heavy
α polypeptides are linked together by two or more
cystines between the hinges that join the three arms.
The 12 domains are referred to as variable domain,
heavy α polypeptide (VH); constant domain 1, heavy
α polypeptide (CH1); constant domain 2, heavy
α polypeptide (CH2); constant domain 3, heavy
α polypeptide (CH3); variable domain, light
β polypeptide (VL); and constant domain, light
β polypeptide (CL). Oligosaccharides (CHO) are
attached to the two CH2 domains. The three arms are
the two antigen-binding fragments (Fab) and the
constant fragment (Fc). The binding sites for the antigens
are at the tips of the Fab arms and are formed by the
variable domains from heavy and light subunits,
respectively. The light dotted lines indicate where
papain cleaves the molecule to produce the two
Fab fragments and the Fc fragment; the heavy dashed
lines indicate where pepsin cleaves the molecule to
produce the (Fab′)₂ fragment. The Fab fragments are
missing the cystines holding the fragments of the
heavy α subunits together; the (Fab′)₂ fragments
include these cystines. The hinges at which the
cleavages by endopeptidases occur are indicated by the
arrows in panel A.




The flexible, unsupported segments of polypeptide
that connect the Fab portions to the Fc portion of an
intact immunoglobulin G (Figure 11-1) are usually about
20 aa long but can be as long as 70 aa. These are its
hinges. It is the open structure of these hinges that
permits the papain or the pepsin to cleave the
immunoglobulin G into its fragments. Because these
hinges are so long, the two Fab portions in an intact
immunoglobulin G are constantly moving relative to
each other and relative to the Fc portion. When these
segments are shortened sufficiently by site-directed
mutation, the immunoglobulin becomes rigid.

The major immunoglobulins in the serum of a
mammal are immunoglobulins G (10-20 mg mL⁻¹),
immunoglobulins M (1 mg mL⁻¹), and immunoglobulins A
(1 mg mL⁻¹). Each of these immunoglobulins contains
light β subunits that are indistinguishable from one type
to the next. It is the heavy α subunits, always present in
equimolar ratio to the light β subunits, that distinguish
one type of immunoglobulin from the other. The heavy
α subunits of all of these immunoglobulins are
homologous to each other over the first three domains. It is
in the peripheral portions of their Fc segments that they
differ.

Immunoglobulin M has a longer heavy α polypeptide
(nₐₐ = 570-580) than the one in immunoglobulin G
by one extra domain, Cμ4, which would be the analogue
of CH4, if CH4 existed. Immunoglobulin M is a
pentameric complex of five (αβ)₂ heterotetramers held
together by cystines among themselves. The cystines
cross-link pairs of Cμ3 domains to form a pentameric ring
of the heterotetramers.

Immunoglobulin A has a heavy α polypeptide only
about 30 amino acids (nₐₐ = 470-480) longer than that of
immunoglobulin G. Immunoglobulin A is a mixture of
monomeric (αβ)₂ heterotetramers similar to those of
immunoglobulin G and higher oligomers of
(αβ)₂ heterotetramers held together by cystines between
their Cα3 domains.

Both immunoglobulins M and A have a short
polypeptide J associated with them that may promote
their initial oligomerization even before the
intertetrameric cystines are formed. Immunoglobulins M and
A have their binding sites for antigens in a similar
location to those on immunoglobulins G and distant from the
regions (Cμ3, Cμ4, and Cα3) that account for their distinct
oligomeric structures. The only significant differences
between these types of immunoglobulins and
immunoglobulins G are their size and, hence, their
valence. Monovalent Fab fragments can be produced
from each. Although the injection of an antigen
usually stimulates the production of immunoglobulins G,
it is not unusual for it to stimulate production of
the other types, either instead of or along with
immunoglobulins G.
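
Because valence follows directly from the number of heterotetramers in the complex, it can be tabulated with a few lines of Python. This is a minimal sketch; the function and dictionary names are assumptions, and only the oligomeric states named in the text are listed.

```python
SITES_PER_HETEROTETRAMER = 2  # one binding site for each heavy-light pair


def valence(heterotetramers: int) -> int:
    """Number of identical antigen-binding sites in an oligomer."""
    if heterotetramers < 1:
        raise ValueError("an immunoglobulin has at least one heterotetramer")
    return SITES_PER_HETEROTETRAMER * heterotetramers


VALENCE = {
    "Fab fragment": 1,  # a single heavy-light arm, by definition univalent
    "immunoglobulin G": valence(1),
    "monomeric immunoglobulin A": valence(1),
    "pentameric immunoglobulin M": valence(5),
}

for name, sites in VALENCE.items():
    print(f"{name}: {sites}")
```

A pentameric immunoglobulin M therefore presents ten sites, compared with two on an immunoglobulin G and one on an Fab fragment.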

An immunoglobulin of a particular sequence is
produced by a colony of lymphocytes, all derived from one
single cell that was initially stimulated to divide and
manufacture. All members of the colony secrete
immunoglobulins with identical α polypeptides and
identical β polypeptides. The colony assumes its identity
from its pedigree and not from its situation. All of the 
lymphocytes in the colony are descendants of the same 
cell, but each member of the colony, like all other lym- 
phocytes, is dispersed by the bloodstream and lymphatic 
system and wanders independently and at random 
through the animal as it continuously manufactures its 
particular immunoglobulin. The sole product of the 
members of a particular colony is this one immunoglob- 
ulin continuously released into the serum and extracel- 
lular fluid. Each time a lymphocyte is stimulated to 
divide and manufacture its particular immunoglobulin 
against a particular antigen, a new colony is established. 
The particular amino acid sequences of the two subunits 
of the immunoglobulin produced by a particular colony 
confer the ability to bind a particular antigen. 

Because many (10-100) different lymphocytes are 
stimulated to divide and manufacture by molecules of 
the same antigen, many different colonies continuously 
produce immunoglobulins after exposure of an animal 
to a particular antigen. Each of these immunoglobulins 
has a different amino acid sequence, but all are specific 
for that one antigen. They differ, however, in the location 
on the surface of the antigen that they recognize and to 
which they bind and in the strength with which they bind 
to that location, as reflected in their individual dissocia- 
tion constants for the binding of antigen. Such a set of 
immunoglobulins, each capable of recognizing the same 
antigen but each different from the others, is referred to 
as a polyclonal set. The product of the reaction of an 
intact animal to an antigen is always a polyclonal set of 
immunoglobulins, which are present as a complex mix- 
ture in the serum of the animal. 
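
The consequence of a spread of dissociation constants within a polyclonal set can be illustrated numerically. The Python sketch below assumes that each site binds independently with a simple hyperbolic dependence on the concentration of free antigen, which is a simplification not stated in the text. The relative abundances and dissociation constants of the three clones are hypothetical values chosen only for the illustration.

```python
def fraction_occupied(free_antigen: float, kd: float) -> float:
    """Fraction of sites occupied for independent sites: [A] / (Kd + [A])."""
    return free_antigen / (kd + free_antigen)


def polyclonal_occupancy(free_antigen: float, clones: list) -> float:
    """Abundance-weighted mean occupancy over a set of clones.

    Each clone is given as (relative_abundance, dissociation_constant).
    """
    total = sum(weight for weight, _ in clones)
    bound = sum(weight * fraction_occupied(free_antigen, kd) for weight, kd in clones)
    return bound / total


# Hypothetical polyclonal set: (relative abundance, Kd in mol/L).
HYPOTHETICAL_CLONES = [(0.5, 1e-9), (0.3, 1e-8), (0.2, 1e-7)]

occupancy = polyclonal_occupancy(1e-8, HYPOTHETICAL_CLONES)
print(round(occupancy, 4))
```

In such a mixture the tightest-binding clones are nearly saturated at a concentration of antigen at which the weakest are mostly unoccupied, so the set as a whole does not behave as a single species with one dissociation constant.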

In a normal animal, the various colonies are stable 
contributors to the mixture of immunoglobulins in the 
serum necessary to deal with antigens in the environ- 
ment. Occasionally, the controls maintaining the stable 
population of the colony fail, and one lymphocyte begins 
to multiply malignantly. This uncontrolled cancerous 
growth causes an enormous increase in the number of 
lymphocytes producing an immunoglobulin of just one 
unique sequence and structure. Such a cancer is referred 
to as a myeloma. The serum of such individuals contains
high concentrations of only one type of immunoglobulin.
Such myeloma proteins are present in sufficient
quantities to be purified, sequenced, and crystallized.
Myeloma proteins appear by chance as the products of 
the random malignant transformation of normal lym- 
phocytes, and the antigens to which most myelomas are 
directed are unknown. 

The disadvantage of a myeloma protein is that the 
investigator cannot choose the antigen against which it is 
directed. Its advantage is that it can be purified to homo- 
geneity, and the purified protein is necessarily composed 
of identical copies of the same molecule, each with an 
identical ability to recognize the antigen. The advantage
of the ability to select the antigen and the advantage of
the homogeneity of the product have been combined in
the production of monoclonal immunoglobulins. In
this procedure, lymphocytes from the spleen of a mouse 
that has been immunized with the antigen of interest are 
fused with cultured murine myeloma cells that normally 
secrete a particular myeloma protein or, even better, 
myeloma cells that have lost their incitement to secrete 
an immunoglobulin. These cultured myeloma cells are 
immortal cell lines that were originally derived from a 
myeloma in a mouse and that continuously grow and 
divide either in flasks in an incubator or as solid tumors 
in mice. Hybrids, each produced by the fusion of one 
lymphocyte and one myeloma cell, are selected on the 
basis of their ability to grow on a particular medium. The 
hybrids are then reproduced in dishes as single colonies 
of cells. 

Because each colony in the dish arose from one 
single cell, the cells in a particular colony produce only 
the immunoglobulin originally secreted by the parental 
lymphocyte that fused to the myeloma cell and the 
myeloma protein originally secreted by that myeloma 
cell. Therefore, each colony is the offspring of only one 
lymphocyte in the mouse from which the spleen was 
taken. If that lymphocyte happened to be producing one 
of the immunoglobulins directed against the antigen orig- 
inally injected into the mouse, its offspring can be culti- 
vated for the production of a homogeneous monoclonal 
immunoglobulin recognizing that antigen. To identify the 
colonies secreting monoclonal immunoglobulins against 
the antigen of interest, each colony is individually 
screened. When a colony producing a monoclonal 
immunoglobulin that has the desired specificity has been 
identified by the screen, it is expanded either in culture 
or in an animal so that significant amounts of that mon- 
oclonal immunoglobulin can be produced and purified. 

Although a few intact myeloma proteins and intact
monoclonal immunoglobulins (Figure 11-1) have been
crystallized and submitted to crystallographic analysis,
most of the crystallographic molecular models containing
complexes with antigens are those of Fab fragments.
The sites on the surface of an intact
immunoglobulin that bind tightly to the two respective
copies of the antigen are at the far ends of the Fab arms
(Figure 11-1), at the tip of the portion of the intact
molecule formed from the association of a VL domain and a
VH domain. The Fab fragment retains this site in its
entirety, and it is on the opposite end of the fragment
from the carboxy-terminal point of cleavage that
produced the Fab fragment. An example of a complex
between an Fab fragment and its antigen is that between
lysozyme from Gallus gallus and the Fab fragment of a
murine monoclonal immunoglobulin (Figure 11-2).

In the complex between an immunoglobulin and
its antigen, the single site on the Fab fragment for
binding the antigen is formed from six loops of random
meander, three from each polypeptide, heavy and light.
These loops are the complementarity-determining
regions of the structure. Each of these six loops is one of
the connections between two of the strands of the
antiparallel β structure that form the superstructure of
the core of the respective domains (Figure 11-2).

The amino acid sequences in the loops of these six
complementarity-determining regions show remarkable
variation among the different immunoglobulins, and
they are referred to as the hypervariable regions of the
sequences. It is this variety of amino acid sequence that
gives the immunoglobulins as a class their ability to
provide individual proteins each tailored to bind a particular
antigen. The specific sequences in these loops define the
specificity of the particular immunoglobulin. Both the
specific amino acid sequence and the lengths of these
loops differ among the various immunoglobulins. The
variations in sequence, as well as the variations in length,
cause both the structure of the polypeptide forming
these loops and the distribution of functional groups
over the surface formed by these loops to differ
dramatically from one immunoglobulin to the next. As an
illustration of the opportunism of biological processes, it is
also possible for a sequence that dictates the
glycosylation of an asparagine to be found in a
complementarity-determining region and for the
oligosaccharide attached to that asparagine to contribute
favorably to the binding of the antigen. It is these
variations in structure and chemical character that produce
the array of potential specificities at the site formed by
the six loops. In contrast to this wild variation, the
structures of the cores of the VH domain and the
VL domain remain constant because the amino acid
sequences forming these cores are well conserved.

Each immunoglobulin that appears in response to 
an antigen possesses a binding site that is formed by the 
complementarity-determining regions and that binds an 
epitope on the antigen. An epitope is the region on the 
antigenic protein that interacts directly with the binding 
site on the immunoglobulin. An antigen can have one or 
many epitopes, and each epitope elicits many different 
colonies each producing a different immunoglobulin 
recognizing that epitope and binding to it with its own 
particular dissociation constant. Each epitope is one of 
the regions on the antigen that induced the reproduction 
of the members of the colony of lymphocytes and their 
production of that particular immunoglobulin. Usually, 
an epitope is one or two short sequences of amino acids 
plus two or three other side chains in the antigen that are 
all adjacent to each other on its surface and that associ- 
ate specifically with the surface formed by the six com- 
plementarity-determining regions. The epitope and the 
binding site on the immunoglobulin combine noncova- 
lently as if they were two faces forming a heterologous 
interface in a heterooligomeric protein. 

In the crystallographic molecular model of the
complex between lysozyme and the Fab fragment [...]
all six of the loops of the complementarity-determining
regions of the immunoglobulin are [...] associate with two
segments of polypeptide from lysozyme, that between
Aspartate 18 and Asparagine 27 and that between Lysine 116
and Leucine 129. These two strands forming the epitope are
immediately adjacent to each other on the surface of the
protein. In the interface between murine monoclonal [...]
micrococcal nuclease from Staphylococcus aureus
(Table 11-1), there are also two segments of polypeptide
that constitute the majority of the epitope, the α helix
from Glutamate 57 to Lysine 70 and the β turn and α helix
from Aspartate 95 to [...] but there are also contacts
involving single amino acids from two other segments, most
notably Histidine 124. All six of the
complementarity-determining regions also participate in
this complex. [...], that all six
complementarity-determining regions be involved in the
binding site. Immunoglobulins G from camels, which have no
[...] bind antigen effectively with only the [...]
determining regions of the [...]

Because of their almost limitless variety and
because they are adventitious, interfaces between
immunoglobulins and their antigens are paradigms for
all heterologous interfaces, and the details of the
interactions within these interfaces are typical of those
found in other examples of heterologous associations.
Usually 5-10 nm² of accessible surface area from both
antigen and immunoglobulin is buried and 5-20 hydrogen
bonds form across the interface (Table 11-1), with donors
and acceptors from both side chains and backbone.
Normally, there are few ionized hydrogen bonds
(Table 11-1), but in the interface between cytochrome c and
murine monoclonal immunoglobulin E8 there are five.


Table 11-1: Amino Acids within the Interface between a
Murine Monoclonal Immunoglobulin G and Its Antigen,
Micrococcal Nucleaseᵃ

micrococcal nuclease           immunoglobulin                    contacts
secondary        amino acid    CDRᵇ      amino acidsᶜ            (VDWᵈ, HBᵉ, IHBᶠ)
structure
amino terminus   Lys 9         H2        Ser 54, Thr 56          3
α helix          Glu 57        L1        Ser 28                  3
                 Ala 60        L1        Ser 28, Thr 27          7
                 Phe 61        L1        Phe 30, Ser 29          5
                 Lys 64        L1, L3    Thr 27, Trp 92          17
                 Asn 68        L3        Glu 93,ᵍ Trp 92         8, 1
bend             Lys 70        L3        Glu 93,ᵍ Ile 94         4, 1
β strand         Asp 95        H2, L3    Tyr H50, Ile L94        5, 1
β turn           Gly 96        H2        Thr 52, Ser 54          3
                 Lys 97        H2, L3    Tyr H50,ᵍ Tyr L96       11, 1
α helix          Met 98        H2        Tyr 53                  2
                 Arg 105       H3        Asn 96                  7, 2
                 Gln 106       L2        Tyr 50                  1
α helix          His 121       H2        Tyr 53                  1
                 His 124       H1, H2    Ser 31, Tyr 53          20
                 Lys 127       H1        Tyr 27,ᵍ Ser 31ᵍ        5, 2

[The numbers of contacts are listed in the order in which
they appear in each row; the column (VDW, HB, or IHB) to
which each number belongs is not recoverable.]

ᵃThe crystallographic molecular model was that for the
complex between the Fab fragment of murine monoclonal
immunoglobulin N10 and micrococcal nuclease from S. aureus.
ᵇComplementarity-determining region. Those from the heavy
α polypeptide of the immunoglobulin are designated H and
those from the light β polypeptide are designated L.
Numbering is from the amino terminus. ᶜAmino acids from the
indicated loops of the complementarity-determining regions
that contact the particular amino acid on the surface of
micrococcal nuclease. ᵈvan der Waals contacts. ᵉHydrogen
bond between two neutral atoms or between one charged atom
and one neutral atom. ᶠHydrogen bond between oppositely
charged atoms. ᵍDonor or acceptor where charge is ambiguous.

Waters are also incorporated as structural elements
into these interfaces. For example, in the interface within
the crystallographic molecular model of equine
cytochrome c and the Fab fragment of murine monoclonal
immunoglobulin E8, there are 38 positions occupied
by molecules of water, 16 of which are present at
the same locations in crystallographic molecular models
of either the uncomplexed antigen or uncomplexed
Fab fragment or in both, and these locations are simply
incorporated into the interface within their respective
faces. Eight of these waters bridge the two proteins. In
the interface between lysozyme from G. gallus and the
murine Fab fragment HyHEL-63, there are also 38 positions
occupied by molecules of water, 14 of which are
present at the same locations in uncomplexed antigen
and uncomplexed Fab fragment, and eight of these
bridge the two proteins; and in the interface between
human tissue factor and the murine Fab fragment D3h44
there are 46 positions occupied by molecules of water,
23 of which are incorporated as structural elements of
the uncomplexed antigen and uncomplexed Fab fragment,
and 19 of these bridge the two structures.
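
The three sets of counts are easier to compare as fractions. The short Python sketch below simply tabulates the numbers quoted above; the class and field names are assumptions made for the illustration, and no values beyond those in the text are introduced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InterfaceWaters:
    complex_name: str
    total: int        # positions occupied by water in the interface
    preexisting: int  # also present in the uncomplexed models
    bridging: int     # waters that bridge the two proteins

    @property
    def fraction_preexisting(self) -> float:
        return self.preexisting / self.total


INTERFACES = [
    InterfaceWaters("cytochrome c / Fab E8", 38, 16, 8),
    InterfaceWaters("lysozyme / Fab HyHEL-63", 38, 14, 8),
    InterfaceWaters("tissue factor / Fab D3h44", 46, 23, 19),
]

for entry in INTERFACES:
    print(f"{entry.complex_name}: {entry.fraction_preexisting:.2f} preexisting")
```

In each of the three complexes, between roughly one-third and one-half of the waters in the interface were already present on one or both of the uncomplexed proteins.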

The central and most critical amino acid in the epi- 
tope on lysozyme recognized by murine monoclonal 
immunoglobulin D1.3 (Figure 11-2B) is Glutamine 121, 
the side chain of which occupies a distinct hole among 
the six loops of the complementarity-determining 
regions on the surface of the Fab fragment (Figure
11-2B). Each of the three amino acids lining the hole for
Glutamine 121 is from a different
complementarity-determining loop, one from the heavy
α subunit and two from the light β subunit, and this
places the hole in the
very center of the binding site on the Fab fragment. If 
Glutamine 121 is replaced by either a histidine or an 
asparagine by site-directed mutation, the antigen is no 
longer bound by the immunoglobulin. In the normal 
structure of lysozyme, Glutamine 121 is fully exposed to 
the solvent. 

It is often the case that an epitope seems to be
focused on a particular amino acid on the surface of a
protein. For example, about 30-40% of the polyclonal
immunoglobulins raised to human cytochrome c fail to
bind to the cytochrome c from Macaca mulatta, which
differs from the human protein only by the replacement
of Isoleucine 58 by a threonine. These immunoglobulins
that fail to recognize cytochrome c from M. mulatta
do, however, recognize cytochrome c from Macropus
canguru, which differs from the human protein at several
other locations but does contain Isoleucine 58. No
immunoglobulins raised to the cytochrome c from
M. mulatta failed to recognize the cytochrome c from the
human, and this result suggests that when a cytochrome c
contains a threonine at position 58, as does the protein
from M. mulatta, this region on the external surface is
not antigenic. The impression left by these observations
is that Isoleucine 58 is the key amino acid in this
epitope, as is Glutamine 121 in the epitope of lysozyme.

In the crystallographic molecular model of the 
complex between lysozyme and murine monoclonal 
immunoglobulin D1.3, the structure of the lysozyme is 
identical, within the error of the models, with its structure
in the absence of the immunoglobulin, and the
structures of both the VL and the VH domains of the
Fab fragment are also identical, within the errors of the
models, with their structures in the uncomplexed
Fab fragment, even though the two domains have
shifted slightly relative to each other by 0.1 nm. In this
instance, formation of the complex between antigen and 
immunoglobulin is simply the docking of two comple- 
mentary faces. In the crystallographic molecular model 
of the complex between human tissue factor and the 
Fab fragment of murine monoclonal immuno- 
globulin D3h44, “conformational changes upon forma- 
tion of the complex are very small and almost exclusively 
limited to the reorientation of side-chains”. 

Usually, however, the conformations of both antigen
and immunoglobulin change noticeably as the
complex is formed. The relative orientations of the
VH domain and VL domain can shift by 5-10°. One or
more of the loops of the complementarity-determining
regions can be reconfigured or can pivot so that their
tips move as much as 1.0 nm. Side chains on both antigen
and immunoglobulin often reorient by rotating
around their carbon-carbon bonds. Flexible strands
of polypeptide on the surface of the antigen readily
rearrange upon formation of the complex. Most of
these changes, however, are small ones, and the surface
formed by the six complementarity-determining regions
in the uncomplexed immunoglobulin has a shape that is
already roughly complementary to the surface of the
uncomplexed epitope, so that only a few readjustments
are required upon formation of the complex.

The binding site on an immunoglobulin is usually a
flat surface or a depression on the surface of a globular
protein formed from the VH and VL domains (Figure 11-2).
Viruses appear to take advantage of this feature of the site.
The surfaces of picornaviruses such as rhinoviruses and
polioviruses are highly irregular. They are furnished with
bosses at the 5-fold rotational axes of symmetry of the
icosahedral shell. These bosses are separated from each
other by deep depressions on the surface of the virus. The
epitopes on polioviruses are located mainly on the bosses
themselves and a few small protruding segments of
polypeptide. It is believed that the crucial regions of the
surface of rhinoviruses that allow them to produce an
upper respiratory infection are located in the depressions
between the bosses. These locations would be inaccessible
to immunoglobulins. Each time the epitopes on the
bosses or the smaller protrusions of rhinoviruses sustain 
a sufficient number of mutations to escape recognition, 
an antigenically novel but still infectious rhinovirus 
arises. The actual machinery of infection, lying as it does 
within the depressions, would be protected from being 
recognized by any immunoglobulin. 

This strategy depends on the fact that most antigen 
binding sites are themselves flat or concave. On the 
murine monoclonal immunoglobulin HyHEL-5, how- 
ever, Tyrosine 33 and Tyrosine 53 from complementar- 
ity-determining regions 1 and 2, respectively, of the 
Vy domain form a protrusion that juts out of the almost 
flat surface that constitutes the binding site for lysozyme, 
its antigen. In the complex between this immunoglobu- 
lin and its antigen, this protrusion fits into a deep groove 
on the surface of the protein that forms the active site of 
the enzyme. Consequently, although they do not do so
often, immunoglobulins can recognize their antigens by
protruding into their structure instead of surrounding a
protrusion on their surface.

The interface between antigen and immunoglobulin
seen in the crystallographic molecular model of the
neuraminidase from influenza virus and the Fab frag- 
ment from murine monoclonal immunoglobulin NC41 is 
formed from at least four juxtaposed strands of polypep- 
tide from distant segments of the amino acid sequence of 
the neuraminidase. When amino acids in any one of 
three of these four strands are mutated, the neu- 
raminidase can no longer bind to the immunoglobulin. 
When amino acids in the fourth strand are mutated, the 
affinity of the binding is noticeably diminished. All of 
these results indicate that the epitope on this protein for 
this immunoglobulin is a large region on the surface 
comprising all four of these strands. It seems an 
inescapable conclusion that this epitope would cease to 
exist if the protein were to unfold in this region. Such an 
epitope is a conformationally specific epitope. 

Many immunoglobulins, however, will recognize 
their antigens even when the antigenic protein is no 
longer in its native structure. These are sequence-spe- 
cific immunoglobulins. The paradigm of this class 
would be an immunoglobulin that, when covalently cou- 
pled to a stationary phase, can be used to isolate by affin- 
ity adsorption a peptide comprising its epitope from a 
digest of its antigen; such an immunoglobulin can recognize
its epitope even when it is a formless peptide.

The ability of an immunoglobulin to recognize
short peptides from an endopeptolytic digest of its antigen
or short synthetic peptides with sequences of
amino acids from its antigen is used routinely, in the
absence of a crystallographic molecular model of the 
complex between intact antigen and immunoglobulin, to 
identify the epitope on the antigen recognized by the 
immunoglobulin. In fact, when crystals of a complex 
between Fab fragment and intact antigen cannot be 
made, a crystallographic molecular model of a peptide 
from the antigen bound within the binding site on the 
immunoglobulin is assumed to depict accurately at least 
a portion if not all of the structure of the interface in the 
complex with the intact antigen.”® 

Usually the class of sequence-specific immuno- 
globulins is distinguished from the class of conforma- 
tionally specific immunoglobulins. Although the 
monoclonal immunoglobulin NC41 specific for viral 
neuraminidase that was used in the crystallographic 
studies seems to be an obvious member of the latter 
class, the evidence for such conformationally specific 
immunoglobulins is often anecdotal. A protein is irre- 
versibly unfolded and loses its antigenic properties. To 
unfold a polypeptide irreversibly, however, it is usually 
either covalently modified or noncovalently and intermolecularly
polymerized by aggregation. Either the epitope
could be covalently modified or it could be sterically
sequestered within an aggregate of the protein during 
such uncontrolled reactions. If either of these events 
occurs, it appears as if the immunoglobulin were con- 
formationally specific and recognized only the native 
structure of the protein when actually it was sequence- 
specific and recognized a single linear sequence of 
amino acids that was simply covalently modified or inac- 
cessible. The technical difficulty is to unfold the polypep- 
tide of the antigen to a monodisperse random polymer 
without doing the same thing to the immunoglobulin, 
which is also a folded polypeptide. Digestion of the anti- 
gen accomplishes this goal by producing structureless 
peptides, but if the immunoglobulin fails to recognize 
any of these peptides it could simply be the case that 
cleavage has occurred within the epitope. 

It is probably the case that the distinction between 
sequence-specific and conformationally specific 
immunoglobulins is one of degree. For example, a poly- 
clonal set of immunoglobulins was raised against native 
micrococcal nuclease from S. aureus (n_aa = 149) and purified
on the basis of its ability to recognize a fragment of
the intact polypeptide comprising amino acids 99-149. 
These immunoglobulins could bind the intact, folded 
polypeptide of micrococcal nuclease 2 x 10* times more 
tightly, as judged from the dissociation constants, than 
they could the fragment.” The fragment is a monodis- 
perse random polymer, and it may be that the binding 
site on the immunoglobulin recognizes only the small 
portion of the random polymer that by chance has 
assumed the proper conformation. This would explain
the much greater dissociation constant of the fragment
relative to the native protein. It is also possible, however, 
that the immunoglobulins still recognize the epitope or 
portions of the epitope after it is unfolded, but with much 
smaller free energy of dissociation. A difference in disso- 
ciation constant of even the magnitude observed for the 
complexes between the immunoglobulins and the nucle- 
ase, if the concentrations of immunoglobulin and anti- 
gen and the individual dissociation constants are in the 
appropriate ranges, would be observed as a complete 
elimination of the ability of antigen to bind to 
immunoglobulin when it is in fact only a finite attenua- 
tion of the ability of antigen to bind to immunoglobulin. 
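
The argument above can be made concrete with a small numerical sketch. It assumes simple 1:1 binding between one binding site and one epitope and solves the resulting quadratic exactly. All of the concentrations and dissociation constants below, and the ratio between the two dissociation constants, are hypothetical values chosen only for illustration; they are not taken from the experiments with micrococcal nuclease.

```python
import math


def fraction_antigen_bound(sites_total, antigen_total, kd):
    """Fraction of antigen in complex for 1:1 binding (all values in mol/L).

    Solves [C]^2 - (S + A + Kd)[C] + S*A = 0 for the complex [C],
    using the numerically stable form of the smaller root.
    """
    b = sites_total + antigen_total + kd
    root = math.sqrt(b * b - 4.0 * sites_total * antigen_total)
    complex_conc = 2.0 * sites_total * antigen_total / (b + root)
    return complex_conc / antigen_total


# Hypothetical values, for illustration only.
kd_native = 1e-9              # dissociation constant for folded antigen (M)
ratio = 1e4                   # assumed ratio of the two dissociation constants
kd_fragment = kd_native * ratio
sites = 1e-8                  # total concentration of binding sites (M)
antigen = 1e-8                # total concentration of antigen (M)

bound_native = fraction_antigen_bound(sites, antigen, kd_native)
bound_fragment = fraction_antigen_bound(sites, antigen, kd_fragment)

print(f"folded antigen bound:   {bound_native:.3f}")
print(f"unfolded antigen bound: {bound_fragment:.5f}")
```

Under these assumed conditions most of the folded antigen is bound while only about one part in a thousand of the unfolded antigen is bound, which in many assays would be scored as no binding at all even though the affinity is finite.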

Because the immune system was developed to rec- 
ognize and destroy foreign organisms such as viruses and 
bacteria and because other systems are used by animals 
to eliminate small toxic molecules, antigens are always 
large macromolecules, usually proteins, and never small 
molecules. This fastidiousness of the immune system 
can be circumvented by covalently attaching a small 
molecule to a carrier protein as a hapten. The attach- 
ment is accomplished with chemical couplings analo- 
gous to those used for covalent modification or 
cross-linking of proteins. In fact, many substances that 
are able to covalently modify proteins produce undesir- 
able immune reactions by attaching to proteins in an 
animal or the incautious investigator and turning those 
immunologically benign proteins into malignant anti- 
gens. 

A hapten, when covalently attached to the carrier 
protein, protrudes from its surface even more dramati- 
cally than Glutamine 121 does from the surface of 
lysozyme. Because of its peculiar chemical structure, the 
hapten is usually the focus of the immune system; and 
because it protrudes from the surface, the entire hapten 
usually ends up occupying a deep hole or deep crevice 
among the loops of two or three complementarity-deter- 
mining regions.” These facts explain why the unat- 
tached hapten can usually be bound efficiently by the 
immunoglobulin. For example, polyclonal immunoglob- 
ulins raised against a protein the lysines of which 
had been modified by 2,4,6-trinitrobenzenesulfonate 
(Reaction 10-28) bind N*-(2,4,6-trinitrophenyl)lysine 
with dissociation constants of 10” to 10° MI It is this 
ability of an immunoglobulin raised against a hapten to 
bind tightly the small molecule from which the hapten 
was derived that permits immunoglobulins to be used in 
highly specific assays for small molecules.” 

Although immunoglobulins for use in protein 
chemistry are usually raised by injecting an intact pro- 
tein into an animal, it is also possible to raise 
immunoglobulins directed against synthetic peptides 
with the same amino acid sequence as a segment from a 
particular protein.” The synthetic peptides are attached 
as haptens to another protein. For example, the amino-terminal
amino acid sequence of the large tumor antigen
from simian virus is AcMDKVLNR-, where Ac is a
posttranslationally added acetyl group. Immunoglobulins
raised against a synthetic peptide with this sequence that 
had been covalently attached to bovine serum albumin 
as a hapten were able to recognize and bind exclusively 
to the large tumor antigen protein in crude homogenates 
from animal cells infected with the simian virus.” 
Immunoglobulins directed against a particular peptide 
can be purified by using a stationary phase to which the 
peptide is attached as an affinity adsorbent.” 

The difficulty inherent in the use of immunoglobu- 
lins raised against a particular amino acid sequence in a 
protein to study that protein in its native conformation is 
that the investigator, a fallible judge, rather than the 
immune system, which is less fallible, has chosen the epi- 
tope. A native protein does not expose many sequences 
sufficiently for immunoglobulins to recognize them. If 
the investigator has chosen the epitope, there is only a 
small chance that it will be accessible on the surface of 
the antigen and recognized by an immunoglobulin in the 
native protein, unless the choice is the safe one of the 
amino terminus or the carboxy terminus, which are usu- 
ally well exposed in a native protein. The segment of 
sequence against which the immunoglobulin was raised, 
however, will usually be accessible in the unfolded 
polypeptide. 

Any solution of a protein, even pristine cytoplasm, 
freshly drawn plasma, or a solution of redissolved crys- 
tals, contains some of the irreversibly unfolded polypep- 
tide of any particular protein. Immunoglobulins raised 
against synthetic peptides necessarily bind preferen- 
tially, and often exclusively, to any unfolded protein 
exposing the sequence of amino acids against which they 
were raised because the original antigen itself was a 
structureless peptide of that sequence. Yet most experi- 
ments are designed with the requirement that the 
immunoglobulins recognize and bind to the native pro- 
tein for the conclusion to be valid. 

Crystallographic molecular models of complexes 
between immunoglobulins raised against synthetic pep- 
tides and the peptides themselves heighten these con- 
cerns.” In the binding site on the immunoglobulin, the 
peptide usually binds rigidly in a conformation that dif- 
fers dramatically from the conformation that the same 
sequence of amino acids assumes in the native protein. 
Furthermore, it is, as might be expected, usually buried 
in a crevice among the loops of the complementarity- 
determining regions. Consequently, it is difficult to 
understand how such an immunoglobulin can ever rec- 
ognize that sequence in the native protein. 

The solution to this problem is analytical. When 
either monoclonal immunoglobulins or polyclonal 
immunoglobulins raised against synthetic peptides are 
used, one can assume that each folded polypeptide pos- 
sesses only one epitope, either exposed or buried. It is 
also possible to purify by affinity adsorption a subset of 
polyclonal immunoglobulins recognizing only one epi- 
tope from a set of polyclonal immunoglobulins raised 
against an intact protein. If every molecule of protein
in a solution can bind one immunoglobulin at a particu- 
lar epitope, then the immunoglobulin is recognizing the 
native protein. If only a small percentage of the mole- 
cules of protein in a solution can bind an immunoglobu- 
lin at that epitope, it is only the unfolded polypeptides 
that are presenting that epitope. Consequently, if every 
protomer of an antigen binds one molecule of an 
immunoglobulin, when only immunoglobulins directed 
against one epitope are present, then the immunoglobu- 
lins must be recognizing the epitope when it is in the 
native protein. In any experiment relying on the assump- 
tion that the immunoglobulins recognize the native pro- 
tein, it must be demonstrated both that the 
immunoglobulins bind to only one unique epitope and 
that every protomer of the antigen is capable of binding 
one molecule of immunoglobulin. In the absence of such 
a demonstration, the conclusions reached can be disre- 
garded. 

In contrast to these difficulties encountered when 
they are used to examine a native protein, immunoglob- 
ulins raised against a synthetic peptide can be used to 
purify a peptide containing a particular amino acid in 
the sequence of a protein from an endopeptolytic digest 
of that protein. The amino acid sequence surrounding
Lysine 380 of the α subunit of acetylcholine receptor is
-SAIEGVKYIAEHM-. The synthetic peptide KYIAE was
coupled covalently as a hapten to bovine serum albumin, 
and polyclonal immunoglobulins were raised against 
this antigen in rabbits. Because the peptide had been 
coupled to the serum albumin through the amino groups 
of its lysine and of its amino terminus, the carboxy- 
terminal sequences -YIAE protruded as haptens from the 
surface of the serum albumin. The antiserum was passed 
over a stationary phase to which the peptide KYIAE had 
been covalently attached, and immunoglobulins specific 
for the carboxy-terminal sequence -YIAE were adsorbed 
by the affinity adsorbent. After all of the other proteins in 
the serum had been washed away, the adsorbed 
immunoglobulins were eluted. These purified 
immunoglobulins in turn were covalently attached to a 
stationary phase to produce an immunoadsorbent spe- 
cific for the carboxy-terminal sequence -YIAE. When 
acetylcholine receptor was digested with glutamyl 
endopeptidase (Figure 3-2) and the digest was passed 
over the immunoadsorbent, the peptide GVKYIAE was 
adsorbed and eluted in high yield and high purity.” The 
covalent modification of Lysine 380 in intact, native 
acetylcholine receptor could be readily monitored by 
using this immunoadsorbent to purify rapidly the pep- 
tide containing it. 
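
The logic of this purification can be sketched in a few lines. The sketch assumes, as a simplification, that glutamyl endopeptidase cleaves only on the carboxy-terminal side of every glutamate and that the immunoadsorbent retains any peptide ending in -YIAE. Only the 13 amino acids quoted above are digested; the function name `cleave_after` is a hypothetical helper introduced for illustration.

```python
def cleave_after(sequence, residues="E"):
    """Split a one-letter sequence on the carboxy side of each listed residue."""
    peptides = []
    start = 0
    for index, amino_acid in enumerate(sequence):
        if amino_acid in residues:
            peptides.append(sequence[start:index + 1])
            start = index + 1
    if start < len(sequence):
        # Whatever follows the last cleavage site is the final peptide.
        peptides.append(sequence[start:])
    return peptides


# Segment surrounding Lysine 380, as quoted in the text.
segment = "SAIEGVKYIAEHM"
peptides = cleave_after(segment)

# Model of the immunoadsorbent: it retains peptides ending in -YIAE.
adsorbed = [p for p in peptides if p.endswith("YIAE")]

print(peptides)   # ['SAIE', 'GVKYIAE', 'HM']
print(adsorbed)   # ['GVKYIAE']
```

The single retained peptide, GVKYIAE, is the one reported in the text, and it contains the lysine whose modification is to be monitored.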

Such an immunoadsorbent can purify a peptide 
from the digest of a large protein, which contains so 
many peptides that a direct purification by chromatogra- 
phy would be difficult if not impossible. If a chemical 
modification of an amino acid in the protein within the 
sequence included in the peptide occurs in low yield so 
that the modified peptide is only a minor component of
the digest, the immunoadsorbent will still purify it. In
fact, the targeted amino acid can be destroyed by the
modification and the protein cleaved at the point of
destruction, and the immunoadsorbent will still purify
the peptide containing the remaining fragment of that
amino acid. By use of such an immunoadsorbent, for
example, the modification of a particular amino acid in a
protein can be followed kinetically or its accessibility to
a particular electrophile under different circumstances
can be monitored.

Immunoglobulins directed against an antigenic 
protein can effect immunoprecipitation. An immuno- 
precipitate is a visible, white precipitate that forms when 
an antigen and its immunoglobulin are present in solu- 
tion at the proper concentrations. Each immunoglobulin 
has at least two binding sites for antigen (Figure 11-1). 
Each antigen, if it is a protein, is usually polyvalent. A 
polyvalent antigen is one that has more than one epi- 
tope. If the polyclonal mixture of immunoglobulins in 
the serum contains several monoclonal immunoglobu- 
lins directed against different epitopes on the antigen 
and if this mixture of immunoglobulins is mixed in the 
proper ratio with that antigen, an immunoprecipitate 
forms containing antigen and immunoglobulin cross- 
linked among themselves. The ratio of concentrations at 
which maximum precipitation occurs is known as the 
equivalence point. If there is only one epitope on the 
antigen that has elicited the immune response, no pre- 
cipitate will form. An Fab fragment, because it is univa- 
lent, cannot produce a precipitate. If a large excess of 
antigen is present, each immunoglobulin has its binding 
sites filled with antigens that are not bound to other 
immunoglobulins and no precipitate forms. If a large 
excess of immunoglobulin is present, each antigen is sur- 
rounded by immunoglobulins, each with its other bind- 
ing site vacant, and no precipitate forms. Such soluble 
complexes of excess antigen and excess immunoglobulin 
are present under all circumstances, and it is never pos- 
sible to precipitate directly all of the antigen or all of the 
immunoglobulins even at equivalence. Finally, the molar 
ratio between antigenic protein and immunoglobulin in 
a precipitate gathered at equivalence is a complicated 
function of the number of epitopes, their relative affini- 
ties, and the distribution of the different monoclonal 
immunoglobulins in the polyclonal mixture. 

After a protein of interest has been injected as a 
potential antigen into an animal, the appearance of the 
desired immunoglobulins in the serum is often detected 
by the ability of those immunoglobulins to form an 
immunoprecipitate. The simplest way to produce the 
proper ratio of concentrations in order to observe a pre- 
cipitate is to layer a solution of the antigen onto a sample 
of the serum in a narrow tube. As antigen diffuses into 
the serum and immunoglobulin diffuses into the solu- 
tion of antigen, there will be a point along the two gradi- 
ents at which the concentrations are those necessary for 


precipitation to occur and a visible disc of precipitate 
will form at this location. 

Immunodiffusion is a more sophisticated procedure
for permitting antigen and immunoglobulin to diffuse
into each other. Antigen and serum are placed in
separate wells cut in a block of agar, and as they diffuse 
outward into the agar and towards each other, there will 
be a line between the two wells along which the ratio of 
concentrations is appropriate for precipitation. Along 
this line a white immunoprecipitate will form. When the 
original antigen and the same protein from a different 
species or a mutant variety of the antigen are placed in 
two adjacent wells both the same distance from the well 
containing immunoglobulin, the pattern of the lines of 
immunoprecipitate demonstrates whether or not the 
related protein shares all of the epitopes present on the 
original antigen.” 

Complement fixation is a procedure for detecting
the relative concentrations of immunoprecipitates in a
series of samples. It is sensitive to concentrations of
immunoprecipitate much smaller than can be observed
visually. The procedure can be performed on a series of 
mixtures of antigen and immunoglobulin at different 
ratios of concentration to obtain a direct measurement 
of the equivalence point, which is the ratio at which com- 
plement fixation reaches its maximum level. 

An immunoprecipitate is held together by a com- 
plex collection of interfaces formed between the binding 
sites on the tips of the Fab arms of the various 
immunoglobulins present in antiserum and their respec- 
tive epitopes on the molecules of antigen. Each of the 
individual reactions between an epitope and the binding 
site on the Fab arm of an intact immunoglobulin is a 
simple dissociation: 


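\[
\mathrm{Ep} + \mathrm{Fab} \;\underset{k_{-1}}{\overset{k_{1}}{\rightleftharpoons}}\; \mathrm{Ep{\cdot}Fab} \qquad (11\text{-}1)
\]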


where Fab is the binding site, Ep is the epitope, and
Ep·Fab is the immune complex (Figure 11-2). The
immune complexes between epitopes on antigens and
immunoglobulins are quite strong, a fact that permits an
immunoprecipitate to form even when antigen and
immunoglobulins are present at the low concentrations
used for procedures such as complement fixation. More
useful than the strength of the association, however, is
the fact that because k₋₁, the rate constant for dissociation,
is almost always small, dissociation of the complex is slow. Most of the
procedures that use immunoglobulins depend on this 
slow dissociation of the complexes between them and 
their antigens. The slow dissociation permits the com- 
plex between antigen and its immunoglobulin to be sep- 
arated either from an excess of the specific 
immunoglobulin and other immunoglobulins and pro- 
teins in an antiserum or from all of the other proteins 
with which the antigen is mixed. For example, an 


immunoprecipitate can be extensively washed without 
dissociating. Immunoblotting, immunostaining, and 
immunoadsorption also rely on this advantage of the 
slow dissociation. 
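
A short calculation shows why a small rate constant for dissociation matters in practice. The sketch treats dissociation of the complex in Equation 11-1 as a first-order process and ignores rebinding during the wash. The rate constants and the duration of the wash are hypothetical values chosen for illustration and are not measurements from the text.

```python
import math


def half_life(k_off):
    """Half-life (s) of a complex that dissociates with first-order constant k_off (1/s)."""
    return math.log(2.0) / k_off


def fraction_remaining(k_off, seconds):
    """Fraction of complex still intact after washing for the given time."""
    return math.exp(-k_off * seconds)


def dissociation_constant(k_on, k_off):
    """Equilibrium dissociation constant (M) from the two rate constants."""
    return k_off / k_on


# Hypothetical rate constants, for illustration only.
k_on = 1e6        # association rate constant (1/(M s))
k_off = 1e-4      # dissociation rate constant (1/s)
wash_time = 600   # a ten-minute wash (s)

print(f"Kd        = {dissociation_constant(k_on, k_off):.1e} M")
print(f"half-life = {half_life(k_off) / 3600:.1f} h")
print(f"remaining = {fraction_remaining(k_off, wash_time):.3f}")
```

With these assumed values the complex has a half-life of nearly two hours, so a wash lasting ten minutes removes only a few percent of the bound immunoglobulin.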

One of the most important uses of immuno- 
chemistry is to identify the particular protein to which 
the antibody is directed. For example, a specific 
immunoglobulin can be used to stain only its antigen 
among all of the proteins in a heterogeneous mixture 
separated by electrophoresis. First the proteins that
have been separated by electrophoresis on a slab of
polyacrylamide are transferred laterally by electrophoresis
onto a membrane of nitrocellulose or poly(vinylidene
difluoride) placed against the slab of polyacrylamide.
This electrotransfer produces a blot on which the bands
of protein in the polyacrylamide have become bands of
protein arrayed in the same pattern but now plastered
onto the membrane of polymer (Figure 11-3A). Then
the blot is soaked in a solution of the specific
immunoglobulin, and excess immunoglobulins are
rinsed away. The immunoglobulins that were bound by
the antigen are immunostained with a second

Figure 11-3: Immunostaining of immunoblots of NADH dehydrogenase (ubiquinone) from bovine heart mitochondria. (A) NADH
Dehydrogenase (ubiquinone) was dissolved in a solution of dodecyl sulfate and submitted to electrophoresis on a slab of polyacrylamide. 
Following the electrophoresis, the separated polypeptides on the gel were electrotransferred laterally onto a membrane of poly(vinylidene 
difluoride) placed against the slab by applying an electric field perpendicular to the plane of the slab. The protein in each band remained at 
the same position on the field and adhered tightly to the poly(vinylidene difluoride). The membrane was then stained with Coomassie
brilliant blue. The 25 bands that appeared represent only a fraction of the more than 40 polypeptides in NADH dehydrogenase (ubiquinone).
Reprinted with permission from ref 68. Copyright 1992 Elsevier B.V. (B-G) The complexes between dodecyl sulfate and the polypeptides of
NADH dehydrogenase (ubiquinone) of 704, 444, 430, 228, 217, and 75 aa were each purified from subcomplexes of the enzyme. Following
purification, each of them was injected into rabbits, and polyclonal immunoglobulins specific for each of the polypeptides were purified from
the respective antiserum by binding them to the respective polypeptide immobilized on a membrane of nitrocellulose, washing away the
other proteins, and eluting the immunoglobulins. Membranes of poly(vinylidene difluoride) to which electrophoretically separated
polypeptides of intact NADH dehydrogenase (ubiquinone) had been electrotransferred as in part A were then immunostained with purified
rabbit polyclonal immunoglobulins against the polypeptides of (B) 444 aa, (C) 217 aa, (D) 75 aa, (E) 704 aa, (F) 430 aa, and (G) 228 aa,
respectively, followed by goat polyclonal immunoglobulins that were raised against rabbit immunoglobulin G and to which peroxidase had 
been covalently coupled. The electrophoretic separation in lane A and those in lanes B-G were performed in different laboratories on differ- 
ent polyacrylamide gels that produced different mobilities. (H-K) Intact NADH dehydrogenase (ubiquinone) with its more than 40 subunits 
was cross-linked with ethylene glycol bis(succinimidyl succinate). The product of the reaction was dissolved in a solution of dodecyl sulfate. 
Four identical samples of this solution were submitted to electrophoresis in separate lanes, the separated polypeptides electrotransferred to 
a sheet of poly(vinylidene difluoride), and the respective lanes were immunostained as in parts B-G with immunoglobulins specific for the
polypeptides of (H) 75 aa, (I) 217 aa, (J) 444 aa, and (K) 704 aa, respectively. The respective un-cross-linked polypeptide is the lowest band
in each lane, and covalent complexes in which the polypeptide participates are represented by the higher bands. Reprinted with permission 
from ref 76. Copyright 1993 American Chemical Society. 
immunoglobulin directed against the first immunoglobulin,
for example, ovine anti-murine immunoglobulin, to
which either peroxidase or alkaline phosphatase has
been attached. The peroxidase or alkaline phosphatase
produces a colored precipitate at the location of the antigen
(Figure 11-3, panels B-H). In this way, one protein
can be picked out of a complex mixture because it is the 
only protein that is stained. For example, a murine mon- 
oclonal immunoglobulin that had been selected for its 
ability to inhibit the mitochondrial dicarboxylate trans- 
porter from Pisum sativum immunostained only one 
polypeptide on an immunoblot of a polyacrylamide gel 
on which the hundreds of polypeptides from intact mito- 
chondria dissolved in a solution of dodecyl sulfate had 
been separated electrophoretically.”” That polypeptide 
was assumed to be the transporter. 

Immunostaining of immunoblots is also used 
to identify which polypeptides are components of a 
particular covalent complex produced by cross-linking. 
NADH Dehydrogenase (ubiquinone) is a large protein 
composed of more than 40 different polypeptides (Figure 
11-3A). Immunoglobulins that specifically recognize
the polypeptides of 75, 217, 228, 430, 444, and 704 aa
were produced in rabbits and purified by binding
them to their antigens and then eluting them. On
immunoblots of polyacrylamide gels on which the complete
protein with its more than 40 polypeptides had
been separated (Figure 11-3A), each of the immunoglobulins
directed immunostaining only to the polypeptide
against which it was raised (Figure 11-3, lanes B-G).

The complete protein was then cross-linked with 
ethylene glycol bis(succinimidyl succinate), and the 
polypeptides that were separated by electrophoresis 
were again immunoblotted and immunostained with the 
immunoglobulins specific for the polypeptides of 75, 
217, 444, and 704 aa (Figure 11-3, lanes H-K). Covalent 
complexes between the polypeptides of 217 and 75 aa; 
444 and 75 aa; 444 and 217 aa; 704 and 444 aa; 704, 444, 
and 75 aa; and 704, 444, and 217 aa could be positively 
identified on the basis of their mobility and, more impor- 
tantly, the fact that they were immunostained by the 
appropriate immunoglobulins. The most interesting 
result was the fact that none of these four polypeptides 
had been cross-linked to any of the other polypeptides in 
the complex. Because there are many polypeptides that 
have mobilities similar to others in the protein, only the 
immunostaining could sort out the products of the cross- 
linking reaction. 

One of the most effective ways to raise 
immunoglobulins that recognize a particular protein 
with high specificity is to use as a hapten a synthetic pep- 
tide with the sequence of the amino terminus or the 
carboxy terminus of that protein. Polyclonal immuno- 
globulins were raised against the synthetic peptide 
SEFIGA, the carboxy-terminal sequence of the human 
receptor for epidermal growth factor. The peptide had 
been attached as a hapten through its amino terminus to
bovine serum albumin so the immunoglobulins recognized
the carboxy-terminal sequence -EFIGA. These
polyclonal immunoglobulins were purified by passing 
the antiserum over an affinity adsorbent composed of a 
solid phase to which the peptide had been attached cova- 
lently. When a homogenate of cultured human cells was 
dissolved in a solution of dodecyl sulfate and submitted 
to electrophoresis and the separated polypeptides were 
immunoblotted, the immunoglobulin raised against the 
peptide SEFIGA directed immunostaining only to the 
polypeptide of the receptor for epidermal growth factor.” 
The solution submitted to electrophoresis contained all 
of the hundreds of polypeptides in the cells, and yet only 
the receptor for epidermal growth factor was recognized 
by the polyclonal immunoglobulins. 

Immunoglobulins can also be used to isolate a par- 
ticular protein by immunoadsorption. An immuno- 
adsorbent is a stationary phase to which an 
immunoglobulin specific for a particular antigen has 
been covalently bound and with which that antigen can 
be purified by affinity adsorption. For example,
although the protein had not been purified, the amino
acid sequence of the Shaker S4 K⁺ channel from Drosophila
melanogaster had been determined genetically. The pro- 
tein was expressed in Sf9 insect cells following their 
infection with baculovirus into the DNA of which com- 
plementary DNA encoding the protein had been 
inserted. The expressed protein could then be purified
on an immunoadsorbent to which had been covalently 
attached immunoglobulins raised against the synthetic 
peptide EEEDTLNLPKAPVSPQDKS, an amino acid 
sequence from a region of the protein (amino acids 
333-351) thought to be an exposed loop connecting two 
α helices.

The gene encoding dystrophin, the protein that is 
missing in muscles of patients suffering from Duchenne/ 
Becker muscular dystrophy, had been identified before 
dystrophin itself was known to exist. An immunoadsorbent,
containing immunoglobulins raised against an
expressed fusion protein containing a fragment of 556 aa 
from the amino acid sequence of dystrophin, was used to 
purify it.

The voltage-gated chloride channel from Torpedo 
californica had also been identified and sequenced 
genetically before it had been purified. It was then puri- 
fied in one step of affinity adsorption by use of a station- 
ary phase to which immunoglobulins recognizing the 
hydrophilic sequence EGQQREGLEAVKVQTEDP from 
the protein had been coupled covalently.

Immunoadsorption can also be accomplished by 
adding an excess of specific immunoglobulins to a solu- 
tion to saturate all of the antigen, passing the solution 
over agarose to which protein A from S. aureus, a protein 
with a high affinity for immunoglobulin G, has been 
covalently attached, and eluting the adsorbed antigen
from the agarose after the other proteins have been 
rinsed away. 


A protein can be identified by tagging it with an 
epitope. A short segment of DNA encoding the amino
acid sequence of an epitope to which an immunoglobu- 
lin has already been raised is inserted in phase at one end 
of the reading frame for the protein of interest. The pro- 
tein is then expressed from the complementary DNA into 
which the insertion has been made, and the expressed 
protein contains the amino acid sequence of the tag at 
one of its ends. The protein thus tagged can be identified 
by immunostaining and isolated by immunoadsorption 
with the immunoglobulins directed against the epitope. 
In this way, the protein encoded by an unidentified read- 
ing frame in a segment of genomic DNA or complemen- 
tary DNA can be identified and purified. If a protein has 
been tagged by an epitope at one of its ends, an 
immunoblot of a digest of that protein separated by elec- 
trophoresis in a solution of dodecyl sulfate will provide a 
map of the positions in the amino acid sequence at 
which cleavage occurred during the digestion. The 
lengths of the successive end-labeled fragments will cor- 
respond to the positions of cleavage in the sequence of 
the protein.

Immunoglobulins are also used to screen libraries 
of cDNA. If immunoglobulins specific for a protein of
completely unknown sequence have been made, they 
can be used in such a screen to detect the complemen- 
tary DNA for that protein. The complementary DNAs to 
be screened are inserted into plasmids that cause the 
proteins they encode to be expressed when they are 
transfected into bacteria. The transfected bacteria are 
spread onto a field and grown into individual colonies, 
each of which consequently contains protein expressed 
from the cDNA on its respective plasmid. The bacteria 
are lysed and their proteins are immobilized on a surface 
in such a way that the immobilized proteins remain in 
the same location on the field and produce a replica of 
the original pattern of the colonies. The replica is soaked 
with the immunoglobulins that recognized the protein of 
interest and then washed, and those colonies that con- 
tained antigen are identified by immunostaining or by 
binding radioactive protein A from S. aureus to the 
bound immunoglobulins. The bacteria in a colony iden- 
tified in this way can then be replicated and the cDNA 
sequenced. 

Immunoelectron microscopy is used to identify 
the region on the surface of a protein containing the epi- 
tope against which particular immunoglobulins are 
directed. Most proteins, including immunoglobulins, are 
large enough to be observed in the electron microscope 
when embedded in a glass of negative stain. When the 
embedded complex between a protein and an 
immunoglobulin or the Fab fragment of an
immunoglobulin is observed on an electron micrograph, 
the immunoglobulin or Fab fragment appears as a pro- 
trusion on the surface of the protein that it recognizes. 
Often the Fab arms and the Fc trunk of an immunoglobulin
can be distinguished; usually, however, an
immunoglobulin appears only as a vague elongated
structure. If the protein has a characteristic shape, the 
location of the epitope recognized by the immunoglobulin
on the surface of that shape can be identified.
α₂-Macroglobulin is a molecule that in an electron
micrograph has the shape of a letter H; a murine monoclonal
immunoglobulin specific for the domain of about
200 aa responsible for the binding of the α₂-macroglobulin
to its receptor binds to the ends of the arms on the H.
Fibrinogen is a molecule that, in an electron micrograph, 
has three globular domains arranged in a row; a murine 
monoclonal immunoglobulin specific for the carboxy-terminal
150 amino acids of its α polypeptide binds near
the central domain of the structure.

The multicatalytic endopeptidase complex is a 
cylinder composed of 14 different subunits, each present 
in two copies. A murine monoclonal immunoglobulin 
specific for one of these subunits binds at both ends of 
the cylinder, consistent with the existence of a 2-fold 
rotational axis of symmetry at the center of the cylinder 
and locating the positions in the cylinder of the two 
copies of that subunit. Murine monoclonal immunoglobulins
against several of the subunits in the multicatalytic
endopeptidase complex were always bound at two
symmetrically displayed locations on the surface of the 
cylinder, and the respective angles between those posi- 
tions when the cylinder was viewed along its axis could 
be used to position those subunits relative to the 2-fold 
rotational axis of symmetry that is normal to the cylindrical
axis and at the middle of the cylinder.

The most extensive application of immunoelectron 
microscopy has been an examination of the distribution 
of the constituent polypeptides over the surface of the 
two subunits of the ribosome from Escherichia coli. The 
application of these procedures to the 30S subunit serves 
as an example. Although it was unknown at the time 
these experiments were performed, the core of the 30S 
ribosomal subunit is formed from ribosomal RNA, and 
luckily almost all of its constituent polypeptides are dis- 
tributed over its external surface and are accessible to 
immunoglobulins. 

The 21 unique polypeptides found in the 30S sub- 
unit of the ribosome can be separated and catalogued by 
two-dimensional gel electrophoresis (Figure 11-4).
They have been separated and individually purified,
and their amino acid sequences have been determined.
Polyclonal sets of immunoglobulins have been raised 
against most of these polypeptides. When immunoglob- 
ulins specific for one of them were mixed with intact 30S 
ribosomal subunits and the immune complexes were 
then prepared for electron microscopy, individual 
immunoglobulins bound to individual 30S ribosomal 
subunits or cross-linking two 30S ribosomal subunits 
could be observed (Figure 11-5). The 30S ribosomal
subunit has a characteristic, asymmetric shape and the 
epitopes recognized by these immunoglobulins could be 
assigned to certain regions on the surface of that shape.

Figure 11-4: Separation of the polypeptides composing the
30S subunit of ribosomes from E. coli. Intact ribosomes were
isolated from a homogenate of bacteria by centrifugation. They
were dissociated into subunits by treatment with MgCl₂ and the
30S subunit was separated from the 50S subunit by centrifugation
through a gradient of sucrose. The protein was extracted from the
30S ribosomal subunit and dissolved in 6 M urea. The individual
polypeptides were separated in the first dimension at pH 9.6 and in
the second dimension at pH 4.6. Separation in each dimension was
performed in 6 M urea. The proteins were stained with amido
black. Although only 17 components are observed, three polypeptides
coelectrophorese at one spot and two pairs of polypeptides
coelectrophorese at two other spots. The total number of
polypeptides is 21. Reprinted with permission from ref 93.
Copyright 1973 Journal of Biological Chemistry.


Two different laboratories have determined the dis- 
tributions of the various polypeptides over the surface of 
the 30S ribosomal subunit, based on the relative distributions
of the antigenic sites. In each of these two
sets of observations, the location at which each individ- 
ual immunoglobulin was bound on the 30S ribosomal 
subunit was ascertained by deciding visually which pro- 
jection and which orientation of a crude three-dimen- 
sional structure of the 30S ribosomal subunit each image 
in the micrograph represented. Both of these two crude 
structures, although significantly different from each 
other, incorporated a small globular domain, a large 
globular domain, and a significant protrusion on one 
side. These three features served to orient the particles, 
and the positions of the various polypeptides could be 
assigned relative to them. Luckily, these features did 
appear in the crystallographic molecular model of the 
30S subunit (Figure 11-5).

Although there was initially significant disagreement
about the distribution of these antigens over the
surface of the 30S subunit, the differences were
resolved, and in the final maps from the two laboratories,
the agreement is quite close. Furthermore,
these distributions determined by immunoelectron
microscopy are in remarkable agreement with the distribution
of the folded polypeptides over the crystallographic
molecular model that was obtained 10 years
later.

Problem 11-1: A polyclonal set of immunoglobulins was
produced against the synthetic peptide -ETYY, the
carboxy-terminal sequence of Na⁺/K⁺-exchanging ATPase, a
protein embedded in the membranes of animal cells.
Immunoglobulins recognizing this peptide were purified
by affinity adsorption on a solid phase to which the peptide
had been attached. The immunoglobulins were
made radioactive by reductive methylation (Figure 10-3)
with formaldehyde and sodium [³H]borohydride to a
final specific radioactivity of 10,760 cpm nmol⁻¹. Equal
amounts of this immunoglobulin were mixed with
increasing amounts of homogeneous Na⁺/K⁺-exchanging
ATPase in its membrane-bound form so that bound
and unbound immunoglobulin could be separated by
centrifugation after equilibrium had been reached. The
amount of bound immunoglobulin increased linearly
with the amount of membrane-bound protein added. It
was found that each milligram of protein of purified
Na⁺/K⁺-exchanging ATPase could bind 670 cpm of
radioactive immunoglobulin regardless of the final
concentration of immunoglobulin. The asymmetric unit of
Na⁺/K⁺-exchanging ATPase is composed of one
α polypeptide ($n_{\mathrm{aa}}$ = 1020) and one β polypeptide
($n_{\mathrm{aa}}$ = 300). What fraction of the molecules of the ATPase
displays epitopes?

A monoclonal immunoglobulin was also produced
against Na⁺/K⁺-exchanging ATPase. This monoclonal
immunoglobulin could bind the synthetic peptide
HLLVMKGAPER, which has a sequence identical to a segment
of the sequence from the α polypeptide of the
enzyme. The relative concentration of binding sites of
this immunoglobulin in a solution could be determined
by an indirect immunoassay. When samples of a solution
of this monoclonal immunoglobulin, at a final concentration
of 11 nM in binding sites for antigen, were mixed
and brought to equilibrium with increasing concentrations
of Na⁺/K⁺-exchanging ATPase, the concentration of
unoccupied binding sites for antigen decreased. A final
concentration of Na⁺/K⁺-exchanging ATPase of 300 μg
mL⁻¹ was required to decrease the concentration of
active immunoglobulin by greater than 90% (from 11 nM
to less than 1 nM). What fraction of the molecules of
Na⁺/K⁺-exchanging ATPase displays epitopes accessible
to the immunoglobulin?

Figure 11-5: Immunoelectron microscopy of the 30S subunit of the
ribosome. A drawing of the crystallographic molecular model of the 30S
subunit from Thermus thermophilus is at the top. The ribosomal RNA and
unhighlighted polypeptides are drawn in a space-filling representation in
which a sphere was placed at each α carbon of the proteins and at each
phosphorus atom of the nucleic acid. The backbones of the polypeptides
of those subunits that were used as antigens in the various experiments are
drawn in skeletal representation and identified by the standard numbering.
Subunit S13 is located on the surface of the 30S subunit just over the
top of the view presented. This drawing was produced with MolScript.
The electron micrographs are of immune complexes between polyclonal
immunoglobulins G and 30S ribosomal subunits from E. coli. Purified
30S subunits were mixed with the polyclonal immunoglobulins raised
against the particular purified polypeptide: S9, S10, S13, S11, S6, or S8. The
complexes that formed were adsorbed to a layer of carbon on a grid for
microscopy, negatively stained with uranyl acetate, and observed in the
electron microscope. The immunoglobulins G are Y-shaped proteins
(Figure 11-1) that connect two globular 30S ribosomal subunits or bind to
just one. The 30S subunits in the micrographs can be recognized by their
characteristic shapes as illustrated by the crystallographic molecular
model. At the top in the view of the crystallographic molecular model
presented is a smaller globular domain, to the left a significant protrusion, at
the bottom the larger globular domain, and to the right a deep cleft
between the upper and lower domains. The top two panels of micrographs,
for polypeptides S9 and S10, are results from the laboratory of
Stoffler. Reprinted with permission from ref 96. Copyright 1975 held by
the authors. The lower rows of micrographs, for polypeptides S13, S6,
S11, and S8, are results from the laboratory of Lake. Reprinted with
permission from refs 97 and 98. Copyright 1975 and 1981 Elsevier B.V.

Chapter 12


Physical Measurements of Structure 


Physical properties used to assess the structure of a pro- 
tein are its standard diffusion constant, its standard sed- 
imentation coefficient, its intrinsic viscosity, and the 
angular dependence of its ability to scatter light, X-radi- 
ation, and neutrons, all of which respond to the shape of 
the macromolecule; its absorption of light, which 
responds to the environments around particular chro- 
mophores, in particular the peptide bonds; its fluores- 
cence or the fluorescence of chromophores covalently 
attached to it, which can be used to measure molecular 
shape and intramolecular dimensions; and its nuclear 
magnetic resonance spectrum, which can be used to 
map spatial relationships among the amino acids in the 
native structure. These physical properties are derived 
from measurements made of solutions of the protein. 
When a crystallographic molecular model is not avail- 
able, such physical measurements provide the only 
structural information about a particular protein. As 
more and more crystallographic molecular models 
become available, however, physical measurements 
have become valuable complements to crystallography. 
Because they are structural measurements of the protein 
in solution, they can be used to validate a crystallo- 
graphic molecular model, or in some situations to adjust 
the crystallographic molecular model to correct for dif- 
ferences between the structure of a molecule of protein 
in a crystal and its structure in solution. 


Shape 


A molecule of protein dissolved in an aqueous solution of 
moderate ionic strength is a compact solid of peculiar 
shape coated with a layer of water that is more or less 
fixed upon its surface and that has the effect of smooth- 
ing its roughness. The available crystallographic molecu- 
lar models of various proteins are similar enough to each 
other to provide an accurate mental picture of the 
boundary between the molecule of protein proper and 
the liquid water of the bulk phase. Crevices on the sur- 
face of a molecule of protein are filled with molecules of 
water. Although these molecules of water are rapidly 
exchanging with their neighbors more peripherally 
located, the locations where they sit are always occupied 
and can be considered to be permanent features of the 
molecule of protein. Their net effect is to fill the crevices 
on the surface of the folded protein. Between these
crevices, over the open surface of a molecule of protein,
a large number of molecules of water are situated in loca- 
tions that are also permanently occupied, even though 
constantly exchanging. The relative positions of these 
locations become less and less fixed the farther they are
situated from the atoms of the molecule of protein until 
a region is reached where the water is no different from 
the water in an otherwise identical solution lacking the 
protein. This continuous transition between the mole- 
cules of water fixed to two or three donors and acceptors 
of hydrogen bonds on the surface of a molecule of pro- 
tein and the molecules of water in the bulk solvent is 
characterized by a gradual, rather than an abrupt, 
decrease of attachment. Therefore, no distinct boundary 
exists between the macromolecule and the solvent. 
Nevertheless, the concept of the hydrodynamic particle
is necessary if specific dimensions are to be extracted 
from physical measurements of the shapes of molecules 
of protein dissolved in free solution. 

The hydrodynamic particle is the covalent mole- 
cule of protein and any molecules of water and any 
solutes that behave during the measurement as if they 
were affixed to the molecule of protein. An affixed mole- 
cule of water would be a specific location upon the sur- 
face of the protein continuously occupied by one or 
another molecule of water over a period of time long 
enough that the measurement registers it as a permanent 
feature. 

If it is assumed that a hydrodynamic particle exists,
its mass $m_{\mathrm{h}}$ (in grams) will be

$$ m_{\mathrm{h}} = \frac{M_{\mathrm{prot}}\,(1 + \delta_{\mathrm{H_2O}})}{N_{\mathrm{A}}} \tag{12-1} $$


where $M_{\mathrm{prot}}$ is the molar mass (grams mole⁻¹) of the
protein, $\delta_{\mathrm{H_2O}}$ denotes the grams of water bound for every
gram of protein (Table 6-4), and $N_{\mathrm{A}}$ is Avogadro's
number (6.022 × 10²³ mol⁻¹). It should be recalled that
$M_{\mathrm{prot}}$, the molar mass of the covalent structure of the
protein, is almost always calculated directly from the amino
acid sequences and stoichiometries of its constituent
polypeptides and the amount of any posttranslational
modifications.

The volume of the hydrodynamic particle (centimeters³)
should be

$$ V_{\mathrm{h}} = \frac{M_{\mathrm{prot}}}{N_{\mathrm{A}}}\left(\bar{v}_{\mathrm{prot}} + \delta_{\mathrm{H_2O}}\, v^{0}_{\mathrm{H_2O}}\right) \tag{12-2} $$


where $\bar{v}_{\mathrm{prot}}$ is the partial specific volume of the protein in
centimeters³ gram⁻¹ and $v^{0}_{\mathrm{H_2O}}$ is the specific volume of
pure water in centimeters³ gram⁻¹.
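
Equations 12-1 and 12-2 can be evaluated directly. The following is a minimal sketch in Python; the molar mass, partial specific volume, and hydration used in the example are hypothetical values chosen only for illustration, and the function names are not from the text.

```python
N_A = 6.022e23  # Avogadro's number, mol^-1


def hydrodynamic_mass(M_prot, delta_h2o):
    """Equation 12-1: mass (g) of one hydrodynamic particle."""
    return M_prot * (1.0 + delta_h2o) / N_A


def hydrodynamic_volume(M_prot, v_prot, delta_h2o, v_h2o=1.00):
    """Equation 12-2: volume (cm^3) of one hydrodynamic particle."""
    return (M_prot / N_A) * (v_prot + delta_h2o * v_h2o)


# Hypothetical protein: 50,000 g mol^-1, 0.74 cm^3 g^-1, 0.3 g water per g protein
M_prot, v_prot, delta = 50_000.0, 0.74, 0.3
m_h = hydrodynamic_mass(M_prot, delta)
V_h = hydrodynamic_volume(M_prot, v_prot, delta)
V_h_nm3 = V_h * 1e21  # 1 cm^3 = 1e21 nm^3

print(f"m_h = {m_h:.3e} g")
print(f"V_h = {V_h_nm3:.1f} nm^3")
```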

If other solutes j are attached to the hydrodynamic
particle, Equation 12-1 is expanded by adding a set of
terms $\delta_j$, each of which is the grams of solute j for
every gram of protein, and Equation 12-2 is expanded
by adding a set of terms $\delta_j \bar{v}_j$, where the $\bar{v}_j$ are the partial
specific volumes (centimeters³ gram⁻¹) of the solutes j.
An example of bound solutes for which these additional
terms are major features of these equations is a case in
which the protein has bound detergents or bound
lipids.

The standard diffusion coefficient of a protein
(centimeters² second⁻¹) is designated as $D^{0}_{20,\mathrm{w}}$, where the
superscript indicates extrapolation to a zero concentration
of protein and the subscripts indicate a correction to
a temperature of 20 °C and to a solvent with the viscosity
of pure water. The standard diffusion coefficient is a
measure of $f$, the frictional coefficient (grams second⁻¹)
of the hydrodynamic particle in water at 20 °C at infinite
dilution:

$$ f = \frac{k_{\mathrm{B}} T}{D^{0}_{20,\mathrm{w}}} \tag{12-3} $$


where $k_{\mathrm{B}}$ is Boltzmann's constant (1.381 × 10⁻¹⁶ erg K⁻¹)
and $T$ is the temperature (293.15 K). A standard diffusion
coefficient and a frictional coefficient are particular and
intrinsic properties of a given protein in a given
solution.

The concept of the hydrodynamic particle quali- 
fies the meaning of the diffusion coefficient presented 
in Chapter 1 and the frictional ratio presented in 
Chapter 6. By use of the equation for the frictional coef- 
ficient of a sphere, a minimum frictional coefficient for 
the hydrodynamic particle at infinite dilution can be 
defined as 


$$ f_{0\mathrm{h}} = 6\pi\eta R_{0\mathrm{h}} \tag{12-4} $$


where the subscript zero refers to the minimization and
$\eta$ is the viscosity (pascal seconds). The viscosity of pure
water at 20 °C is 1.002 mPa s (0.01002 g cm⁻¹ s⁻¹). The
hydrodynamic radius, $R_{0\mathrm{h}}$, is defined as the radius
(centimeters) of a sphere with the same volume as the
hydrodynamic particle:

$$ R_{0\mathrm{h}} = \left(\frac{3 V_{\mathrm{h}}}{4\pi}\right)^{1/3} \tag{12-5} $$


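Consequently

$$ f_{0\mathrm{h}} = 6\pi\eta \left[\frac{3 M_{\mathrm{prot}}}{4\pi N_{\mathrm{A}}}\left(\bar{v}_{\mathrm{prot}} + \delta_{\mathrm{H_2O}}\, v^{0}_{\mathrm{H_2O}}\right)\right]^{1/3} \tag{12-6} $$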


The hydrodynamic radius, $R_{0\mathrm{h}}$, the radius of a sphere
with the same volume as the hydrodynamic particle,
must not be confused with the apparent radius, or Stokes
radius, of the particle, $a$, the radius of a sphere with the
same standard diffusion coefficient as the particle.
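
The distinction between the two radii can be made concrete with a short calculation. The sketch below, in Python, obtains the frictional coefficient from Equation 12-3, the Stokes radius by applying the equation for a sphere to the observed frictional coefficient, and the hydrodynamic radius from Equations 12-2 and 12-5. The protein and its diffusion coefficient are hypothetical, and the function names are assumptions made for illustration.

```python
import math

K_B = 1.381e-16  # Boltzmann's constant, erg K^-1
N_A = 6.022e23   # Avogadro's number, mol^-1
T = 293.15       # K (20 degrees C)
ETA = 0.01002    # viscosity of water at 20 degrees C, g cm^-1 s^-1 (1.002 mPa s)


def frictional_coefficient(D):
    """Equation 12-3: f (g s^-1) from the standard diffusion coefficient (cm^2 s^-1)."""
    return K_B * T / D


def stokes_radius(D):
    """Radius (cm) of the sphere that has the same diffusion coefficient."""
    return frictional_coefficient(D) / (6.0 * math.pi * ETA)


def hydrodynamic_radius(M_prot, v_prot, delta_h2o, v_h2o=1.00):
    """Equations 12-2 and 12-5: radius (cm) of the sphere with the same volume."""
    V_h = (M_prot / N_A) * (v_prot + delta_h2o * v_h2o)
    return (3.0 * V_h / (4.0 * math.pi)) ** (1.0 / 3.0)


# Hypothetical protein: 50,000 g mol^-1 with D = 6.0e-7 cm^2 s^-1
a_nm = stokes_radius(6.0e-7) * 1e7                       # cm -> nm
R0h_nm = hydrodynamic_radius(50_000.0, 0.74, 0.3) * 1e7  # cm -> nm

print(f"Stokes radius a     = {a_nm:.2f} nm")
print(f"hydrodynamic R_0h   = {R0h_nm:.2f} nm")
print(f"a / R_0h = f/f_0h   = {a_nm / R0h_nm:.2f}")
```

Because both radii enter the same equation for a sphere, their ratio is the frictional ratio of the hydrodynamic particle.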

The definition of the minimum frictional coefficient
for the molecule of protein, that expected of a
hydrated sphere of the same volume as the hydrated
molecule of protein, incorporates the water bound to the
protein rather than treating the protein as if it were
unhydrated as was done earlier (Equation 8-39). If $\delta_{\mathrm{H_2O}}$ is 0.3 g
(g of protein)⁻¹, consistent with the values in Table 6-4,
the hydrated effective sphere should have a volume 1.4
times as large as the unhydrated effective sphere (if $\bar{v}_{\mathrm{prot}}$
is taken as 0.74 cm³ g⁻¹ and $v^{0}_{\mathrm{H_2O}}$ as 1.00 cm³ g⁻¹), and the
frictional coefficient of the hydrated effective sphere of
protein, $f_{0\mathrm{h}}$, should be 1.12 times larger than the
frictional coefficient of the unhydrated effective sphere of
protein, $f_{0\mathrm{un}}$. This is consistent with the fact that the
smallest frictional ratios, $f/f_{0\mathrm{un}}$, observed for globular
proteins are always greater than or equal to 1.1 when no
correction is made for hydration.
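
The two factors quoted above follow from Equations 12-2 and 12-6, because the molar mass and Avogadro's number cancel in the ratios. A brief check in Python, using only the values given in the text:

```python
v_prot = 0.74  # partial specific volume of the protein, cm^3 g^-1
v_h2o = 1.00   # specific volume of pure water, cm^3 g^-1
delta = 0.3    # g of water bound per g of protein

# Ratio of hydrated to unhydrated volume (Equation 12-2 with and without delta)
volume_ratio = (v_prot + delta * v_h2o) / v_prot

# The frictional coefficient of a sphere scales with its radius, hence with V^(1/3)
friction_ratio = volume_ratio ** (1.0 / 3.0)

print(f"V_h / V_un       = {volume_ratio:.2f}")
print(f"f_0h / f_0un     = {friction_ratio:.2f}")
```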

The relationship between the frictional ratio ($f/f_0$)
and the shape of a particle has been derived for ellipsoids
of revolution, either prolate or oblate. The
relationships can be presented graphically (Figure 12-1A).
After the frictional ratio ($f/f_{0\mathrm{h}}$) for the hydrodynamic
particle has been calculated from the observed value of the
frictional coefficient $f$ (Equation 12-3) and the calculated
value of $f_{0\mathrm{h}}$ (Equation 12-6), the apparent axial ratio,
a/b, of the hydrodynamic particle can be read from the
graph.
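
Instead of reading the graph, the axial ratio can be obtained numerically. The sketch below assumes the standard closed-form expressions of Perrin for ellipsoids of revolution, which are not written out in the text, and inverts them by bisection; the function names are hypothetical.

```python
import math


def perrin_prolate(q):
    """f/f0 of a prolate ellipsoid of revolution with axial ratio q = a/b >= 1."""
    if q <= 1.0:
        return 1.0
    root = math.sqrt(q * q - 1.0)
    return root / (q ** (1.0 / 3.0) * math.log(q + root))


def perrin_oblate(q):
    """f/f0 of an oblate ellipsoid of revolution with axial ratio q = a/b >= 1."""
    if q <= 1.0:
        return 1.0
    root = math.sqrt(q * q - 1.0)
    return root / (q ** (2.0 / 3.0) * math.atan(root))


def axial_ratio(frictional_ratio, shape_function, q_max=1.0e6):
    """Invert a Perrin function (monotonically increasing in q) by bisection."""
    if frictional_ratio < 1.0:
        raise ValueError("a frictional ratio cannot be less than 1")
    lo, hi = 1.0, q_max
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if shape_function(mid) < frictional_ratio:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


oblate_at_5 = perrin_oblate(5.0)
prolate_q = axial_ratio(5.3, perrin_prolate)

print(f"oblate,  a/b = 5:      f/f0 = {oblate_at_5:.2f}")
print(f"prolate, f/f0 = 5.3:   a/b  = {prolate_q:.0f}")
```

The same routine applies to either shape by passing `perrin_oblate` or `perrin_prolate` as the second argument.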

Molecules of protein are neither prolate nor oblate 
ellipsoids of revolution, but exact solutions to the hydro- 
dynamic equations are available only for these shapes. 
Such an approximation, however, may provide some 
insight into the shape of a particular molecule of protein, 
especially when the frictional ratio differs greatly from 1. 
Such a result cannot be explained on the basis of an 
unexpectedly high degree of hydration and states that 
the protein of interest is peculiar in its shape. In the par- 
ticular case where the molecule is thought to resemble a 
cylindrical rod of length L and diameter d, it has been 
concluded that the dimensions of that cylindrical rod can 
be calculated by using the frictional ratio to determine 
the axial ratio of an equivalent prolate ellipsoid, a/b, and 
then applying the formula 


$$ \frac{L}{d} = \left(\frac{3}{2}\right)^{1/2} \frac{a}{b} \tag{12-7} $$

Figure 12-1: Graphic relationships between the axial ratio (a/b) of an oblate ellipsoid of revolution or a prolate ellipsoid of revolution and
either (A) the frictional ratio ($f/f_0$) or (B) the Simha factor (ν). A prolate ellipsoid of revolution is generated by rotation around the major axis
of an ellipse; an oblate ellipsoid of revolution, by rotation around the minor axis. The relationships for smaller values of the axial ratio are
given directly. For large values of the axial ratio (>10) the logarithm of the frictional ratio (C) or the logarithm of the Simha factor (D) is given
as a function of the logarithm of the axial ratio. The frictional ratio or the Simha factor, determined experimentally, can be converted into an
axial ratio with the appropriate graph. When the axial ratios are greater than 100, each of the four curves in panels C and D becomes a straight
line to infinity. As a result, values of the frictional ratios or the Simha factors that are greater than those on the graphs can still be converted
to values for axial ratios with the use of the slopes of these lines for extrapolation. The slope of the line in panel C for logarithms of the
frictional ratios of prolate ellipsoids is 0.47; that for oblate ellipsoids, 0.33; the slope of the line in panel D for logarithms of the Simha factors for
prolate ellipsoids is 1.81; and that for oblate ellipsoids, 1.00. Reprinted with permission from ref 1. Copyright 1961 John Wiley.

The segment of rope formed by triple-helical collagen
type I (Figure 9-33), usually referred to as a protofilament,
is known to be a rod. The molar mass of a triple-helical
rope of collagen type I is 281,000 g mol⁻¹, its partial
specific volume is 0.695 cm³ g⁻¹, and its standard
diffusion coefficient is 0.85 × 10⁻⁷ cm² s⁻¹. If it is assumed
that $\delta_{\mathrm{H_2O}}$ = 0.3 g g⁻¹, then the frictional ratio for the
hydrodynamic particle, $f/f_{0\mathrm{h}}$, would be 5.3, for which the ratio
of L/d would be 210 (Figure 12-1C and Equation 12-7).
The volume of the hydrodynamic particle containing a
segment of rope of collagen type I, based on the assumption
that $\delta_{\mathrm{H_2O}}$ = 0.3, would be 460 nm³ (Equation 12-2),
which would fill a cylinder 285 nm long with a diameter
of 1.35 nm. From the dimensions of the triple helix
(Figure 9-33) and the length of the polypeptide, a molecule
of unpolymerized collagen type I is thought to be
300 nm long.
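
The frictional ratio and the volume quoted for collagen type I can be reproduced from Equations 12-2, 12-3, and 12-6. The following Python sketch uses only the values given in the text, together with the viscosity of water at 20 °C expressed in g cm⁻¹ s⁻¹; the variable names are chosen for illustration.

```python
import math

K_B = 1.381e-16  # Boltzmann's constant, erg K^-1
N_A = 6.022e23   # Avogadro's number, mol^-1
T = 293.15       # K
ETA = 0.01002    # viscosity of water at 20 degrees C, g cm^-1 s^-1

# Values for a triple-helical rope of collagen type I, as given in the text
M_prot = 281_000.0  # g mol^-1
v_prot = 0.695      # cm^3 g^-1
D = 0.85e-7         # cm^2 s^-1
delta = 0.3         # g of water per g of protein (assumed in the text)
v_h2o = 1.00        # cm^3 g^-1

f = K_B * T / D                                     # Equation 12-3
V_h = (M_prot / N_A) * (v_prot + delta * v_h2o)     # Equation 12-2
R_0h = (3.0 * V_h / (4.0 * math.pi)) ** (1.0 / 3.0) # Equation 12-5
f_0h = 6.0 * math.pi * ETA * R_0h                   # Equations 12-4 and 12-6

ratio = f / f_0h
V_h_nm3 = V_h * 1e21  # 1 cm^3 = 1e21 nm^3

print(f"f / f_0h = {ratio:.2f}")
print(f"V_h      = {V_h_nm3:.0f} nm^3")
```

The computed ratio rounds to the value of 5.3 used in the text, and the volume rounds to about 460 nm³.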

It is also possible to calculate the frictional coefficient
for a string of spherical beads in various geometric
arrangements. Fibronectin in electron micrographs
appears as a flexible segment of rope 130 nm in length. A
string of spherical beads of that length and with a total
volume equal to that of a molecule of fibronectin has a
frictional coefficient equal to that calculated from its
standard diffusion coefficient. Before the length of a
molecule of caldesmon had been established by electron
microscopy, it was calculated that a string of spherical
beads 74 nm in length with a total volume equal to that
of a molecule of the protein would have a frictional
coefficient equal to that observed for a molecule of the
protein. It was later shown that the length of the rope-like
molecule of caldesmon seen in electron micrographs of
the protein is about 70 nm. A string of 12 small spherical
beads (r = 0.8 nm) attached to a single larger spherical
bead (r = 3 nm) produced a structure that resembles
electron micrographs of vinculin and has a frictional
coefficient equal to that calculated from the standard diffusion
coefficient observed for the protein.

The frictional coefficient of a molecule of protein 
can also be determined by sedimentation velocity. 
Consider a hydrodynamic particle dissolved in aqueous 
solution that is submitted to a high centrifugal force in 
the rotor of an ultracentrifuge. The centrifugal force on
the particle is equal to $m_{\mathrm{h}}\omega^{2}r$, where $\omega$ is the angular
velocity (radians second⁻¹) of the rotor, $m_{\mathrm{h}}$ is the mass
(grams) of the hydrodynamic particle, and $r$ is the distance
(centimeters) the particle is from the axis of the
rotor. This centrifugal force is countered by the buoyant
force, which is equal to $V_{\mathrm{h}}\rho_{\mathrm{sol}}\omega^{2}r$, where $\rho_{\mathrm{sol}}$ is the density
(grams centimeter⁻³) of the solution displaced by $V_{\mathrm{h}}$, the
volume (centimeters³) of the hydrodynamic particle. The
net force on the particle is

$$ F = \omega^{2} r \left(m_{\mathrm{h}} - V_{\mathrm{h}}\,\rho_{\mathrm{sol}}\right) \tag{12-8} $$


Because the measurements are extrapolated to a solution
of protein at a concentration of zero in only water, it can
be assumed that $\rho_{\mathrm{sol}}$ is insignificantly different from $\rho_{0}$,
the density of water at the appropriate temperature.
Because $\rho_{0}$ is the reciprocal of $v^{0}_{\mathrm{H_2O}}$, when expressions
for $m_{\mathrm{h}}$ (Equation 12-1) and $V_{\mathrm{h}}$ (Equation 12-2) are
substituted, the net force is

$$ F = \frac{M_{\mathrm{prot}}}{N_{\mathrm{A}}}\,\omega^{2} r \left(1 - \bar{v}_{\mathrm{prot}}\,\rho_{0}\right) \tag{12-9} $$


The term $(M_{\mathrm{prot}}/N_{\mathrm{A}})(1 - \bar{v}_{\mathrm{prot}}\rho_{0})$ is the buoyant mass of the
hydrodynamic particle of the protein.
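
The step from Equation 12-8 to Equation 12-9 depends on the cancellation of the bound water, which contributes $\delta_{\mathrm{H_2O}}$ to the mass and $\delta_{\mathrm{H_2O}} v^{0}_{\mathrm{H_2O}}\rho_{0} = \delta_{\mathrm{H_2O}}$ to the mass of solvent displaced. A numerical check in Python is sketched below; the molar mass is that quoted later in the text for aspartate carbamoyltransferase, whereas the partial specific volume of 0.74 cm³ g⁻¹ and the density of water of 0.9982 g cm⁻³ at 20 °C are assumptions made for illustration.

```python
import math

N_A = 6.022e23       # Avogadro's number, mol^-1
RHO_0 = 0.9982       # density of water at 20 degrees C, g cm^-3 (assumed)
V_H2O = 1.0 / RHO_0  # specific volume of pure water, cm^3 g^-1


def buoyant_mass_from_particle(M_prot, v_prot, delta):
    """m_h - V_h * rho_0, computed from Equations 12-1 and 12-2."""
    m_h = M_prot * (1.0 + delta) / N_A
    V_h = (M_prot / N_A) * (v_prot + delta * V_H2O)
    return m_h - V_h * RHO_0


M_prot, v_prot = 308_100.0, 0.74  # v_prot is an assumed value
expected = (M_prot / N_A) * (1.0 - v_prot * RHO_0)  # buoyant mass in Equation 12-9

# The result should not depend on the hydration that is assumed
values = [buoyant_mass_from_particle(M_prot, v_prot, d) for d in (0.0, 0.3, 1.0)]
independent = all(math.isclose(v, expected, rel_tol=1e-9) for v in values)

print(f"buoyant mass = {expected:.3e} g")
print(f"independent of hydration: {independent}")
```

For this reason the sedimentation coefficient, unlike the minimum frictional coefficient, requires no assumption about hydration to relate it to the molar mass.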

The net force (Equation 12-9) causes the hydrodynamic
particle to accelerate. As it accelerates, the frictional
force, which is equal to $fu$, where $f$ is the frictional
coefficient (grams second⁻¹) and $u$ is the velocity
(centimeters second⁻¹) of the hydrodynamic particle,
increases in direct proportion to the velocity of the particle
until it just balances the net centrifugal force. At that
point, a steady state is achieved, the forces on the
hydrodynamic particle are equal and opposite, the particle
travels in the direction of the centrifugal force at its
terminal velocity, and

$$ f u = \frac{M_{\mathrm{prot}}}{N_{\mathrm{A}}}\,\omega^{2} r \left(1 - \bar{v}_{\mathrm{prot}}\,\rho_{0}\right) \tag{12-10} $$

This equation can be rearranged to give

$$ s = \frac{u}{\omega^{2} r} = \frac{M_{\mathrm{prot}}\left(1 - \bar{v}_{\mathrm{prot}}\,\rho_{0}\right)}{f N_{\mathrm{A}}} \tag{12-11} $$


The quotient $u\omega^{-2}r^{-1}$, which is the observed velocity
normalized for all of the parameters of the instrument,
can be directly measured, and it is referred to as
the sedimentation coefficient, $s$. The standard sedimentation
coefficient (seconds) for the hydrodynamic particle
is designated as $s^{0}_{20,\mathrm{w}}$, where superscript and
subscript have the same meaning as before. Because
sedimentation coefficients of proteins are between 10⁻¹³ and
10⁻¹¹ s, the unit 10⁻¹³ s is designated as S, the Svedberg.

Because it is only a function of universal constants 
and the properties of the molecule of protein itself, the 
standard sedimentation coefficient is also an intrinsic 
property of the protein. In particular it is, as is the stan- 
dard diffusion coefficient, a direct measurement of the 
frictional coefficient 


$$ f = \frac{M_{\mathrm{prot}}\left(1 - \bar{v}_{\mathrm{prot}}\,\rho_{0}\right)}{s^{0}_{20,\mathrm{w}}\, N_{\mathrm{A}}} \tag{12-12} $$


To use Equation 12-12, the molar mass of the pro- 
tein must be a fixed and known quantity. If the protein is 
normally engaged in a reaction that changes its molar 
mass, such as the equilibrium between the dimers and 
the tetramer of hemoglobin, a molar mass cannot be 
assigned. If the protein is participating in such a reaction, 
abnormally large decreases in the sedimentation coeffi- 
cient will occur as the concentration of the protein is 
decreased upon extrapolation to the zero concentration 
of protein. 

Aspartate carbamoyltransferase from Escherichia 
coli (Figure 9-37) serves as an example of the application 
of analysis by sedimentation to the study of a hydro- 
dynamic particle of known shape. It is a protein of 
molar mass 308,100 g mol⁻¹, and its standard sedimentation
coefficient $s^{0}_{20,\mathrm{w}}$ is 11.6 S, from which a
frictional coefficient f of 11.6 × 10⁻⁸ g s⁻¹ can be calculated.
The minimum frictional coefficient, $f_{0,\mathrm{unh}}$, for
the unhydrated protein ($\delta_{\mathrm{H_2O}}$ = 0), folded as a sphere,
would be 8.5 × 10⁻⁸ g s⁻¹ (Equation 12-6). If hydration of
0.3 g g⁻¹ is assumed, this would give a frictional ratio f/f₀
of 1.22. Although the protein (Figure 9-37) does not have 
the axial ratio suggested by the frictional ratio (a/b would 
be 5 were the protein an oblate ellipsoid), the value of 
1.22 probably results from both the irregular shape of the 
molecule and an abnormally large amount of bound 
water within the central cavity between the two C sub- 
units. 
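
The arithmetic in this example can be retraced with a short script. The sketch below is only illustrative: it assumes that Equation 12-6 is Stokes' law for a sphere of the same volume as the (hydrated) protein, and it assumes a partial specific volume of 0.74 cm³ g⁻¹ for aspartate carbamoyltransferase, a value that is not stated in this passage. The function names are hypothetical.

```python
import math

AVOGADRO = 6.022e23        # mol^-1
ETA_WATER_20C = 1.002e-2   # g cm^-1 s^-1, viscosity of water at 20 degrees C
RHO_WATER_20C = 0.998      # g cm^-3, density of water at 20 degrees C


def frictional_coefficient(molar_mass, s20w, v_bar, rho=RHO_WATER_20C):
    """Frictional coefficient (g s^-1) from Equation 12-12; s20w is in seconds."""
    return molar_mass * (1.0 - v_bar * rho) / (s20w * AVOGADRO)


def sphere_frictional_coefficient(molar_mass, v_bar, hydration=0.0,
                                  v_water=1.0, eta=ETA_WATER_20C):
    """Stokes' law, f0 = 6*pi*eta*r, for a sphere holding the protein plus
    `hydration` grams of water per gram of protein (assumed form of Eq. 12-6)."""
    volume = molar_mass * (v_bar + hydration * v_water) / AVOGADRO  # cm^3
    radius = (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)        # cm
    return 6.0 * math.pi * eta * radius


# Aspartate carbamoyltransferase; v_bar = 0.74 cm^3 g^-1 is an assumption.
M_ATCASE, S_ATCASE, V_BAR = 308_100.0, 11.6e-13, 0.74

f_obs = frictional_coefficient(M_ATCASE, S_ATCASE, V_BAR)
f0_unh = sphere_frictional_coefficient(M_ATCASE, V_BAR, hydration=0.0)
f0_hyd = sphere_frictional_coefficient(M_ATCASE, V_BAR, hydration=0.3)
frictional_ratio = f_obs / f0_hyd

print(f"f        = {f_obs * 1e8:.1f} x 10^-8 g s^-1")
print(f"f0 (unh) = {f0_unh * 1e8:.1f} x 10^-8 g s^-1")
print(f"f/f0     = {frictional_ratio:.2f}")
```

With these assumed constants the script gives values within rounding of those quoted in the text; the small differences reflect the assumed partial specific volume and solvent density.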

It has been demonstrated crystallographically that 
when aspartate carbamoyltransferase binds the enzy- 
matic inhibitor N-(phosphonacetyl)-L-aspartate, the pro- 
tein undergoes a conformational change that alters the 
disposition of its subunits significantly. A conforma- 
tional change in a protein is any change in its structure 
brought about by a change in the solution, for example, 
the addition of an inhibitor. The net effect of this partic- 
ular conformational change in aspartate carbamoyl- 
transferase is to move the two trimers of catalytic 
α subunits (Figure 9-37) 1.2 nm farther apart. In the 
process, the water-filled space between the two C sub- 
units widens by the same amount. This change in struc- 
ture caused by the binding of the inhibitor can be 
detected as a change in the sedimentation coefficient of 
aspartate carbamoyltransferase. This change is accurately
quantified by difference sedimentation analysis in which the
two samples, with and without the inhibitor, are simultaneously
monitored in separate cells in the same rotor. The sedimentation
coefficient decreases by 3.4% upon the change in structure. The
diameter of the space between two trimers of catalytic α subunits
of aspartate carbamoyltransferase is about 8 nm, so the increase
in the amount of water between them resulting from a movement
apart of 1.2 nm should be about 40,000 g mol⁻¹. This change alone
should increase the frictional coefficient (Equation 12-6) by
3-4%, which would account completely for the observed change
of 3.4%.
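
The estimate of the additional water, and of its effect on the frictional coefficient, can be checked numerically. The following sketch treats the added water as a flat cylinder 8 nm in diameter and 1.2 nm thick, and it assumes that the frictional coefficient scales as the cube root of the hydrated volume (the behavior of a Stokes sphere, which is presumed here to be the content of Equation 12-6). The partial specific volume of 0.74 cm³ g⁻¹ and the initial hydration of 0.3 g g⁻¹ are assumptions, and the function names are hypothetical.

```python
import math

AVOGADRO = 6.022e23  # mol^-1


def water_in_gap(diameter_nm, widening_nm, density=1.0):
    """Molar mass (g mol^-1) of the water filling a cylindrical slab."""
    volume_cm3 = math.pi * (diameter_nm / 2.0) ** 2 * widening_nm * 1e-21
    return volume_cm3 * density * AVOGADRO


def fractional_increase_in_f(molar_mass, v_bar, hydration, extra_water,
                             v_water=1.0):
    """Fractional increase in f if f is proportional to (hydrated volume)^(1/3)."""
    before = molar_mass * (v_bar + hydration * v_water)   # cm^3 mol^-1
    after = before + extra_water * v_water
    return (after / before) ** (1.0 / 3.0) - 1.0


extra_water = water_in_gap(diameter_nm=8.0, widening_nm=1.2)
increase = fractional_increase_in_f(308_100.0, 0.74, 0.3, extra_water)

print(f"additional water: {extra_water:,.0f} g mol^-1")
print(f"increase in f:    {100.0 * increase:.1f}%")
```

Under these assumptions the added water is a little under 40,000 g mol⁻¹ and the frictional coefficient rises by between 3 and 4%, in line with the figures in the text.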

Both the high frictional ratio for unexpanded aspar- 
tate carbamoyltransferase and the increase in hydration 
experienced upon its expansion illustrate the fact that 
oligomeric proteins, because of the spaces among the 
subunits, display greater hydration than monomeric 
proteins. From results of measurements of the scattering 
of X-radiation at small angles by solutions of proteins, it 
has been calculated that while monomeric proteins have 
values of $\delta_{\mathrm{H_2O}}$ of 0.25-0.35 g g⁻¹, oligomeric proteins have
significantly higher values for $\delta_{\mathrm{H_2O}}$ of 0.35-0.7 g g⁻¹. If
the hydration of aspartate carbamoyltransferase in its
unexpanded state were 0.6 g g⁻¹, its frictional ratio would
be only 1.13, a value that is easily accounted for by its 
irregular shape. 

Desmin is one of the proteins that forms intermedi- 
ate filaments (Figures 9-35 and 9-36). The monomeric 
unit of the polymer is an α₂ dimer of two identical
polypeptides; those from chicken are 463 aa long. The
core of the dimer is a coiled coil of two α helices, one
from each polypeptide. This coiled coil is contained
within a fragment of the dimer containing the polypeptides
from Glycine 70 to Phenylalanine 415 that can be
produced by digestion with chymotrypsin. The standard
sedimentation coefficient of this dimeric fragment is
2.85 S, while the standard sedimentation coefficient of
an (α₂)₂ dimer of this dimer is 3.85 S. If it is assumed
that both of these oligomers, the dimer and the dimer of
dimers, can be represented as cylindrical rods, L/d for
the α₂ dimer is 28 and that for the (α₂)₂ dimer of dimers is
44. Consequently, the (α₂)₂ dimer of dimers is 1.7 times
as long as one α₂ dimer. If this approximation is realistic,
the coiled coils of the two dimers must be staggered
by about 0.7 of their length in the dimer of dimers. 

The standard diffusion coefficient and the standard 
sedimentation coefficient provide independent deter- 
minations of the frictional coefficient of a molecule of 
protein. The force producing net flux of protein when diffusion
is measured is chemical potential, which is unrelated to, as
well as being a somewhat less concrete concept than,
centrifugal force. Furthermore, the theoretical derivations
of the relationship between the diffusion coefficient and the
frictional coefficient and that between the sedimentation
coefficient and the frictional coefficient are entirely
different. It is of interest to compare (Table 12-1) the two
frictional coefficients, that calculated from diffusion
($f_{\mathrm{diff}}$) and that calculated from sedimentation
($f_{\mathrm{sed}}$). Depending upon one’s prejudice,
the agreement between the numbers is either as one 
expected or quite gratifying. The lack of any systematic 
deviation verifies the assumption, first made by Einstein, 
that the same frictional coefficient applies to both diffu- 
sion and sedimentation. 
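
The comparison can be reproduced for a few rows of Table 12-1. The sketch below assumes that Equation 12-3 is the Einstein relation, f = k_B T / D, at 293.15 K, and that the density of the solvent is 0.998 g cm⁻³; neither equation nor constant is shown in this passage, so both are assumptions, as are the function names.

```python
AVOGADRO = 6.022e23        # mol^-1
BOLTZMANN = 1.380649e-16   # erg K^-1 (g cm^2 s^-2 K^-1)
TEMPERATURE = 293.15       # K, 20 degrees C
RHO_WATER = 0.998          # g cm^-3


def f_from_sedimentation(molar_mass, s20w, v_bar):
    """Equation 12-12: frictional coefficient (g s^-1) from s (seconds)."""
    return molar_mass * (1.0 - v_bar * RHO_WATER) / (s20w * AVOGADRO)


def f_from_diffusion(d20w):
    """Assumed Einstein relation: frictional coefficient from D (cm^2 s^-1)."""
    return BOLTZMANN * TEMPERATURE / d20w


# (protein, v_bar, s in seconds, D in cm^2 s^-1, molar mass) from Table 12-1
ROWS = [
    ("lysozyme", 0.703, 1.91e-13, 11.20e-7, 14_310.0),
    ("serum albumin", 0.735, 4.64e-13, 6.0e-7, 66_470.0),
    ("fibrinogen", 0.725, 7.63e-13, 1.98e-7, 344_000.0),
]

results = {}
for name, v_bar, s, d, molar_mass in ROWS:
    f_sed = f_from_sedimentation(molar_mass, s, v_bar)
    f_diff = f_from_diffusion(d)
    results[name] = (f_sed, f_diff)
    print(f"{name:14s} f_sed = {f_sed * 1e8:5.1f}  "
          f"f_diff = {f_diff * 1e8:5.1f}  (x 10^-8 g s^-1)")
```

The two columns printed for each protein should agree with the corresponding entries of Table 12-1 to within rounding.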

The frictional ratios ($f_{\mathrm{av}}/f_{0,\mathrm{h}}$), where $f_{\mathrm{av}}$ is the average 
of the two measurements, are close to 1 (1.1-1.2) for most 
globular proteins (Table 12-1), but even in these 
instances the frictional ratios of the hydrated particles 
predict (Figure 12-1A) an axial ratio of greater than 3, 
which is unrealistic. These values of 1.1-1.2 do not indi- 
cate elongation but reflect the fact that molecules of pro- 
tein, even when they are almost spherical, are not 
smooth spheres but globular macromolecules with 
irregular, rough surfaces. Proteins such as fibrinogen, 
apolipoprotein(a), and plasminogen, however, which are 
known from other observations to be highly asymmetric, 
have much higher frictional ratios. 

The specific examples of elongated or excessively 
hydrated proteins described in detail so far, collagen, 
fibronectin, caldesmon, vinculin, aspartate carbamoyl- 
transferase, and desmin, are macromolecules about 
which enough is known that the hydrodynamic meas- 
urements can be evaluated comprehensibly. When no 
other structural information is available about a protein, 
a frictional ratio around 1.1 is strong evidence that it is 
globular, but frictional ratios greater than 1.1 are difficult 
to interpret. For example, the fact that human bifunctional
polynucleotide 3′-phosphatase/5′-kinase has a frictional ratio
($\delta_{\mathrm{H_2O}}$ = 0.3) of 1.30 could result from an unusually
irregular shape, an elongated shape, or an abnormally large
amount of bound water or, most likely, some combination of all
of these factors.

Table 12-1: Frictional Coefficients from Sedimentation and Diffusionᵃ

protein                          species   v̄prot      s°20,w      D°20,w           Mprotᵇ      fsedᶜ          fdiffᵈ         f0,unhᵉ        f0,hᶠ          fav/f0,hᵍ
                                           (cm³ g⁻¹)  (s × 10¹³)  (cm² s⁻¹ × 10⁷)  (g mol⁻¹)   (g s⁻¹ × 10⁸)  (g s⁻¹ × 10⁸)  (g s⁻¹ × 10⁸)  (g s⁻¹ × 10⁸)

lysozyme                         chicken   0.703      1.91        11.20            14,310      3.7            3.6            2.99           3.4            1.09
alcohol dehydrogenase            horse     0.750      5.0         6.2              79,600ʰ     6.6            6.5            5.41           6.1            1.09
catalase                         cow       0.730      11.30       4.10             232,800ⁱ    9.3            9.9            7.67           8.6            1.11
β-galactosidase                  E. coli   0.76       15.93       3.12             465,400     11.7           13.0           9.79           10.9           1.13
serum albumin                    human     0.735      4.64        6.0              66,470      6.3            6.7            5.06           5.7            1.15
fructose-bisphosphate aldolase   rabbit    0.742      7.35        4.63             156,840     9.2            8.7            6.76           7.6            1.18
prothrombin                      cow       0.70       4.85        6.24             72,600ʲ     7.5            6.5            5.13           5.8            1.21
manganese-stabilizing proteinᵏ   spinach   0.732      2.26        7.6              26,530      5.3            5.3            3.72           4.2            1.27
plasminogen                      human     0.71       4.30        4.31             103,000ˡ    11.6           9.4            5.8            6.5            1.61
apolipoprotein(a)ᵐ               human     0.69       9.30        2.29             323,000ⁿ    18.0           17.7           8.4            9.5            1.88
fibrinogen                       human     0.725      7.63        1.98             344,000ᵒ    20.7           20.4           8.7            9.8            2.10

ᵃUnless otherwise noted, these values are from tables in ref 16. The entries are arranged in order of asymmetry. ᵇFrom sequence. ᶜ$f_{\mathrm{sed}}$ from Equation 12-12. ᵈ$f_{\mathrm{diff}}$ from Equation 12-3. ᵉ$f_{0,\mathrm{unh}}$ from Equation 12-6 with $\delta_{\mathrm{H_2O}}$ equal to 0. ᶠ$f_{0,\mathrm{h}}$ from Equation 12-6 with $\delta_{\mathrm{H_2O}}$ equal to 0.3. ᵍAverage of $f_{\mathrm{sed}}$ and $f_{\mathrm{diff}}$ divided by $f_{0,\mathrm{h}}$. ʰ2 Zn²⁺ subunit⁻¹. ⁱOne heme subunit⁻¹. ʲ10.4 g of oligosaccharide (100 g of protein)⁻¹. ᵏReference 17. ˡ17 g of oligosaccharide (100 g of protein)⁻¹. ᵐReference 18. ⁿ30 g of oligosaccharide (100 g of protein)⁻¹. ᵒ2 g of oligosaccharide (100 g of protein)⁻¹.

Measurement of the viscosity of a solution of a protein also
provides an evaluation of the shape of the hydrodynamic
particle. When a fluid flows through a cylindrical capillary
under the appropriate circumstances, laminar flow occurs. The
fluid immediately adjacent to the walls of the capillary is
stationary, and the fluid at the center of the capillary has
the highest rate of flow. Each cylindrical lamina between the
center and the wall moves with an intermediate velocity that,
as the distance from the center increases, monotonically
decreases to a value of zero at the wall. Laminar flow requires
that each cylindrical lamina move more slowly than its neighbor
toward the center. As such, shear occurs between adjacent
laminae throughout the capillary. The surfaces at which shear
occurs are all parallel to the axis of the capillary. The more
viscous the fluid, the more difficult it will be for these
surfaces of shear to move across one another, and the more
slowly the fluid will flow through the capillary. The time
required for a given volume of a fluid to move through a given
capillary at a given hydrostatic pressure is directly
proportional to η, the viscosity (pascal seconds) of the fluid.

The addition of macromolecules such as proteins to the fluid in
the capillary interrupts the shear that otherwise would occur
in the solution in their vicinity and increases the viscosity
of the solution. This increase can be expressed in terms of the
specific viscosity, $\eta_{\mathrm{sp}}$, which is defined by

$$ \eta_{\mathrm{sp}} = \frac{\eta' - \eta}{\eta} \tag{12-13} $$

where η′ is the viscosity of the solution containing a
particular concentration of the protein and η is the viscosity
of an otherwise identical solution lacking the protein. The
specific viscosity is a positive number because η′ is always
greater than η. The specific viscosity is the normalized
incremental increase in the viscosity caused by the protein.

If the flow through the capillary is driven only by the 
weight of the fluid, the specific viscosity is readily meas- 
ured because 

$$ \frac{\eta'}{\eta} = \frac{t'\rho'}{t\rho} \tag{12-14} $$

where t is the time for a given volume of a solution to flow
through the capillary, ρ is the density of the solution, and
the primed and unprimed terms refer to the solution of 
protein and an identical solution lacking the protein, 
respectively. 

The specific viscosity, $\eta_{\mathrm{sp}}$, is the fractional increase
in the viscosity of the solution due to addition of the
protein, and it increases monotonically as the concentration
of protein is increased. To render this increase an intrinsic
property of the protein, regardless of its concentration, the
intrinsic viscosity, [η] (centimeters³ gram⁻¹), is defined as

$$ [\eta] = \lim_{\gamma_{\mathrm{prot}} \to 0} \frac{\eta_{\mathrm{sp}}}{\gamma_{\mathrm{prot}}} \tag{12-15} $$

where $\gamma_{\mathrm{prot}}$ is the concentration of protein in grams
centimeter⁻³. At low concentrations of protein, $\eta_{\mathrm{sp}}$
should be directly proportional to $\gamma_{\mathrm{prot}}$, and [η] is
simply the slope of the line of $\eta_{\mathrm{sp}}$ plotted against
$\gamma_{\mathrm{prot}}$. Neither the specific
viscosity nor the intrinsic viscosity is itself a viscosity. 
The intrinsic viscosity is sometimes called the limiting 
viscosity number to avoid this confusion. 
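
As an illustration of how Equations 12-13 through 12-15 are used together, the sketch below converts flow times into a specific viscosity and then extrapolates the quotient of specific viscosity and concentration to zero concentration. All of the numbers are hypothetical, the linear extrapolation is only one common way of taking the limit, and the function names are not from the text.

```python
def specific_viscosity(t_solution, rho_solution, t_solvent, rho_solvent):
    """Equations 12-13 and 12-14: eta_sp = (t' rho') / (t rho) - 1."""
    return (t_solution * rho_solution) / (t_solvent * rho_solvent) - 1.0


def linear_fit(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def intrinsic_viscosity(concentrations, specific_viscosities):
    """Extrapolate eta_sp / concentration linearly to zero concentration."""
    reduced = [eta_sp / c for c, eta_sp in zip(concentrations, specific_viscosities)]
    _, intercept = linear_fit(concentrations, reduced)
    return intercept


# Hypothetical single measurement: flow times in seconds, densities in g cm^-3.
eta_sp_example = specific_viscosity(103.0, 1.0005, 100.0, 0.9982)

# Hypothetical dilution series (g cm^-3) generated with [eta] = 3.5 cm^3 g^-1.
concentrations = [0.002, 0.004, 0.006, 0.008]
eta_sps = [3.5 * c + 20.0 * c * c for c in concentrations]
eta_intrinsic = intrinsic_viscosity(concentrations, eta_sps)

print(f"eta_sp = {eta_sp_example:.4f}")
print(f"[eta]  = {eta_intrinsic:.2f} cm^3 g^-1")
```

Plotting the quotient rather than the specific viscosity itself makes any curvature at higher concentrations appear as a slope, so that the intercept alone estimates the intrinsic viscosity.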

For macromolecules such as proteins, it can be 
shown that

$$ [\eta] = \nu\,\frac{V_{\mathrm{h}} N_{\mathrm{A}}}{M_{\mathrm{prot}}} \tag{12-16} $$

where ν is a dimensionless coefficient of proportionality
referred to as the Simha factor. On the basis of Einstein’s
calculations, the value of ν for a spherical hydrodynamic
particle is 2.5. As with the frictional ratio, f/f₀, the
relationship between the Simha factor ν and shape has been
derived for ellipsoids of revolution. The relationships
can be presented graphically (Figure 12-1B). If a value for
$\delta_{\mathrm{H_2O}}$ is assumed, ν can be calculated from

$$ \nu = \frac{[\eta]}{\bar{v}_{\mathrm{prot}} + \delta_{\mathrm{H_2O}}\,\bar{v}^{0}_{\mathrm{H_2O}}} \tag{12-18} $$


and the apparent value of the axial ratio can be read from 
the graph. 

From Equation 12-17, if $\delta_{\mathrm{H_2O}}$ = 0.3 g g⁻¹,
$\bar{v}_{\mathrm{prot}}$ = 0.74 cm³ g⁻¹, and $\bar{v}^{0}_{\mathrm{H_2O}}$ = 1 cm³ g⁻¹,
[η] would be 2.6 cm³ g⁻¹ if the hydrodynamic particle were a
sphere regardless of its molar mass. What this means is that as
long as $\delta_{\mathrm{H_2O}}$ and $\bar{v}_{\mathrm{prot}}$ do not vary
significantly, the viscosities observed for a set of solutions,
each of a different spherical molecule of protein and each at
the same concentration in grams centimeter⁻³, will be the same
regardless of whether the mass is distributed among only
a few large spheres because the protein has a large molar
mass or is distributed among many small spheres
because the protein has a small molar mass. Most globular
proteins do have intrinsic viscosities between 3.0 and
4.0 cm³ g⁻¹ regardless of their molar masses. An observation
of an intrinsic viscosity in this range demonstrates
that a protein is globular.
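
The value of 2.6 cm³ g⁻¹ follows from simple arithmetic. The sketch below assumes that Equation 12-17, which is not reproduced in this passage, is Equation 12-18 solved for the intrinsic viscosity; the function names are hypothetical.

```python
def intrinsic_viscosity(simha, v_bar, hydration, v_water=1.0):
    """Assumed Equation 12-17: [eta] = nu * (v_prot + delta * v_water)."""
    return simha * (v_bar + hydration * v_water)


def simha_factor(eta_intrinsic, v_bar, hydration, v_water=1.0):
    """Equation 12-18: nu = [eta] / (v_prot + delta * v_water)."""
    return eta_intrinsic / (v_bar + hydration * v_water)


eta_sphere = intrinsic_viscosity(simha=2.5, v_bar=0.74, hydration=0.3)
print(f"[eta] for a hydrated sphere: {eta_sphere:.1f} cm^3 g^-1")

# Simha factors implied by the range quoted for most globular proteins.
for eta in (3.0, 4.0):
    print(f"[eta] = {eta:.1f} cm^3 g^-1 -> nu = {simha_factor(eta, 0.74, 0.3):.2f}")
```

Because the molar mass does not appear in either function, the result is the same for a small and a large spherical protein, which is the point made in the text.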

Intrinsic viscosity is dramatically more sensitive to 
the asymmetry of a molecule of protein than is the fric- 
tional coefficient. The frictional coefficient of a molecule 
of collagen type I is only 5 times larger than the frictional
coefficient it would have if it were a hydrated
sphere, but the intrinsic viscosity of a solution of collagen
type I is 460 times larger than the intrinsic viscosity
it would have if it were a hydrated sphere. The intrinsic
viscosity of collagen type I is 1150 cm³ g⁻¹. If it is
assumed that $\delta_{\mathrm{H_2O}}$ = 0.3 g g⁻¹, the Simha factor is 1160
(Equation 12-18), and if it is assumed that the molecule
is cylindrical, the axial ratio (a/b) of the hydrodynamic
particle would be 140 (Figure 12-1D) and the ratio of
cylindrical length to diameter (L/d) would be 170
(Equation 12-7). This ratio is that of a cylinder of length
260 nm with a diameter of 1.5 nm and a volume of
460 nm³. As noted before, collagen type I is 300 nm in
length. 

Another procedure that can provide information 
about the shape of a molecule of protein is the scattering 
of electromagnetic radiation or neutrons. In the earlier 
discussion of light scattering, it was mentioned that the 
intensity of the scattered light from a solution of protein 
can depend on the angle at which the measurement is 
made. This is due to the fact that if the molecule of pro- 
tein has at least one dimension that is an appreciable 
fraction of the wavelength of the light, photons scattered 
from different points in the same molecule of protein will 
be out of phase, and intramolecular interference due to 
these mismatched phases will diminish the overall inten- 
sity of the scattered light. This interference increases as 
the angle at which the scattered light is measured, the 
scattering angle θ (Figure 8-4), is increased. At a scattering
angle of 0°, the angle of the forward scattering $i_{0}$,
there is no interference. It is the forward scattering that 
contains information about the molar concentration of 
particles in the solution, and hence the molar mass of 
those particles. 

It can be shown that, when the contribution of the 
virial coefficients to the scattering is eliminated by 
extrapolating the measurements to zero concentration of 
protein 


$$ \lim_{\gamma_{\mathrm{prot}} \to 0} \frac{K\,\gamma_{\mathrm{prot}}}{R_{\theta}} \left(\frac{\partial \tilde{n}}{\partial \gamma_{\mathrm{prot}}}\right)^{2}_{T,P,\mu} = \frac{1}{M_{\mathrm{prot}}}\,\frac{1}{P(\theta)} \tag{12-19} $$

where K is the optical constant (moles centimeter⁻⁴)
defined by Equation 8-28, $R_{\theta}$ is the Rayleigh ratio
(centimeters⁻¹) calculated from the measurements by
Equations 8-30 or 8-31, $\gamma_{\mathrm{prot}}$ is the concentration of
protein in the units of grams centimeter⁻³,
$(\partial \tilde{n}/\partial \gamma_{\mathrm{prot}})_{T,P,\mu}$ is the
change (centimeters³ gram⁻¹) in the refractive index of
the solution as a function of the concentration of the
protein, and $M_{\mathrm{prot}}$ is the molar mass (grams mole⁻¹) of the
protein. As noted previously, the incremental scattering
$i_{\theta}$ is the scattering that results only from the molecules of
protein and is measured as the difference in scattering 
between the solution containing protein and an identical 
solution not containing protein. 

The function P(θ) is the factor by which the intensity
of the light scattered only by the protein, the incremental
scattered light ($i_{\theta}$), is decreased as a result of the
interference:

$$ P(\theta) = 1 - \frac{16\pi^{2} R_{g}^{2}}{3\lambda^{2}} \sin^{2}\frac{\theta}{2} + \cdots \tag{12-20} $$

where Rg is the radius of gyration (centimeters) of the
molecule of protein and θ is the angle relative to the incident
radiation at which the scattered radiation is measured.
The value for the wavelength of the light, λ, is its
wavelength in the solution

$$ \lambda = \frac{\lambda_{0}}{\tilde{n}} \tag{12-21} $$

where λ₀ is its wavelength in a vacuum. Equation 12-20 is
an infinite series, but at small values of θ the higher terms
become negligible and the approximation

$$ \lim_{\theta \to 0} \frac{1}{P(\theta)} = \frac{1}{1 - \dfrac{16\pi^{2} R_{g}^{2}}{3\lambda^{2}} \sin^{2}\dfrac{\theta}{2}} = 1 + \frac{16\pi^{2} R_{g}^{2}}{3\lambda^{2}} \sin^{2}\frac{\theta}{2} \tag{12-22} $$


can be used. In practice 


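$$ \lim_{\gamma_{\mathrm{prot}} \to 0} \frac{\gamma_{\mathrm{prot}}}{R_{\theta}} = \frac{\gamma_{\mathrm{prot}}}{R_{0}} \left[1 + \frac{16\pi^{2} R_{g}^{2}}{3\lambda^{2}} \sin^{2}\frac{\theta}{2}\right] \tag{12-23} $$

A plot of the left-hand quotient, extrapolated to zero
concentration of protein at each value of θ, against sin²(θ/2),
for small values of θ, will be a straight line from the slope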
and intercept of which a value of Rg, the radius of gyra- 
tion, can be calculated. Equation 12-19 emphasizes that 
the interference arising from the shape of the molecule of 
protein is independent from the colligative property of 
light scattering from which its molar mass can be esti- 
mated. 
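
The way in which the radius of gyration is extracted from such a plot can be sketched as follows. According to Equation 12-23, the ratio of the slope to the intercept of the line is 16π²Rg²/3λ², so Rg follows from a linear fit. The data below are synthetic, generated from an assumed radius of gyration of 30 nm and light of 436 nm (in vacuo) in a solvent of assumed refractive index 1.333; the function names are hypothetical.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def radius_of_gyration_from_plot(sin2_half_theta, quotients, wavelength):
    """Rg from slope/intercept = 16 pi^2 Rg^2 / (3 lambda^2) (Equation 12-23)."""
    slope, intercept = linear_fit(sin2_half_theta, quotients)
    return math.sqrt(3.0 * wavelength ** 2 * slope / (16.0 * math.pi ** 2 * intercept))


# Synthetic data; every value here is an assumption made for illustration.
WAVELENGTH = 436.0 / 1.333   # nm, wavelength in the solution (Equation 12-21)
TRUE_RG = 30.0               # nm
INTERCEPT = 2.0e-3           # arbitrary units; set by the molar mass in practice

coefficient = 16.0 * math.pi ** 2 * TRUE_RG ** 2 / (3.0 * WAVELENGTH ** 2)
xs = [math.sin(math.radians(angle) / 2.0) ** 2 for angle in (30, 45, 60, 75, 90)]
ys = [INTERCEPT * (1.0 + coefficient * x) for x in xs]

rg = radius_of_gyration_from_plot(xs, ys, WAVELENGTH)
print(f"Rg recovered from the plot: {rg:.1f} nm")
```

Only the ratio of slope to intercept enters the calculation, which is why the shape information is independent of the absolute calibration that determines the molar mass.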

The radius of gyration of the molecule of protein is 
the molecular parameter that is obtained from the angu- 
lar dependence of the intensity of the scattered radiation 
and that provides information about the shape of the 
molecule of protein. The radius of gyration of a solid of 
uniform scattering density, as is usually assumed to be 
the case for proteins, is defined by the relationship 


$$ R_{g}^{2} = \frac{\displaystyle\int_{\mathrm{vol}} r^{2}\,dV}{\displaystyle\int_{\mathrm{vol}} dV} \tag{12-24} $$


where r is the distance of a volume element dV from the 
center of mass. The integration is performed over the 
whole volume of the solid. The advantage of the radius of 
gyration is that it can be calculated by numerical inte- 
gration for any structure, for example, a crystallographic 
molecular model, and compared to the value obtained 
from the measurement. For example, in an elongated 
protein such as fibronectin, which is known to be con- 
structed from internally repeating domains, radii of gyra- 
tion can be calculated for various rigid structures built 
from a string of spheres each representing one of the 
individual domains of the molecule, and these calculated 
values can be compared to the observed value for the 
radius of gyration estimated by light scattering. 

The radius of gyration for a single sphere of uniform 
density is 


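$$ R_{g} = \left(\frac{3}{5}\right)^{1/2} R_{\mathrm{s}} \tag{12-25} $$

where $R_{\mathrm{s}}$ is the radius of the sphere. The radius of
gyration for a cylindrical rod is

$$ R_{g} = \frac{L_{\mathrm{r}}}{12^{1/2}} \tag{12-26} $$

where $L_{\mathrm{r}}$ is the length of the rod. The radius of gyration
for a prolate ellipsoid is

$$ R_{g} = \left(\frac{a^{2} + 2b^{2}}{5}\right)^{1/2} \tag{12-27} $$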
where a and b are the semi-major and semi-minor axes, 


respectively. 
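
The closed-form expressions can be checked against the defining integral by brute-force numerical integration, which is also how a radius of gyration would be obtained for a shape with no closed form. The sketch below uses a simple midpoint rule on a cubic grid and assumes that the solid is centered on the origin so that its center of mass is at the origin by symmetry. The rod formula is taken to apply to a rod of negligible thickness. The function names are hypothetical.

```python
import math


def rg_sphere(radius):
    """Equation 12-25."""
    return math.sqrt(3.0 / 5.0) * radius


def rg_rod(length):
    """Equation 12-26 (thin rod)."""
    return length / math.sqrt(12.0)


def rg_prolate(a, b):
    """Equation 12-27; a and b are the semi-major and semi-minor axes."""
    return math.sqrt((a * a + 2.0 * b * b) / 5.0)


def rg_numeric(inside, half_widths, n=40):
    """Midpoint-rule estimate of Rg for a uniform solid centered on the origin.

    `inside(x, y, z)` returns True for points that belong to the solid and
    `half_widths` gives the half-dimensions of the bounding box.
    """
    hx, hy, hz = half_widths
    total_r2 = 0.0
    count = 0
    for i in range(n):
        x = -hx + (i + 0.5) * 2.0 * hx / n
        for j in range(n):
            y = -hy + (j + 0.5) * 2.0 * hy / n
            for k in range(n):
                z = -hz + (k + 0.5) * 2.0 * hz / n
                if inside(x, y, z):
                    total_r2 += x * x + y * y + z * z
                    count += 1
    return math.sqrt(total_r2 / count)


sphere_numeric = rg_numeric(lambda x, y, z: x * x + y * y + z * z <= 1.0,
                            (1.0, 1.0, 1.0))
prolate_numeric = rg_numeric(lambda x, y, z: (x / 3.0) ** 2 + y * y + z * z <= 1.0,
                             (3.0, 1.0, 1.0))

print(f"sphere:  formula {rg_sphere(1.0):.3f}, numerical {sphere_numeric:.3f}")
print(f"prolate: formula {rg_prolate(3.0, 1.0):.3f}, numerical {prolate_numeric:.3f}")
```

The same `rg_numeric` routine could be given a test for membership in any model, for example a string of spheres, which is the kind of comparison described for fibronectin.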

The effect of the finite size of the molecule of pro- 
tein on the scattered light is that its intensity, as reflected 
in $R_{\theta}$ (Equation 8-30 or 8-31), decreases as θ increases,
owing to intramolecular interference, but its intensity
will decrease significantly only if the term
[16π²Rg² sin²(θ/2)]/3λ² in Equation 12-23 is large enough to cause
a measurable effect. In practice, this means that at least
one dimension of the protein must be greater than λ/20.
The sizes of most molecules of protein are too small for
this to be the case when visible light (λ = 300-500 nm in
water) is used as the radiation. For example, the decrease
in light scattering from a solution of fibronectin measured
at a wavelength of 436 nm (in vacuo) was only about
10% at the maximum possible sin²(θ/2) of 1, even
though fibronectin has a molar mass of 519,000 g mol⁻¹,
is a string of domains with a total length of 180 nm, and
has a radius of gyration of 8.6 nm.

For most proteins, significant decreases in angular 
light scattering are observed only when X-radiation is 
used (λ = 0.1-0.2 nm). Unfortunately, this is radiation of
such short wavelength that complete intramolecular
interference occurs at quite small values of θ (Equation
12-20), and the scattered radiation from a solution of
protein becomes equal to that from the solution lacking
protein when θ is only 1-2°. Fortunately, accurate
measurements of scattered X-radiation can be made at the
necessary small angles. The values for Rg obtained from
small-angle X-ray scattering, for example, 1.75 nm for
myoglobin, 2.3 nm for cyclic AMP-dependent protein
kinase, and 1.36 nm for reduced cytochrome c,
demonstrate the ability of this technique to provide 
information about small globular proteins. 

When the observations of X-ray scattering are pre- 
sented, a different convention is used to approximate 
P(θ). Because

$$ \exp(-x) = 1 - x + \frac{x^{2}}{2!} - \frac{x^{3}}{3!} + \cdots \tag{12-28} $$

the first two terms in Equation 12-20 are identical to the
first two terms in the expansion of
exp[−16π²Rg² sin²(θ/2)/3λ²], and at small angles

$$ \lim_{\theta \to 0} \ln P(\theta) = -\frac{16\pi^{2} R_{g}^{2}}{3\lambda^{2}} \sin^{2}\frac{\theta}{2} \tag{12-29} $$

Because at such small angles none of the terms in
Equation 12-19 except $i_{\theta}$ (see Equations 8-30 and 8-31)
and P(θ) change as θ is varied, a plot of ln $i_{\theta}$ as a function
of sin²(θ/2) at the smallest angles will give a straight line
with a slope of −16π²Rg²/(3λ²).* From this slope Rg is
readily determined. 
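
A minimal sketch of this procedure is given below. The intensities are synthetic, generated from Equation 12-29 with an assumed radius of gyration of 2.31 nm, an assumed forward scattering of 1000 in arbitrary units, and X-radiation of wavelength 0.154 nm; the angles are expressed in the convention used for light scattering in this text. The function names are hypothetical.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def guinier_fit(sin2_half_theta, intensities, wavelength):
    """Return (Rg, forward scattering) from ln(i) plotted against sin^2(theta/2).

    The slope of the line is -16 pi^2 Rg^2 / (3 lambda^2) (Equation 12-29).
    """
    logs = [math.log(i) for i in intensities]
    slope, intercept = linear_fit(sin2_half_theta, logs)
    rg = math.sqrt(-3.0 * wavelength ** 2 * slope) / (4.0 * math.pi)
    return rg, math.exp(intercept)


# Synthetic data; all values are assumptions chosen for illustration.
WAVELENGTH = 0.154   # nm
TRUE_RG = 2.31       # nm
TRUE_I0 = 1000.0     # arbitrary units

angles_deg = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
xs = [math.sin(math.radians(a) / 2.0) ** 2 for a in angles_deg]
k = 16.0 * math.pi ** 2 * TRUE_RG ** 2 / (3.0 * WAVELENGTH ** 2)
intensities = [TRUE_I0 * math.exp(-k * x) for x in xs]

rg_fit, i0_fit = guinier_fit(xs, intensities, WAVELENGTH)
print(f"Rg = {rg_fit:.2f} nm, forward scattering = {i0_fit:.0f}")
```

The intercept of the same line provides the forward scattering, so a single fit at the smallest angles yields both the quantity related to shape and the quantity related to molar mass.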

In reports of studies of X-ray scattering, there are 
several ways in which the observations are analyzed. 
First, the data can be presented directly (Figure 12-2A)
as the natural logarithm of the observed incremental
scattering intensity (ln $i_{\theta}$) as a function of q, where

$$ q \equiv \frac{4\pi \sin(\theta/2)}{\lambda} \tag{12-30} $$

The advantage of this presentation is that the scattering 
calculated from a particular model of the molecule of 
protein as a function of q can be compared to the complete
set of scattering data. Second, the data can be presented
in a Guinier plot as ln $i_{\theta}$ as a function of q² (Figure
12-2B). The advantage of this presentation is that a
radius of gyration can be estimated from the limiting
slope at the smallest values of scattering angle θ, typically
those less than 1° (Figure 12-2A). Third, the distance
distribution function p(r), which is the Fourier transform of
the scattering function

$$ p(r) = \frac{1}{2\pi^{2}} \int_{0}^{\infty} i_{\theta}\, q r \sin(qr)\, dq \tag{12-31} $$

can be calculated. The distance distribution function p(r)
is the frequency with which vectors of a length r connect
two volume elements within the molecule of protein
(Figure 12-2C). In practice, the inverse Fourier transform
of Equation 12-31

$$ i_{\theta} = 4\pi \int_{0}^{d_{\max}} p(r)\, \frac{\sin(qr)}{qr}\, dr \tag{12-32} $$

where $d_{\max}$ is the maximum dimension of the particle, is
used to compute p(r) in reverse.

* Unfortunately, investigators who study the scattering of
X-radiation and neutrons use a different convention for the
scattering angle (Figure 8-4) from those who study the scattering
of light. The same scattering angle routinely designated as θ
during measurements of light scattering is routinely designated
as 2θ during measurements of X-ray scattering and neutron
scattering. Consequently, the angles θ from measurements of X-ray
scattering and neutron scattering must be multiplied by a factor
of 2 before they are used as angles θ in the equations presented
in this text. The term sin²(θ/2) used when results of light
scattering are presented thus becomes sin²θ when results of
small-angle X-ray scattering and neutron scattering are
presented, and values of sin²θ from X-ray scattering and neutron
scattering are equivalent to values of sin²(θ/2) in the
equations used in this text. In measurements of X-ray scattering
and neutron scattering, the slope of the line when ln $i_{\theta}$ is
plotted as a function of the sin²θ used by these investigators
has the value −16π²Rg²/3λ².
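
Because the two conventions for the scattering angle are easily confused, it may help to see the conversion to q written out. The sketch below computes q (Equation 12-30) from an angle stated in either convention. It also uses the fact that, since q² = 16π² sin²(θ/2)/λ², the slope of ln $i_{\theta}$ against q² in a Guinier plot is −Rg²/3. The function names are hypothetical, and the units of q (nm⁻¹) follow from expressing the wavelength in nanometers.

```python
import math


def q_from_light_angle(theta_deg, wavelength):
    """q when theta is the full scattering angle (light-scattering convention)."""
    return 4.0 * math.pi * math.sin(math.radians(theta_deg) / 2.0) / wavelength


def q_from_xray_angle(theta_deg, wavelength):
    """q when the full scattering angle is written as 2*theta (X-ray convention)."""
    return 4.0 * math.pi * math.sin(math.radians(theta_deg)) / wavelength


def rg_from_guinier_slope(slope_vs_q_squared):
    """Rg from the slope of ln(i) against q^2, which equals -Rg^2 / 3."""
    return math.sqrt(-3.0 * slope_vs_q_squared)


WAVELENGTH = 0.154  # nm, the X-radiation of Figure 12-2

q_one_degree = q_from_light_angle(1.0, WAVELENGTH)
print(f"scattering angle of 1 degree   -> q = {q_one_degree:.2f} nm^-1")
print(f"scattering angle of 3.5 degree -> q = "
      f"{q_from_light_angle(3.5, WAVELENGTH):.2f} nm^-1")
```

For the wavelength used in Figure 12-2, a scattering angle of 1° corresponds to q of about 0.7 nm⁻¹, which agrees with the limit quoted in the legend of that figure.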

In a plot of the distance distribution function p(r) 
against r, the longest dimension of the molecule of pro- 
tein is the intercept of the function with the abscissa. For 
example, the longest dimension of a molecule of cyclic 
AMP-dependent protein kinase is 7.2 nm (Figure 12-2C).
From scattering curves of myoglobin in its native state
(Figure 4-18), after the removal of its heme, and then in
solution at pH 2, the gradual expansion of the protein as
its structure was disrupted could be followed by monitoring
the gradual increase in the value of this intercept.

The shape of the distance distribution function provides
information about the shape of the molecule of
protein. If the molecule of protein is globular, the
distance distribution function p(r) has a fairly symmetric
shape with a single maximum (Figure 12-2C). If the molecule
of protein has an elongated structure, the distance
distribution function will be skewed. If it is elongated in
only one dimension so that it is prolate in shape, the
maximum will be shifted to short distances because
there are more short vectors in a prolate solid than there
are long vectors. There is a slight indication of such an
elongation in Figure 12-2C. If the molecule of protein 
contains two well-separated globular domains, there will 
be two maxima in the distance distribution function, the 
one at shorter distances for vectors confined within each 
domain and the one at longer distances for vectors 
between domains. 
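
The qualitative behavior of the distance distribution function can be illustrated by counting the distances between pairs of points in simple models. The sketch below fills a single sphere and a pair of well-separated spheres with points on a cubic lattice and builds a histogram of all pairwise distances. The dimensions are arbitrary, the models are not meant to represent any real protein, and the function names are hypothetical.

```python
import math
from itertools import combinations


def sphere_points(center, radius, spacing):
    """Lattice points (spacing apart) lying within a sphere."""
    n = int(radius / spacing)
    cx, cy, cz = center
    points = []
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            for k in range(-n, n + 1):
                x, y, z = i * spacing, j * spacing, k * spacing
                if x * x + y * y + z * z <= radius * radius:
                    points.append((cx + x, cy + y, cz + z))
    return points


def distance_distribution(points, bin_width):
    """Histogram of pairwise distances and the largest distance found."""
    counts = {}
    d_max = 0.0
    for p, q in combinations(points, 2):
        r = math.dist(p, q)
        d_max = max(d_max, r)
        index = int(r / bin_width)
        counts[index] = counts.get(index, 0) + 1
    histogram = [counts.get(i, 0) for i in range(max(counts) + 1)]
    return histogram, d_max


BIN = 0.5  # nm

# A single globule of radius 2 nm.
globule = sphere_points((0.0, 0.0, 0.0), 2.0, 0.5)
p_globule, d_max_globule = distance_distribution(globule, BIN)

# Two domains of radius 1.5 nm whose centers are 6 nm apart.
dumbbell = (sphere_points((0.0, 0.0, 0.0), 1.5, 0.5)
            + sphere_points((6.0, 0.0, 0.0), 1.5, 0.5))
p_dumbbell, d_max_dumbbell = distance_distribution(dumbbell, BIN)

for label, hist in (("globule", p_globule), ("two domains", p_dumbbell)):
    print(label)
    for index, count in enumerate(hist):
        print(f"  {index * BIN:4.1f}-{(index + 1) * BIN:4.1f} nm  {count}")
```

In the output for the two-domain model, the counts fall almost to zero near 3 nm and rise again toward 6 nm, the distance between the centers of the domains, while the largest distance in each model is the intercept with the abscissa.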

A distinction can be made between small-angle
scattering and solution scattering. Measurements of
small-angle scattering are confined to the region of the
scattering function for angles that are only large enough
to define accurately the distance distribution function
p(r). This range of small angles also includes scattering at
the smallest angles, which provides an estimate of the
radius of gyration (Figure 12-2B; Equations 12-19 and
12-29) and an estimate of the forward scattering $i_{0}$ and
hence the molar mass of the protein (Equation 12-19).
Solution scattering includes the data at larger scattering
angles, where more features are observed (Figure 12-2A),
such as subsidiary peaks in the scattering function. In the
region of small-angle scattering, the connection between
measurements of scattering and hydrodynamic measurements
is most apparent. In the region of solution
scattering, information about the internal structure of
the protein is revealed.

Figure 12-2: Scattering of X-radiation of wavelength 0.154 nm by
a solution of cyclic AMP-dependent protein kinase at a
concentration of 66 μM.²⁶ (A) Solution scattering curve. The natural
logarithm of the incremental scattering intensity (ln $i_{\theta}$) of the
entire set of data, which includes intensities to 1% of the maximum
scattering, is plotted as a function of q (Equations 8-30, 8-31,
12-19, 12-29 and 12-30). The set of data includes scattering angles θ
to 3.5°, the largest angle at which incremental scattering intensity
could be measured. (B) Small-angle X-ray scattering curve. In a
Guinier plot, the natural logarithm of the incremental scattering
intensity is plotted as a function of q² at scattering angles less
than 1° (q < 0.7) for all of the data from the same set as in panel
A. The limiting slope (the line drawn in the figure) provides the
radius of gyration (2.31 nm) of the unliganded cyclic AMP-dependent
protein kinase. (C) Distance distribution function. The Fourier
transform (Equation 12-31) of the scattering profile in panel A is
the distance distribution function p(r), the frequency with which
two volume elements within the actual structure of cyclic
AMP-dependent protein kinase in solution are a distance r from each
other. It provides an estimate of the longest dimension (7.2 nm) of
the molecule of protein. The crystallographic molecular model of
cyclic AMP-dependent protein kinase has a peptide bound in the
active site. A molecular model of the empty protein was constructed
in which the two structural domains enclosing the peptide were
opened by 39° relative to their orientation in the crystallographic
molecular model. Each carbon, oxygen, and nitrogen in this
hypothetical model was converted into an equivalent sphere of
scattering density. The interference expected from this arrangement
of spheres as a function of scattering angle is the curve drawn
through the complete set of data in panel A, and the distance
distribution function p(r) calculated from that theoretical
scattering curve in panel A is the curve drawn through the data in
panel C. Reprinted with permission from ref 26. Copyright 1993
American Chemical Society.

The complete solution scattering curves for different
proteins are distinct (Figure 12-3),³⁰ and these distinctions
indicate that each solution scattering curve
contains information about the structure of the molecule
of protein beyond just its radius of gyration and distance
distribution function. There are several methods for
extracting this information. Small uniform spheres can
be arranged to produce a hypothetical structure thought
to represent the structure of the molecule of protein, and
a theoretical solution scattering curve for this arrangement
can be calculated and compared to the observed
solution scattering curve. The spheres can be the individual
atoms in a crystallographic molecular model,
and it is possible to calculate the theoretical solution
scattering curve that a particular crystallographic
molecular model should produce. In order to duplicate
the observed solution scattering curve with the theoretical
scattering curve, however, it is necessary to include a
layer of hydration around the molecule of protein that is
about 0.3 nm thick. The water in this layer has a density
1.05-1.20 times that of the bulk water. It is also possible
to represent the domains of a molecule of protein as
ellipsoids of the appropriate dimensions in a particular
arrangement and calculate the scattering curve of this
representation (Figure 12-2A, solid line).²⁶

Figure 12-3: Solution scattering curves for, in descending order,
myoglobin (my), troponin C in the presence of Ca²⁺ (tc-Ca²⁺),
troponin C in the absence of Ca²⁺ (tc), a tandem pair of fibronectin
type 3 domains from β4 integrin (fn), spermadhesin PSP-I/PSP-II
(sad), chymotrypsinogen A (ch), the domain C-lytA from pneumococcal
autolysin (ly), superoxide dismutase (sd), ovalbumin (ov),
tubulin (tb), nitrite reductase (NO-forming) (nr), and catalase
(cat).³⁰ The observed curves of scattering density were normalized
by dividing each along its length by the intensity of the forward
scattering $i_{0}$, and the logarithms of these normalized profiles are
presented, displaced by one logarithmic unit from each other. For
example, log $i_{0}$ for the scattering curve of myoglobin was
arbitrarily designated as 15; that for ovalbumin, 7; and that for
catalase, 4. The data are the irregular curves. The smooth curves
drawn through the data are theoretical scattering curves of
interference as a function of q calculated from respective
arrangements of sets of spheres of 0.3 nm radius, each arrangement
with the same total volume as the respective molecule of protein.
The spheres were systematically rearranged until their arrangement
reproduced the experimental curve as closely as possible. The final
arrangement of spheres in each case ended up resembling in its
shape the crystallographic molecular model of the respective
protein. Reprinted with permission from ref 30. Copyright 2000
Elsevier B.V.

A rough estimate of the shape of a molecule of protein
of unknown structure can be derived from the complete
scattering curve alone with no preconceptions. An
envelope can be generated by spherical harmonics. By
systematically adjusting the parameters of the spherical
harmonics, the scattering curve calculated from the
envelope can be made to fit the experimental scattering
curve of the protein. The resulting envelope will
have two to four ellipsoidal protrusions of size, shape,
and orientation determined by the fit to the data. It is also
possible to rearrange systematically a set of uniform
spherical beads until the structure they form produces a
scattering curve matching the one observed (solid lines
in Figure 12-3).
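
How a scattering curve follows from an arrangement of beads can be illustrated with a short calculation. The sketch below, in Python, uses the Debye formula, i(q)/i(0) = (1/N²) Σᵢ Σⱼ sin(q rᵢⱼ)/(q rᵢⱼ), in which rᵢⱼ is the distance between beads i and j. It is an assumption of this sketch, and not a description of the calculations in the cited work, that each bead is treated as a point scatterer and that q is the magnitude of the scattering vector in nm⁻¹. The function name and the two arrangements of eight beads are hypothetical.

```python
import math

def debye_curve(beads, q_values):
    """Normalized, rotationally averaged intensity i(q)/i(0) for identical beads.

    beads: list of (x, y, z) centers in nm; q_values: values of q in nm^-1.
    Each bead is treated as a point scatterer (an assumption of this sketch).
    """
    n = len(beads)
    distances = [math.dist(a, b) for a in beads for b in beads]
    curve = []
    for q in q_values:
        total = 0.0
        for r in distances:
            x = q * r
            # sin(x)/x tends to 1 as x tends to 0 (self terms and q = 0)
            total += 1.0 if x == 0.0 else math.sin(x) / x
        curve.append(total / (n * n))
    return curve

# Two hypothetical arrangements of eight beads with centers 0.6 nm apart.
compact = [(0.6 * i, 0.6 * j, 0.6 * k)
           for i in range(2) for j in range(2) for k in range(2)]
extended = [(0.6 * i, 0.0, 0.0) for i in range(8)]

q_values = [0.0, 0.5, 1.0, 2.0]
compact_curve = debye_curve(compact, q_values)
extended_curve = debye_curve(extended, q_values)
for q, c, e in zip(q_values, compact_curve, extended_curve):
    print(f"q = {q:.1f} nm^-1   compact {c:.3f}   extended {e:.3f}")
```

The extended arrangement, which has the larger radius of gyration, loses intensity more rapidly as q increases. A fitting procedure would move beads and keep those moves that bring the calculated curve closer to the observed one.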

The ideal wavelength for studying the shape of most
proteins by scattering would be somewhere between 1
and 10 nm. Unfortunately, light of these wavelengths
cannot be readily generated. Neutrons produced in
nuclear reactors, however, have velocities high enough
that a beam with a wavelength of around 1 nm can be
produced. This wavelength permits measurements of
neutron scattering of solutions of proteins to be made to
an angle θ of 20-40° before interference becomes too
large. Measurements of neutron scattering from a solution
of protein as a function of scattering angle are
traditionally treated just as are those from X-ray scattering
(Figures 12-2A and 12-3) even though observations can
be made to much greater angles. In the range of small
angles, Guinier plots provide radii of gyration and distance
distribution functions, and complete solution scattering
curves are fit with molecular models of the
protein.
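
The connection between the velocity of a neutron and its wavelength is the de Broglie relation, λ = h/(mv). The following sketch in Python evaluates it with the accepted values of the Planck constant and the mass of the neutron. The function names are hypothetical and the relation is not written out in the text.

```python
PLANCK_J_S = 6.62607015e-34          # Planck constant, J s
NEUTRON_MASS_KG = 1.67492749804e-27  # rest mass of the neutron, kg

def neutron_wavelength_nm(speed_m_per_s):
    """de Broglie wavelength (nm) of a neutron moving at the given speed."""
    return PLANCK_J_S / (NEUTRON_MASS_KG * speed_m_per_s) * 1e9

def neutron_speed_m_per_s(wavelength_nm):
    """Speed (m/s) of a neutron that has the given de Broglie wavelength."""
    return PLANCK_J_S / (NEUTRON_MASS_KG * wavelength_nm * 1e-9)

for wavelength in (0.1, 1.0, 10.0):
    speed = neutron_speed_m_per_s(wavelength)
    print(f"{wavelength:5.1f} nm corresponds to {speed:8.1f} m/s")
```

A wavelength of 1 nm corresponds to a speed of about 400 m s⁻¹.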

Neutrons are scattered by atomic nuclei and each 
isotope of each element scatters neutrons with a charac- 
teristic efficiency, quantified in its scattering length. The 
most dramatic difference in scattering length is that 
between the nucleus of hydrogen, a proton, and the 
nucleus of deuterium, a deuteron. Their neutron scatter- 
ing lengths are -3.74 fm and 6.67 fm, respectively. The 
negative sign for that of a proton indicates that it scatters
neutrons 180° out of phase to those scattered by a 
deuteron, so that the contrast produced by interference 
between a neutron scattered from a proton and that scat- 
tered from a deuteron is dramatic. 

This large difference in scattering length between 
proton and deuteron has been used to map the distances 
between the proteins in the 30S ribosomal subunit from 
E. coli (Figure 11-5) by neutron scattering. The 21
deuterated proteins found in a 30S subunit were produced
by growing bacteria on [²H]H₂O. Pairs of deuterated
proteins were reassembled into the same
30S subunit with the RNA and the other proteins all
undeuterated. The incremental scattering of neutrons
due to just the interference between each pair of deuterated
proteins was measured by subtracting the scattering
produced by a mixture of the two respective types of
30S subunits that each contained only one of the two
deuterated proteins. The Fourier transform (Figure
12-2C) of this incremental neutron scattering function 
provides the frequency with which vectors of length r 
connect a volume element in one deuterated protein 
with a volume element in the other. The maximum in 
such a curve is an estimate of the distance between the 
centers of mass of the two proteins in the 30S subunit. 
Enough of these distances (92 out of a possible 210) were 
measured to establish the relative positions of all 21 of 
the proteins in the 30S subunit.*! The majority of these 
relative positions agreed with their relative positions in 
the crystallographic molecular model of the 
30S subunit.” 
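
Two numerical points in this account can be checked with a short sketch in Python. The first is the number of distinct pairs among 21 proteins. The second is the claim that the maximum in the distribution of lengths of vectors joining two proteins estimates the distance between their centers of mass. In the sketch, each protein is represented, hypothetically, as a uniform sphere of 1.5 nm radius, the two centers are placed 6 nm apart, and the vectors are sampled directly rather than obtained by Fourier transformation of a scattering function.

```python
import math
import random

N_PROTEINS = 21
possible_pairs = math.comb(N_PROTEINS, 2)   # distinct pairs of proteins
measured_pairs = 92                          # number reported in the text
print(possible_pairs, round(measured_pairs / possible_pairs, 2))

def random_point_in_sphere(center, radius, rng):
    """Uniform random point inside a sphere, by rejection sampling from a cube."""
    while True:
        offset = [rng.uniform(-radius, radius) for _ in range(3)]
        if sum(c * c for c in offset) <= radius * radius:
            return tuple(c0 + c for c0, c in zip(center, offset))

def cross_distance_histogram(center_a, center_b, radius, n_samples, bin_width, rng):
    """Histogram of lengths of vectors joining a point in sphere A to one in sphere B."""
    counts = {}
    for _ in range(n_samples):
        point_a = random_point_in_sphere(center_a, radius, rng)
        point_b = random_point_in_sphere(center_b, radius, rng)
        index = int(math.dist(point_a, point_b) / bin_width)
        counts[index] = counts.get(index, 0) + 1
    return counts

rng = random.Random(0)
separation_nm = 6.0      # hypothetical distance between the two centers
bin_width_nm = 0.5
histogram = cross_distance_histogram((0.0, 0.0, 0.0), (separation_nm, 0.0, 0.0),
                                     1.5, 20000, bin_width_nm, rng)
peak_bin = max(histogram, key=histogram.get)
peak_position_nm = (peak_bin + 0.5) * bin_width_nm
print(f"maximum of the distribution near {peak_position_nm:.2f} nm")
```

The maximum falls close to the separation of the centers, which is why the position of the maximum can be used as an estimate of that separation.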

Because the molecules of protein in solution are 
randomly oriented, solution scattering curves for X-radi- 
ation or neutrons are rotationally averaged and by them- 
selves can provide only spherically symmetric 
structural information. Because molecules of protein 
are not spherically symmetric, some information about 
the structure of the protein must be available to con- 
strain the model. For example, scattering of X-radiation 
or neutrons can be used together with crystallographic 
information. Crystallographic molecular models of the 
domains of a protein may be available but not a crystal- 
lographic molecular model of the complete protein, and 
scattering curves can provide models for the dispositions 
of the domains in the intact protein.’ Intact 
immunoglobulin M is a pentameric ring of five subunits 
each composed of two Fab portions and one Fc portion 
formed from folded polypeptides related to those form- 
ing immunoglobulin G (Figure 11-1). Ten copies of a 
crystallographic molecular model of an Fab fragment of 
immunoglobulin G and five copies of a crystallographic 
molecular model of an Fc fragment of immuno- 
globulin G could be arranged to produce a structure with 
a calculated solution scattering curve in agreement with 
that observed for immunoglobulin M, and the appropri- 
ate combinations of Fab and Fc fragments could be 
arranged to produce structures with calculated solution 
scattering curves in agreement with those observed for 
the respective fragments.” 

Measurements of X-ray or neutron scattering from 
a solution of a protein for which a complete crystallo- 
graphic molecular model is available have also proven to 
be valuable complements to the crystallographic obser- 
vation. The usual result is that the theoretical solution 
scattering curve calculated from the crystallographic 
molecular model agrees closely with the one that is 
observed.” Such coincidences are further evidence 
that the crystallographic molecular model represents the 
structure of the protein when it is in solution. 

There are, however, a number of instances in which 
measurements of scattering have been used to adjust 
crystallographic molecular models. Two independently
shifting domains in a crystallographic molecular model
of a protein are often confined to a particular orientation 
either by the packing forces of the crystal or by the bind- 
ing of a ligand. A measurement of the radius of gyration 
from small-angle X-ray scattering of the protein in solu- 
tion unconfined by the packing forces or in the absence 
of that ligand can be matched with a value calculated 
from a molecular model of the protein in which the 
domains have a different orientation from those 
observed crystallographically.*° From a crystallographic 
molecular model of the protein” or a systematically 
altered conformation of that molecular model, the fre- 
quency with which vectors of length r actually do con- 
nect volume elements within the model can be 
calculated and compared with the observed distance dis- 
tribution function. In Figure 12-2C, the line through the 
points was calculated from an altered conformation of 
the crystallographic molecular model of cyclic AMP- 
dependent protein kinase. This altered conformation 
was proposed to represent the structure that the mole- 
cule assumes in solution in the absence of the peptide 
that was bound to the protein when it was crystallized. 

The disagreement between an observed solution 
scattering curve and that calculated from a crystallo- 
graphic molecular model is also an indication that the 
protein assumes a conformation different from that of 
the crystallographic molecular model when it is dis- 
solved in a solution of a particular composition,” for 
example, in which ligands for the protein may be dis- 
solved.” Theoretical solution scattering curves calcu- 
lated from likely alternative conformations often 
indicate how the structure of the protein has changed. In 
the case of aspartate carbamoyltransferase, it had been 
possible to crystallize the protein in the conformation it 
assumes when its regulatory β subunits bind MgATP.
Even so, the crystallographic molecular model of this
conformation had to be adjusted before it would produce
a theoretical X-ray solution scattering curve that
agreed with the one observed for it in solution. Only
when the distance between the two catalytic α trimers in
the crystallographic molecular model (Figure 9-37) was 
increased 0.3 nm by reasonable rotations of the regula- 
tory dimers did the theoretical curve agree with the 
observed curve. 

The examples of measurements of small-angle 
X-ray scattering for cyclic AMP-dependent protein
kinase (Figure 12-2C) and of X-ray solution scattering for 
aspartate carbamoyltransferase illustrate the use of scat- 
tering in comparisons of the structure of a protein in 
solution to its structure in a crystallographic molecular 
model. In both of these instances, the measurements of 
scattering were used to adjust appropriately and realisti- 
cally the structure of the crystallographic molecular 
model, and there are other instances in which such 
adjustments have also been required.” The fact that 
reasonable adjustments in the orientations of domains 
or the disposition of subunits are all that is necessary to
bring errant crystallographic molecular models into
coincidence is further evidence that the crystallographic 
molecular model represents the structure of the protein. 
The solution scattering curves of helical polymeric pro- 
teins can also provide information about the parameters 
of the helix into which the monomers are assembled.*® 

The difficulty with measurements of hydrodynamic 
properties or of small-angle scattering for assessing the 
shape of a molecule of protein is that they often provide 
only one unambiguous numerical result, either a fric- 
tional coefficient, a Simha factor, or a radius of gyration. 
If the frictional ratio $f/f_{0,\mathrm{h}}$ is less than 1.15, the Simha
factor $\nu$ is less than 4.0, or the radius of gyration $R_{\mathrm{G}}$ is near
a value of $(3/5)^{1/2} R_{0,\mathrm{h}}$, it can be concluded that the protein is
globular. If the value of one or more of these parameters 
for a given protein is significantly greater than the values 
expected for a sphere, it is usually necessary to conclude 
that the protein has a highly irregular surface, has an 
extended structure, has a high degree of hydration, or has 
some combination of these features. It is clear from the 
foregoing discussion of some of the results that larger 
values of these parameters are consistent with a large set 
of particular arrangements of the available mass and 
values for hydration. The only reason so much is heard 
about prolate and oblate ellipsoids of revolution is not 
that molecules of proteins are such geometric solids but 
that frictional coefficients and radii of gyration can be 
calculated explicitly for such solids. In using any of these 
measured parameters in an informative way, other 
details about the structure of the protein are essential. 
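
The explicit calculation available for an ellipsoid of revolution can be illustrated for the frictional ratio. The sketch below, in Python, uses Perrin's equation for a prolate ellipsoid, f/f₀ = (p² − 1)^{1/2}/{p^{1/3} ln[p + (p² − 1)^{1/2}]}, in which p = a/b is the axial ratio. The equation is not written out in this section, and the function names are hypothetical. The inversion is done by bisection.

```python
import math

def perrin_prolate(axial_ratio):
    """Frictional ratio f/f0 of a prolate ellipsoid of revolution, a/b >= 1."""
    p = axial_ratio
    if p < 1.0:
        raise ValueError("axial ratio a/b of a prolate ellipsoid must be >= 1")
    root = math.sqrt(p * p - 1.0)
    if root == 0.0:
        return 1.0                      # a sphere
    return root / (p ** (1.0 / 3.0) * math.log(p + root))

def axial_ratio_from_frictional_ratio(ratio, upper=1000.0, tolerance=1e-9):
    """Invert perrin_prolate by bisection; f/f0 rises monotonically with a/b."""
    low, high = 1.0, upper
    while high - low > tolerance:
        middle = 0.5 * (low + high)
        if perrin_prolate(middle) < ratio:
            low = middle
        else:
            high = middle
    return 0.5 * (low + high)

for p in (1.0, 2.0, 5.0, 10.0, 20.0):
    print(f"a/b = {p:5.1f}   f/f0 = {perrin_prolate(p):.3f}")
```

The same frictional ratio would also be produced by other shapes and other degrees of hydration, so an axial ratio obtained in this way is a formal parameter and not a measurement of shape.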

One way to observe the shape of a molecule of pro- 
tein directly is by electron microscopy. The three sym- 
metrically protruding regulatory subunits on aspartate 
carbamoyltransferase and the hollow, water-filled cavity 
between its two rotationally symmetric, trimeric α subunits
(Figure 9-37), which together probably account for
its abnormally large frictional coefficient, were first 
observed in electron micrographs of the protein (Figure 
12-4A)**°° before there was a crystallographic molecular 
model. Another protein with an abnormally large fric- 
tional coefficient is fibrinogen. Electron micrographs of 
fibrinogen (Figure 12-4B),' which turned out to be 
remarkably accurate representations of its structure, 
were published 20 years before a crystallographic molec- 
ular model of the protein (Figure 13-22) became avail- 
able.” 

To prepare it for direct observation in the electron 
microscope, a protein molecule can either be negatively 
stained by being embedded in a glass of the salt of a 
heavy metal ion such as uranyl cation or phospho- 
tungstate anion (Figure 12-4C, upper four images)” or 
be positively stained by being rotary shadowed” with a 
layer of platinum that produces a metallic replica of the 
molecule (Figure 12-4C, lower four images). In the 
former method, the molecule of protein, because it is less 
electron dense than the glass, appears as a light image 
against a dark background; in the latter method, the
molecule of protein coated with the metal appears dark
against a light background. Whenever results from such 
electron microscopic studies are presented, a represen- 
tative field of molecules (Figure 12-4D)” should be 
shown so that the reader can judge what fraction of the 
molecules of protein on the film of carbon or in the 
metallic replica give images that resemble the images 
chosen for a gallery of “representative” views (Figures 
12-4A-C). 

Collagen XII from Gallus gallus is a trimer of three 
polypeptides. All three polypeptides in the trimer are 
encoded by the same gene, but there are two forms of the 
polypeptide, one 3100 aa long and the other about 
2700 aa long, produced by translation of alternatively 
spliced versions of the same messenger RNA;” the 
shorter translation is missing the amino-terminal 400 aa 
of the sequence. The carboxy-terminal 380 aa of each 
polypeptide contains two segments (152 and 103 aa) of 
collagen repeat, and in this region the three polypeptides 
of the trimer should form an interrupted triple-helical 
rope of collagen (Figure 9-33) with two segments 45 and 
30 nm in length. This triple-helical rope is the structure 
holding the three polypeptides together in the oligomer. 
The amino-terminal 1870 aa of the short splice variant 
contains 10 fibronectin type III modular domains, strung 
together in a necklace that should be 32 nm in length,” 
and one amino-terminal von Willebrand factor type A
modular domain, a globular structure about 4 nm in
diameter. The longer splice variant has an additional
eight fibronectin type III modular domains (26 nm) and
two additional von Willebrand factor type A domains.
The electron micrographs of collagen type XII display a 
structure that is the fulfillment of these expectations 
(Figure 12-4C).**" The single, thin collagen tail of 75 nm 
is kinked about 30 nm from its end.” There is a central 
globular region from which three significantly thicker 
arms extend, each either a short or a long variant; the 
short variant is about 40 nm long with a globular ball at 
its end,” and the long variant is about 90 nm long with a 
globular ball in its middle and two globular balls at its 
end.” The three polypeptides enter the center wrapped 
around each other in the collagen tail and leave the 
center separately as three fibronectin necklaces. 
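
The expected lengths quoted for the segments of collagen XII can be reproduced with simple arithmetic. The sketch below, in Python, assumes an axial rise of about 0.29 nm for each amino acid in a collagen triple helix, a value that is not stated in the text, and takes the length of one fibronectin type III domain from the figure of 32 nm for a necklace of 10. The names of the constants and functions are hypothetical.

```python
RISE_PER_RESIDUE_NM = 0.29   # assumed axial rise per residue in a collagen triple helix
FN3_DOMAIN_NM = 32.0 / 10    # from the text: 10 fibronectin type III domains span 32 nm

def triple_helix_length_nm(n_residues):
    """Length of a triple-helical segment built from n_residues of each polypeptide."""
    return n_residues * RISE_PER_RESIDUE_NM

def necklace_length_nm(n_domains):
    """Length of a string of fibronectin type III domains laid end to end."""
    return n_domains * FN3_DOMAIN_NM

helix_segments_nm = [triple_helix_length_nm(n) for n in (152, 103)]
tail_nm = sum(helix_segments_nm)
extra_necklace_nm = necklace_length_nm(8)

print("triple-helical segments (nm):", [round(x, 1) for x in helix_segments_nm])
print("whole tail (nm):", round(tail_nm, 1))
print("eight additional domains (nm):", round(extra_necklace_nm, 1))
```

Under these assumptions the two triple-helical segments come to about 44 and 30 nm, and their sum is close to the tail of 75 nm seen in the micrographs.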

Electron micrographs of activated bovine coagulation
Factor Va were instrumental in explaining its
unusual behavior upon sedimentation analysis. The protein
has a molar mass of 170,000 g mol⁻¹ and a standard
sedimentation coefficient, $s^{0}_{20,\mathrm{w}}$, of 8.2 S, from which a
frictional ratio $f/f_{0,\mathrm{h}}$ of 1.6 (based on the assumption that
$\delta_{\mathrm{H_2O}}$ = 0.3) can be calculated. When activated coagulation
Factor Va was observed by electron microscopy, it was
found to be two globular domains of protein, very similar
in diameter, attached together through a narrow
neck. This shape seen in the electron micrographs is
responsible for the unusually large frictional ratio of this
protein. Although an axial ratio for a prolate ellipsoid was
calculated from the earlier results of the sedimentation
analysis, it was moot when the electron micrographs
became available.

Rotary shadowing is usually used for estimating the 
dimensions of extended molecules such as collagen XII 
(Figure 12-4C), nidogen,” inversion-specific glycopro- 
tein,” fibulin,” and myosin.” The molecule of protein is 
spread upon a flat sheet of mica before being coated with 
metal. Consequently, a long, thin, flexible molecule 
should lie almost flat upon the surface, and the contour 
length of its replica in the two-dimensional micrograph 
should be almost as long as its actual contour length in 
three dimensions. For this reason, rotary shadowing is 
thought to be the most reliable method to obtain esti- 
mates of the length of an extended molecule of protein. 
For example, the frictional ratio $f/f_{0,\mathrm{h}}$ for a fragment of
caldesmon (amino acids 166-450 from a polypeptide of
756 aa) is 2.2, consistent with a cylinder of the appropriate
volume that is 40 nm long. Electron micrographs
of this fragment of caldesmon that had been rotary-shad- 
owed with platinum and tungsten displayed elongated 
molecules of uniform thickness with contour lengths 
that averaged 35 nm. In rotary-shadowed images, globu- 
lar domains such as the two heads of myosin or the three 
globular domains of nidogen appear as dark, unfeatured 
lumps. Although globular proteins constructed from 
clusters of globular domains or from globular subunits 
can appear as clusters of dark lumps upon rotary shad- 
owing,” they usually appear as single undifferentiated 
and structureless lumps of platinum. Negative staining 
(Figure 12-4A,D) or embedding in amorphous vitreous 
ice is required to obtain images displaying details of the 
structure of a globular protein. 

Phosphorylase kinase is a globular protein with a 
dramatically peculiar shape. Because of its unusual 
shape, a collection of digitized micrographic images of 
individual molecules (Figure 12-4D) could be super- 
posed by a computer and stacked one upon the other.
The average of this stack of images could be calculated,
and this average represents an enhanced image of the 
actual molecule (inset, Figure 12-4D). The chalice seen 
in the enhanced image can be imperfectly discerned in 
each of the individual selected images (Figure 12-4D), 
and this correspondence fulfills the usual requirement 
placed upon any reconstruction. A similar procedure has 
been applied to α₂-macroglobulin to obtain enhanced
images. This protein has a shape almost as peculiar as 
that of phosphorylase kinase. 
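
The reason that averaging a stack of superposed images enhances the image can be shown with a short numerical sketch in Python. It is assumed here that the images are already in register and that each differs from a common, hypothetical template only by independent random noise; the superposition itself, which is the difficult step in practice, is not attempted. The number of images, 62, is that quoted in the legend of Figure 12-4.

```python
import random

def noisy_copy(template, noise_sd, rng):
    """Return the template with independent Gaussian noise added to every pixel."""
    return [value + rng.gauss(0.0, noise_sd) for value in template]

def average_images(images):
    """Pixel-by-pixel mean of a stack of images that are already in register."""
    return [sum(pixels) / len(images) for pixels in zip(*images)]

def rms_error(image, template):
    """Root-mean-square difference between an image and the noise-free template."""
    squared = [(a - b) ** 2 for a, b in zip(image, template)]
    return (sum(squared) / len(squared)) ** 0.5

rng = random.Random(1)
# Hypothetical 8 x 8 "molecule", flattened to 64 pixels: a bright block on a dark field.
template = [1.0 if 2 <= row <= 5 and 3 <= col <= 4 else 0.0
            for row in range(8) for col in range(8)]
stack = [noisy_copy(template, 1.0, rng) for _ in range(62)]

single_error = rms_error(stack[0], template)
averaged_error = rms_error(average_images(stack), template)
print(f"one image: {single_error:.2f}   average of 62: {averaged_error:.2f}")
```

For independent noise the error in the average decreases roughly as the square root of the number of images, so features common to all of the images emerge while the noise in each is suppressed.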

It is also possible to obtain a three-dimensional 
reconstruction of the structure of a macromolecule 
observed in an electron micrograph. An electron micro- 
graph of a macromolecule is the two-dimensional pro- 
jection either of the structure of that macromolecule if it 
is embedded in amorphous vitreous ice or of the mold of 
that macromolecule if it is embedded in negative stain. It 
has already been noted that the two-dimensional Fourier 
transform of the projection of a three-dimensional object 
is a central section of the three-dimensional Fourier 
transform of the unprojected object (Equations 9-4 
through 9-6). From the complete three-dimensional 
Fourier transform of the object, the distribution of scat- 
tering density within the object, and hence details of its 
three-dimensional structure, can be calculated by 
Fourier transformation. To gather the complete three- 
dimensional Fourier transform of the object, Fourier 
transforms of projections of the object in a large number 
of different orientations must be assembled. This is 
accomplished in a helical polymeric protein by the fact 
that the helical array positions the monomer in specific, 
defined orientations, each of which provides a different 
projection. When molecules are not arranged in such an 
array but scattered upon the field, as are the molecules of 
phosphorylase kinase in Figure 12-4D, it is necessary to 
define the precise orientation of each of them relative to 
the plane of the micrograph before their individual
Fourier transforms can be summed together to obtain
the complete three-dimensional Fourier transform of the
average molecule.

Figure 12-4: Asymmetric molecules of protein viewed by electron microscopy. Solutions of the protein of interest (10 μg mL⁻¹ to 1.0 mg
mL⁻¹) were applied to electron microscopic grids coated with a thin film of carbon supported by a net of either collodion or formvar. The
layer of carbon (~5 nm) was ionized so that it was hydrophilic enough to accept the aqueous solution. The molecules of protein were
adsorbed to this surface and were then negatively stained with either 2% phosphotungstate (panel A) or 1-2% uranyl formate (panels B and
D). The water evaporates to leave a glass of the heavy metal salt in which are embedded the molecules of protein. (A) Gallery of selected
images of aspartate carbamoyltransferase from E. coli (Figure 9-37) viewed either along one of its 2-fold rotational axes of symmetry (left
three images) or along its 3-fold rotational axis of symmetry (right three images) at 480000×. Reprinted with permission from ref 50. Copyright
1972 American Chemical Society. (B) Gallery of selected images of bovine fibrinogen at 480000×. The elongated molecule has globular
domains at each end. Reprinted with permission from ref 51. Copyright 1981 Academic Press. (C) Selected images of collagen type XII from
Gallus gallus at 240000×. The upper four images were negatively stained with uranyl formate. Reprinted with permission from ref 54.
Copyright 1992 Blackwell Publishing. The lower four images are molecules of protein that were rotary shadowed. A solution of the protein
was sprayed into an aerosol mist and droplets of the mist were adsorbed to a sheet of mica. A beam of platinum atoms was directed at an
angle of 6° onto the surface of the mica as it was rotated at 120 revolutions min⁻¹. The resulting thin film of platinum containing replicas of
the molecules of protein was transferred to a copper grid. In the four rotary-shadowed images, selected representatives of a homotrimer of
long splice variants (l₃), of a homotrimer of short splice variants (s₃), and of the two heterotrimers (l₂s, ls₂) are presented at 240000×. Reprinted
with permission from ref 53. Copyright 1995 The Rockefeller University Press. (D) A field of negatively stained molecules of phosphorylase
kinase at 480000×. This is an accurate representation of the usual situation. Most of the molecules of protein negatively stained on the grid
are featureless asymmetric structures. The minority that present a repeating, definable image (indicated by arrowheads) are selected by the
microscopist as representative images of the protein and presented in galleries as in panels A and B. In this instance, the shape of the
individual images of phosphorylase kinase was so peculiar that digitized optical densities of a large number of the selected images of individual
molecules (62) could be sequentially superposed by a computer to obtain an enhanced image (inset). Reprinted with permission from ref 56.
Copyright 1985 Academic Press.
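
The relationship between the Fourier transform of a projection and a central section of the Fourier transform of the unprojected object can be verified numerically. The sketch below, in Python, does so in one dimension lower than the case described in the text: a small, hypothetical two-dimensional array of densities is projected onto one axis, and the one-dimensional discrete Fourier transform of that projection is compared with the central line of the two-dimensional discrete Fourier transform of the array. The transforms are written out explicitly, without any library for Fourier transformation, so that each step can be followed.

```python
import cmath

def dft_1d(values):
    """Discrete Fourier transform of a one-dimensional sequence."""
    n = len(values)
    return [sum(values[x] * cmath.exp(-2j * cmath.pi * k * x / n) for x in range(n))
            for k in range(n)]

def dft_2d(image):
    """Discrete Fourier transform of a two-dimensional array, indexed [kx][ky]."""
    rows, cols = len(image), len(image[0])
    return [[sum(image[x][y] * cmath.exp(-2j * cmath.pi * (kx * x / rows + ky * y / cols))
                 for x in range(rows) for y in range(cols))
             for ky in range(cols)]
            for kx in range(rows)]

# Hypothetical density, 5 x 5, with no symmetry.
image = [[0, 0, 1, 0, 0],
         [0, 2, 3, 1, 0],
         [1, 3, 5, 2, 0],
         [0, 1, 2, 4, 1],
         [0, 0, 1, 1, 0]]

projection = [sum(row) for row in image]               # project along y
transform_of_projection = dft_1d(projection)
central_section = [row[0] for row in dft_2d(image)]    # the line ky = 0

largest_difference = max(abs(a - b)
                         for a, b in zip(transform_of_projection, central_section))
print(f"largest difference: {largest_difference:.2e}")
```

Each projection in a different direction supplies a different central section, which is why projections in many known orientations are required to fill the complete transform.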

When the objects scattered over the field of the 
electron micrograph are viruses, the icosahedral symme- 
try of each of the viral coat proteins permits the exact ori- 
entation of each individual virion to be estimated.°° ” 
The assignment of an orientation to each virion permits 
the Fourier transforms of their projections to be added 
together to obtain a complete three-dimensional Fourier 
transform of the average viral particle. In this way, a 
three-dimensional reconstruction of the structure of the 
viral coat and other appendages of the virus that are 
distributed with icosahedral symmetry” can be calcu- 
lated. For such reconstructions, the viral particles are 
usually suspended in a layer of amorphous vitreous ice. 

If it is possible to define somehow the orientation of
each member of a population of asymmetric molecules 
spread upon a grid at random, the same type of summa- 
tion can be performed. If the molecule has a tendency to 
lie upon the carbon surface in a preferred orientation, for 
example, the molecules of phosphorylase kinase that 
settle on the grid to present a projection in the shape of a
chalice (Figure 12-4D), this tendency orients them in one 
dimension but fortunately only in one dimension. The 
direction in which each individual oriented molecule 
faces upon the surface is random, and the direction in 
which each faces can be defined by direct observation. 
When the grid is then tilted 50°, a large collection of mol- 
ecules, each in a completely different three-dimensional 
orientation but each in a known three-dimensional ori- 
entation relative to the others, is created.“ From a sum- 
mation of their individual Fourier transforms in the 
tilted image, a three-dimensional Fourier transform of 
the structure of the average molecule can be gathered. 
From this Fourier transform, a molecular model can be 
calculated. Such reconstructions have been performed 
for human α₂-macroglobulin and the 50S subunit of the
ribosome.’®” The resulting molecular model of the 
50S subunit of the ribosome, albeit at low resolution, 
resembled quite closely the crystallographic molecular 
model that became available 12 years later.

Problem 12-1: Calculate the values of fea, bm, and fo,unh
from T, One LE and Mprot for each protein in Table
12-1. From tabulated values of fay/fo, determine a/b for
each protein on the assumption that they are prolate
ellipsoids of revolution.


Problem 12-2: Human immunoglobulin G is a protein
with a molar mass of 167,000 g mol⁻¹ and a partial specific
volume of 0.739 cm³ g⁻¹.

(A) Assume hydration to be 0.3 g of H₂O (g of protein)⁻¹
and calculate the minimum frictional coefficient,
$f_{0,\mathrm{h}}$, for the hydrated hydrodynamic
particle at 20 °C in water.

(B) The standard sedimentation coefficient $s^{0}_{20,\mathrm{w}}$ for
immunoglobulin G is 7.0 × 10⁻¹³ s, and the standard
diffusion coefficient $D^{0}_{20,\mathrm{w}}$ is 4.0 × 10⁻⁷ cm² s⁻¹.
Calculate the frictional coefficient, first from the
standard sedimentation coefficient and then from
the standard diffusion coefficient.

(C) From the average of these two estimates of the
frictional coefficient and from the estimate of the
minimum frictional coefficient of the hydrated
hydrodynamic particle, estimate the axial ratio
a/b upon the assumption that the molecule is a
prolate ellipsoid of revolution.

(D) The shape of an immunoglobulin G is displayed
in Figure 7-13. How does this structure compare
with your estimate of its shape?
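
The relations needed for calculations of this kind can be set out in a short sketch in Python. It is assumed here that the minimum frictional coefficient is given by Stokes's law, f = 6πηR, for a sphere with the volume of the hydrated particle, that the water of hydration has the specific volume of bulk water, that the frictional coefficient follows from sedimentation as f = M(1 − v̄ρ)/(N_A s) and from diffusion as f = k_BT/D, and that the viscosity and density of water at 20 °C are 0.01002 g cm⁻¹ s⁻¹ and 0.99821 g cm⁻³. All quantities are in cgs units, and the function names are hypothetical. The values used as input are those given above for immunoglobulin G.

```python
import math

AVOGADRO = 6.02214076e23     # mol^-1
BOLTZMANN = 1.380649e-16     # erg K^-1
TEMPERATURE = 293.15         # K (20 degrees C)
WATER_VISCOSITY = 0.01002    # g cm^-1 s^-1, water at 20 degrees C (assumed value)
WATER_DENSITY = 0.99821      # g cm^-3, water at 20 degrees C (assumed value)

def minimum_frictional_coefficient(molar_mass, partial_specific_volume, hydration):
    """Stokes frictional coefficient (g s^-1) of a sphere with the hydrated volume."""
    volume = molar_mass * (partial_specific_volume + hydration / WATER_DENSITY) / AVOGADRO
    radius = (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)
    return 6.0 * math.pi * WATER_VISCOSITY * radius

def frictional_coefficient_from_sedimentation(molar_mass, partial_specific_volume, s):
    """Frictional coefficient (g s^-1) from a sedimentation coefficient in seconds."""
    return molar_mass * (1.0 - partial_specific_volume * WATER_DENSITY) / (AVOGADRO * s)

def frictional_coefficient_from_diffusion(d):
    """Frictional coefficient (g s^-1) from a diffusion coefficient in cm^2 s^-1."""
    return BOLTZMANN * TEMPERATURE / d

f_min = minimum_frictional_coefficient(167000.0, 0.739, 0.3)
f_sed = frictional_coefficient_from_sedimentation(167000.0, 0.739, 7.0e-13)
f_dif = frictional_coefficient_from_diffusion(4.0e-7)
frictional_ratio = 0.5 * (f_sed + f_dif) / f_min
print(f"f_min = {f_min:.3e}  f(s) = {f_sed:.3e}  f(D) = {f_dif:.3e} g/s")
print(f"frictional ratio = {frictional_ratio:.2f}")
```

The two independent estimates of the frictional coefficient should agree closely if the measurements are consistent, and their average divided by the minimum frictional coefficient is the frictional ratio from which a formal axial ratio is read.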


Problem 12-3: Thiosulfate sulfurtransferase is a monomeric
enzyme. The polypeptide from bovine liver is 296
amino acids in length and has a molar mass of 33,160 g
mol⁻¹. In water at 20 °C the standard sedimentation
coefficient of the native protein is 3.00 × 10⁻¹³ s, and its
standard diffusion coefficient is 7.50 × 10⁻⁷ cm² s⁻¹. The
partial specific volume of the protein is 0.742 cm³ g⁻¹.

(A) Calculate the frictional coefficient of the protein.

(B) Calculate the frictional ratio for the hydrodynamic
particle, $f/f_{0,\mathrm{h}}$, with the assumption that the
hydration of the protein is 0.3 g of H₂O (g of protein)⁻¹.

(C) What would be the axial ratio of an ellipsoid of
revolution with this frictional ratio?

(D) How does this estimation compare to the
crystallographic molecular model of the protein (Figure
9-18)?


Problem 12-4: The standard sedimentation coefficient
of human fibrinogen is 7.63 × 10⁻¹³ s, its molar mass is
344,000 g mol⁻¹, and its partial specific volume is
0.725 cm³ g⁻¹.

(A) Assume $\delta_{\mathrm{H_2O}}$ = 0.3 and determine the volume of
the hydrodynamic particle, the frictional coefficient
of fibrinogen, its frictional ratio, and its axial
ratio and dimensions on the basis of the assumption
that it is a prolate ellipsoid of revolution or a
cylindrical rod.

(B) The length of the fibrinogen molecule has been
determined to be 45 nm by electron microscopy
(Figure 12-4B) and 45 nm by direct measurement
of its crystallographic molecular model (Figure
13-22A). Calculate the dimensions of a prolate
ellipsoid or a cylindrical rod with the same
volume as the hydrodynamic particle and a major
axis of length 22.4 nm.

(C) The intrinsic viscosity of fibrinogen is 27 cm³ g⁻¹.
Calculate its Simha factor $\nu$ and estimate its axial
ratio and molecular dimensions on the basis of
the assumption that it is a prolate ellipsoid of
revolution or a cylindrical rod.


Problem 12-5: The heads of myosin (Figure 13-30A) can 
be detached from the intact protein by mild treatment 
with the endopeptidase papain. The detachable domain 
that was one of the heads and that is produced by the 
digestion with papain is referred to as the S1 fragment. It 
is a complex of three folded polypeptides. The S1 frag- 
ment from chicken muscle contains the amino-terminal 
845 aa of the α polypeptide, or heavy chain, of myosin
and two shorter polypeptides, or light chains, of lengths 
149 and 163 aa. 


(A) Estimate the molar mass of the S1 fragment 
from the lengths of its three constituent polypep- 
tides. 


The following table lists physical properties of the
S1 fragment.

property                    value
$s^{0}_{20,\mathrm{w}}$     5.8 × 10⁻¹³ s
$D^{0}_{20,\mathrm{w}}$     4.6 × 10⁻⁷ cm² s⁻¹
$\bar{v}$                   0.73 cm³ g⁻¹
$[\eta]$                    6.4 cm³ g⁻¹

(B) Calculate the frictional coefficient of the S1 fragment.

(C) Assume a hydration of 0.3 g g⁻¹, and calculate the
frictional ratio for the S1 fragment.

(D) Assume a hydration of 0.3 g g⁻¹, and calculate the
Simha factor for the S1 fragment.

(E) Estimate the axial ratio that the S1 fragment
would have if it were a prolate ellipsoid of revolution.

(F) The following are two orthogonal views of a
space-filling representation of the crystallographic
molecular model of the S1 fragment of
myosin. Estimate its axial ratio from the
figures.

(G) What value would $\delta_{\mathrm{H_2O}}$ have to have for both the
intrinsic viscosity and the frictional ratio to give
the axial ratio you measured from the crystallographic
molecular model?


Problem 12-6: Tropomyosin is a protein formed from
identical folded polypeptides each 284 amino acids in
length ($M_{\mathrm{prot}}$ = 32,680 g mol⁻¹). It has a partial specific
volume of 0.72 cm³ g⁻¹. The molar mass of tropomyosin
was determined by osmotic pressure. Each measurement
was extrapolated to $\gamma_{\mathrm{prot}}$ = 0. The osmotic pressure
at 0 °C as a function of ionic strength is presented in the
following table.

ionic strength    $\lim_{\gamma_{\mathrm{prot}} \to 0} \Pi/\gamma_{\mathrm{prot}}$
(M)               (mmHg cm³ g⁻¹)
0.10              126
0.20              154
0.27              193
0.30              236
0.60              254
1.10              264

(A) What are the values for the apparent molar 
masses of tropomyosin at the several ionic 
strengths? Show in detail the calculation of molar 
mass at ionic strength of 0.10 M. 


(B) How many polypeptides compose the major 
form of tropomyosin present in solution at high 
ionic strength? What is the exact molar mass of 
just this major form? Why does the number- 
average molar mass increase as the ionic strength 
is lowered?

(C) The following data were gathered from solutions
of tropomyosin at an ionic strength of 1.1 M:

$\eta/\eta_{0}$    $\gamma_{\mathrm{prot}}$ [g (100 mL)⁻¹]
1.210              0.33
1.299              0.44
1.466              0.64
1.588              0.76
1.793              0.96
2.223              1.29
2.972              1.72
4.603              2.14

where $\eta$ is the viscosity of the solution of protein,
$\eta_{0}$ is the viscosity of the solvent, and $\gamma_{\mathrm{prot}}$ is the
concentration of protein. Determine the intrinsic
viscosity $[\eta]$ at this ionic strength.

(D) Assume that $\delta_{\mathrm{H_2O}}$ = 0.3 g of H₂O (g of protein)⁻¹
and calculate the Simha factor $\nu$. From $\nu$ determine
the axial ratio for tropomyosin if it were a
prolate ellipsoid of revolution by using Figure
12-1B,D.


(E) From spectroscopic measurements it is known 
that, at all ionic strengths, tropomyosin is >90% 
α-helical. It is a coiled coil in which the two
α helices wrap around each other as the strands in
a two-stranded rope. The length of an α helix for
each of its amino acids is 0.15 nm. Calculate the
length of a molecule of tropomyosin at high ionic
strength and, assuming it to be a cylindrical rod,
calculate its diameter from its hydrated molecular
volume. What is its actual axial ratio? Compare
this to the axial ratio obtained from ν.


(F) The intrinsic viscosities of solutions of 
tropomyosin also vary with ionic strength: 


[η]       ionic strength
          (M)
1.00      1.1
1.03      0.6
1.23      0.3
1.75      0.2
2.45      0.1


Plot molar mass against specific viscosity and explain the 
correlation in terms of structures that could form as the 
ionic strength is lowered. 


Problem 12-7: The following are a set of data for the
light scattering of a series of solutions (0.2-0.8 g L⁻¹) of
myosin. The wavelength of light used for the observations
was 436 nm (in a vacuum). The solutions were
examined at a temperature of 20 °C, and the refractive
index of the solvent in which the myosin was dissolved
was 1.34. The refractive index increment (∂ñ/∂γprot) for
myosin is 0.208 cm³ g⁻¹, and the molar mass of myosin is
527,000 g mol⁻¹.


θ         lim(γprot→0) γprot/Rθ
(deg)     (g cm⁻²)
30        0.74
35        0.74
40        0.78
45        0.80
50        0.83
55        0.86
60        0.89
70        0.95
90        1.07
110       1.17
140       1.24


(A) What was the wavelength of the light in the solu- 
tions? 


(B) What is the radius of gyration of the myosin under 
these circumstances? 


(C) What would be the length of a rod with this radius 
of gyration? 


(D) What is the length of the molecules of myosin in 
Figure 13-30A? The magnification stated in the 
legend for Figure 13-30A is for the figure in the 
text. 


Problem 12-8: Triskelion is a protein that assembles 
into spherical structures known as clathrin coats. These 
clathrin coats are the structures that surround the 
coated vesicles formed from the invagination of the 
plasma membrane of an animal cell at sites known as 
coated pits. Bovine triskelion is formed from three identical
heavy polypeptides (naa = 1675 polypeptide⁻¹,
Mprot = 191,590 g mol⁻¹) and three identical light
polypeptides (naa = 228 polypeptide⁻¹, Mprot = 25,080
g mol⁻¹). Its partial specific volume, calculated from its
amino acid composition, is 0.744 cm³ g⁻¹.


(A) Calculate the unhydrated volume of triskelion. 


(B) What would be the unhydrated radius (R0,unh) and
the unhydrated radius of gyration (Rg,sph) of
triskelion if it were a sphere?


The light scattering of triskelion dissolved in a
buffered solution was monitored either as a function of
the concentration of protein (γprot) or as a function of
the scattering angle θ. Reprinted with permission
from ref 84. Copyright 1991 American Chemical
Society.
Open and closed circles show the plots of [Kγprot/Rθ]_θ=0
against the concentration of clathrin and [Kγprot/Rθ]_γ=0
against sin²(θ/2), respectively. The units on the vertical
axis are moles gram⁻¹. The authors of this study have
used an optical constant K that incorporates the increment
of the refractive index so that its value is
2π²ñ²(∂ñ/∂γprot)²NA⁻¹λ⁻⁴. The refractive indices (ñ) of the
two solutions were both 1.34. The laser used in the experiment
emitted polarized light of wavelength 632.8 nm in
the vacuum.


(C) How well does the molar mass of the protein 
observed in the light scattering experiment agree 
with that calculated from the sequences of the 
constituent polypeptides of triskelion? 


(D) From the slope of the appropriate line in the figure, 
calculate the radius of gyration for triskelion. 

(One way to approach this problem is to multiply both 
sides of Equation 12-23 by the optical constant K.) 


(E) Is triskelion a globular protein?


Problem 12-9: The definition of Rayleigh's ratio for the
scattering of unpolarized light is

\[ R_\theta = \frac{i_\theta\, r^2}{I_0\,(1 + \cos^2\theta)} \]

where iθ is the intensity of the scattered light due only to
the protein at an angle θ to the incident beam. At very low
angles (θ < 5 × 10⁻² rad), cos²θ = 1.00. In this situation, by
Equation 12-19


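\[ P(\theta) = \lim_{\gamma_{\mathrm{prot}} \to 0} \frac{i_\theta}{\gamma_{\mathrm{prot}}} \left[ \frac{r^2}{2 I_0 K M_{\mathrm{prot}}} \left( \frac{\partial \tilde{n}}{\partial \gamma_{\mathrm{prot}}} \right)_{T,P,\mu}^{-2} \right] \]

and

\[ \ln P(\theta) = \lim_{\gamma_{\mathrm{prot}} \to 0} \ln \frac{i_\theta}{\gamma_{\mathrm{prot}}} - A \]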


where A is a constant determined by the values of all of 
the fixed parameters in the brackets. At very low angles, 
the approximation of Equation 12-29 is also valid, and it 
follows that 


\[ \lim_{\gamma_{\mathrm{prot}} \to 0} \ln \frac{i_\theta}{\gamma_{\mathrm{prot}}} = A - \frac{16\pi^2 R_g^2}{3\lambda^2} \sin^2\frac{\theta}{2} \]


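If ω is expressed in radians

\[ \sin\omega = \omega - \frac{\omega^3}{3!} + \frac{\omega^5}{5!} - \cdots \]

(A) Show that

\[ \lim_{\substack{\theta \to 0 \\ \gamma_{\mathrm{prot}} \to 0}} \ln \frac{i_\theta}{\gamma_{\mathrm{prot}}} = A - \frac{16\pi^2 R_g^2}{3\lambda^2} \left( \frac{\theta}{2} \right)^2 \]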


It is more convenient to take a series of measurements at
varying scattering angles, θ (in radians), of a single solution
of protein than to measure the scattering at a fixed
angle for several solutions of protein. Therefore, what is
usually done is to determine the slope of ln (iθ/γprot) as a
function of θ² at various fixed values of γprot and then
extrapolate to γprot = 0. If, however, γprot is held constant,
as is done in such an experiment, then


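\[ \lim_{\theta \to 0} \ln \frac{i_\theta}{\gamma_{\mathrm{prot}}} = \lim_{\theta \to 0} \ln i_\theta + \ln \gamma_{\mathrm{prot}}^{-1} \]

and at constant γprot

\[ \lim_{\theta \to 0} \ln i_\theta = A' - \frac{16\pi^2 R_g^2}{3\lambda^2} \left( \frac{\theta}{2} \right)^2 \]

where A′ is a constant equal to A − ln γprot⁻¹.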


(B) What should be the slope of the line for a plot of
ln iθ against θ² at low concentrations of protein
and low scattering angle?


Immunity protein is a protein produced by strains of the
bacterium E. coli producing colicin E3. The intensities of
the scattered X-rays (λ0 = 0.154 nm; ñ = 1.33; λ =
0.116 nm) as a function of the square of the scattering
angle were measured for a series of solutions of immunity
protein. Reprinted with permission from ref 23.
Copyright 1983 Journal of Biological Chemistry.


[Figure: ln iθ against (scattering angle)² (radians² × 10⁴) for each concentration of protein (mg mL⁻¹).]


The natural logarithm of the intensity of the scattered
X-rays, ln iθ, is presented as a function of the square of
the scattering angle θ in radians² × 10⁴. (A value of 6.0 on
the abscissa is equal to 6.0 × 10⁻⁴ rad².) The values for the
concentration of protein (γprot in milligrams milliliter⁻¹)
are indicated to the right. The slopes of the lines in the
plot are


γprot        slope
(mg mL⁻¹)    (rad⁻²)
2.9          1130
5.8          420
8.8          540
11.7         550
14.6         630


(C) Calculate the apparent radius of gyration, Rg,app,
for each concentration of protein.


(D) Extrapolate to γprot = 0 and obtain the actual radius
of gyration, Rg.


(E) The molar mass of immunity protein is 9770
g mol⁻¹, and its partial specific volume, v̄imm, is
0.73 cm³ g⁻¹. Calculate its unhydrated volume and
the radius a of a sphere with that volume.


(F) What would be the radius of gyration of immunity 
protein if it were this sphere? 


Absorption and Emission of Light 


A valence electron in any molecule occupies an atomic 
orbital or a molecular orbital that has energy levels associated
with it (Figure 12-5). These energy levels have
discrete magnitudes because of the quantum theory, and 
the steps between any two energy levels are also of dis- 
crete magnitude or quantized. The energy levels that 
have the smallest steps between them are the rotational 
energy levels. These energy levels correspond to the 
quantized kinetic and potential energy associated with 
the rotations around the bond in which a particular elec- 
tron resides and with the bonds that are coupled to it. 
The steps between successive rotational energy levels are 
normally 0.5-50 J mol⁻¹ in magnitude, corresponding to
the energy in a photon of wavelength 200 to 2 mm. The 
energy levels that have the next larger steps between suc- 
cessive stages are the vibrational energy levels. These 
energy levels reflect the quantization of the energy of the 
vibrations of the bond in which a particular electron 
resides and of neighboring bonds that are coupled to it. 
The steps between vibrational energy levels are normally 
5-50 kJ mol⁻¹ in magnitude, corresponding to the energy
in a photon of wavelength 20 to 2 μm. The energy levels
that have the next larger steps among them are the elec- 
tronic energy levels of the molecule. The electronic 
energy levels relevant to these transitions are those of the 
atomic orbital or molecular orbital in which a particular 
electron resides and of the vacant atomic orbitals or 
molecular orbitals that are accessible to it. Steps between 
two of these electronic energy levels are normally 
50-500 kJ mol⁻¹ in magnitude, corresponding to the
energy in a photon of wavelength 2000 to 200 nm. These 
three types of energy levels form nested sets (Figure 
12-5). Within a given electronic energy level there are a 
series of associated vibrational energy levels, and within 
a given vibrational energy level there are a series of asso- 
ciated rotational energy levels. 
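
The correspondence between a molar step in energy and the wavelength of a photon can be checked with the relation λ = N_A h c / E. The following is a minimal sketch rather than part of the original text; the function name is hypothetical, and the ranges in the dictionary are simply the ones quoted above. The wavelengths it prints differ from the rounded values in the text by about 20%.

```python
# Convert a molar step in energy into the wavelength of a photon of equal energy.
# The constants are the exact SI values; the function name is illustrative only.
PLANCK = 6.62607015e-34      # J s
LIGHT_SPEED = 2.99792458e8   # m s^-1
AVOGADRO = 6.02214076e23     # mol^-1


def photon_wavelength_m(molar_energy_j_per_mol: float) -> float:
    """Wavelength (m) of a photon whose energy per mole equals the given step."""
    return AVOGADRO * PLANCK * LIGHT_SPEED / molar_energy_j_per_mol


if __name__ == "__main__":
    # Ranges of step size quoted in the text, in J mol^-1.
    steps = {
        "rotational": (0.5, 50.0),
        "vibrational": (5.0e3, 50.0e3),
        "electronic": (50.0e3, 500.0e3),
    }
    for name, (low, high) in steps.items():
        print(f"{name}: {photon_wavelength_m(low):.3e} m "
              f"to {photon_wavelength_m(high):.3e} m")
```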

In a particular molecule, at a given instant, a dis- 
crete set of atomic and molecular orbitals will be occupied
by the valence electrons to produce the σ bonds, the
π molecular orbital systems, and the lone pairs. For each
atomic or molecular orbital occupied by an electron, a 
particular vibrational energy level will also be occupied, 
and the rotational motions within the molecule will 
determine which particular rotational energy levels are 
also occupied. Because the differences in energy 
between electronic energy levels are so large, the equi- 
librium constants governing the occupation by elec- 
trons of the successive electronic energy levels at 
temperatures experienced by living organisms are also 
large, and only the lowest electronic energy levels are sig- 
nificantly occupied. The electrons in a molecule in the 
ground state are always (>99.99999%) distributed so as to 
fill in succession the levels of lowest electronic energy. As 
the differences in energy between vibrational energy 
levels are significant, the equilibrium constants govern- 
Figure 12-5: Photophysical processes experienced by an electron in a covalent bond between two atoms. The smooth curve S₀ is the potential
energy of the molecular orbital of the covalent bond in which the electron resides as a function of the distance r between the two nuclei.
The smooth curve S₁ is the potential energy that would be experienced if the electron were transferred to a particular unoccupied antibonding
molecular orbital between the two atoms as a function of the distance between the two nuclei. Each well of potential energy has levels of
vibrational energy (the parallel lines within each well) and levels of rotational energy (see expanded scale of the potential energy of the ground
state to the left) associated with it. Absorption by electrons of photons of energy equivalent to the differences in energy between vibrational
energy levels (process IR) produces an infrared spectrum. When an electron in its occupied molecular orbital, the ground state, absorbs a
quantum of electromagnetic energy sufficient to boost its energy high enough to enter the unoccupied molecular orbital, the excited state is
created (process A). As the excited state relaxes, some of the absorbed energy is lost as heat. When the relaxed excited state emits light as the
electron returns to the ground state (process F), the quantum of emitted fluorescent light has less energy (longer wavelength) than that of
the quantum of light originally absorbed. If the spin of the electron inverts while it is in the excited state (process S₁ → T₁), the electron enters
a triplet excited state. The smooth curve T₁ is the potential energy of the electron in the bond in the triplet state as a function of bond length.
The triplet excited state also relaxes by giving off heat. The phosphorescent light emitted from the relaxed triplet state (process P) has even
less energy (even longer wavelength) than the fluorescence from the initial excited state, and the triplet excited state has an even longer life- 
time. The bond lengths of the ground state, the excited singlet state, and the excited triplet state are indicated as Kä Ipa and Sch Adapted 
with permission from ref 85. Copyright 1977 W.A. Benjamin. 


ing the occupation of the successive levels at temperatures
experienced by living organisms are also significant.
Consequently, in the ground state, the vibrational
energy level that is occupied is usually (>90%) the lowest
for each particular vibration. Because, however, their differences
in energy are so small, rotational energy levels
are widely occupied by the bonds in the different molecules
in the solution.

A photon can encounter an electron in a molecule
in such a way that its energy is absorbed. If the electron
absorbs the energy of the oscillating electric field and
then immediately emits the same photon back again
without retaining any of its energy, the direction of the
electromagnetic wave is altered so that its new direction
of propagation bears no relationship to its incident direction
while its wavelength remains the same. This is elastic
scattering, and it is the phenomenon mainly
responsible for X-ray diffraction, low-angle X-ray scattering,
and light scattering.

If the electron that has absorbed the photon is in a
bond that happens to change its vibrational energy level
during the instant that it is excited, the subsequently
scattered photon will have a wavelength that is shorter or
longer than that of the incident photon by a difference
equivalent to the energy of the transition between the
two vibrational energy levels (expanded scale in Figure
12-5). If the photon is absorbed by an electron in the
excited state of a vibrational mode, it can carry away the
energy of the transition to the ground state and be scattered
with higher energy. If it is scattered by an electron
in the ground state that enters an excited state during its
residence, the photon will provide the energy for this
transition and be scattered with lower energy. As a result,
the energies of the scattered light vary symmetrically
about the energy of the incident light. The intensities of
the bands of scattered light with the longer wavelengths
are greater than those of shorter wavelength because
most vibrations are in the ground state before a photon
is absorbed. The Raman effect that results is a set of
small changes in wavelength that are experienced by a
small percentage of the photons that are scattered by the
electrons in the sample. As with light scattering itself, the
Raman effect on scattered light is usually measured by
sampling the light emitted by a sample perpendicular to
the incident light. The incident light is from a laser, and
it is intense and monochromatic. It is the spectrum of the
wavelengths of the scattered light that is determined.
Although almost all of the scattered light is the same 
wavelength as the incident light, scattered light of other 
specific, sharply defined wavelengths is also present, and 
a Raman spectrum of this scattered light provides a cat- 
alogue of many of the transitions among the vibrational 
energy levels of the molecule. 
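
The statement that most vibrations are in the ground state before a photon arrives can be illustrated with a Boltzmann factor. The sketch below is not from the original text. It assumes a harmonic vibration, a temperature of 310 K chosen only as an example, and hypothetical function names. The ratio it computes describes only the relative populations of the two lowest levels and ignores the frequency-dependent factors that also affect the intensities of Raman bands.

```python
import math

# Second radiation constant, h*c/k, in cm K (derived from exact SI constants).
SECOND_RADIATION_CONSTANT = 1.438776877


def boltzmann_ratio(wavenumber_cm: float, temperature_k: float) -> float:
    """Population of the first excited vibrational level relative to the lowest."""
    return math.exp(-SECOND_RADIATION_CONSTANT * wavenumber_cm / temperature_k)


def ground_level_fraction(wavenumber_cm: float, temperature_k: float) -> float:
    """Fraction of harmonic oscillators in the lowest level (geometric series)."""
    return 1.0 - boltzmann_ratio(wavenumber_cm, temperature_k)


if __name__ == "__main__":
    temperature = 310.0  # K, an assumed example temperature
    for wavenumber in (500.0, 1250.0, 1650.0):
        print(f"{wavenumber:6.0f} cm^-1: "
              f"ratio = {boltzmann_ratio(wavenumber, temperature):.2e}, "
              f"lowest level = {ground_level_fraction(wavenumber, temperature):.4f}")
```

Under these assumptions a vibration of 500 cm⁻¹ is about 90% in its lowest level, and a vibration of 1650 cm⁻¹ is more than 99.9% in its lowest level.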

The decision as to whether the photon is immedi- 
ately scattered from the electron, in an event that is 
essentially instantaneous, or is absorbed by the electron 
for a longer period of time depends on how closely the 
energy of the photon matches one of the differences 
between the quantized energy levels available to the 
electron. If the energy of the photon that has just been 
absorbed by the electron is equal to the difference in 
energy between the vibrational energy level its bond 
occupied at the instant the photon was absorbed and a 
higher vibrational energy level accessible to it, the 
photon can be absorbed completely, and the vibrational 
energy of the bond occupied by the electron or that of a 
bond vibrationally coupled to it will increase by that step 
in energy (process IR in Figure 12-5). Most of the energy 
absorbed will not be emitted back radiatively as light but 
will be dissipated nonradiatively by intermolecular colli- 
sion or by exciting coupled rotational motions the energy 
levels of which bridge the gap between the vibrational 
energy levels of the ground state and the excited state 
(Figure 12-5). Any light emitted radiatively due to a direct 
transition from the excited state back to the ground state 
has the same wavelength as the absorbed light but an 
altered direction and becomes distinguishable from elas- 
tically scattered light only by the delay in its reemission. 
The absorption of infrared light (in the range from 
20,000 to 2000 nm, or 500 to 5000 cm⁻¹)* produces transitions
among vibrational energy levels, and a spectrum
of infrared absorption has discrete maxima the energies 
of which correspond to transitions between pairs of 
vibrational energy levels. 
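
Because wavelength and wavenumber appear side by side throughout this discussion, a small conversion helper may be useful. This is an illustrative sketch only, and the function names are hypothetical. It uses the fact that 1 cm equals 10⁷ nm, so the wavenumber in cm⁻¹ is 10⁷ divided by the wavelength in nanometers.

```python
NM_PER_CM = 1.0e7  # nanometers in one centimeter


def wavenumber_cm(wavelength_nm: float) -> float:
    """Wavenumber (cm^-1) of light with the given wavelength (nm)."""
    return NM_PER_CM / wavelength_nm


def wavelength_nm(wavenumber_per_cm: float) -> float:
    """Wavelength (nm) of light with the given wavenumber (cm^-1)."""
    return NM_PER_CM / wavenumber_per_cm


if __name__ == "__main__":
    # Limits of the infrared range quoted in the text.
    for nm in (20000.0, 2000.0):
        print(f"{nm:8.0f} nm = {wavenumber_cm(nm):6.0f} cm^-1")
```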

If the energy of the photon absorbed by the electron 
is equal to the difference in energy between the molecu- 
lar orbital or atomic orbital the electron occupies in the 
ground state and an unoccupied molecular or atomic 
orbital of higher energy, the photon can be absorbed 
(process A in Figure 12-5). The electron enters the unoc- 
cupied orbital, and an electronically excited state of the 
molecule is created. Because the excited state differs 
from the ground state in the distribution of its electrons 
among molecular and atomic orbitals, it should be 
thought of as a distinct, albeit similar, molecule. The 


* It is customary for investigators using Raman spectroscopy or 
infrared spectroscopy to present absorption as a function of the 
inverse of the wavelength, referred to as the wavenumber (in
centimeters⁻¹), which is directly proportional to the energy of the
absorption. One advantage of this convention is that the two sym- 
metrical displacements of the Raman effect for the same vibra- 
tional mode have the same numerical values when expressed in 
terms of wavenumber. 


formation of the excited state can be followed by moni- 
toring the disappearance of the absorption of light by the 
ground state because the excited state, being a new
molecule, has a different absorption spectrum. At the 
very least, in its most stable structure, this new molecule, 
the excited state, will have some bond lengths, bond 
angles, and bond energies that are different from those of 
the ground state because its bonding differs from that of 
the ground state. The instant the electron enters the new 
orbital, however, the molecule has the structure of the 
ground state. As the excited state relaxes in energy to the 
most stable structure available to it, the distance in 
energy between excited state and ground state shortens, 
and the excited electron loses energy. Because of the 
overlap required for excitation, the excited electron usu- 
ally enters the excited state through one of its higher 
vibrational energy levels, and it simultaneously loses 
energy by a nonradiative passage to the lowest vibra- 
tional energy level. The net result of these relaxations is 
that the excited electron very rapidly (10° to 10™ s) 
finds itself at an energy considerably below the energy it 
had achieved immediately after the light was absorbed. 

Because the rotational and vibrational energy levels 
of the electronic excited state usually overlap rotational 
and vibrational energy levels of the ground state, the 
electron usually reenters the molecular or atomic orbital 
of the ground state by pursuing a path among the rota- 
tional and vibrational energy levels of excited state and 
ground state. In this case, the energy originally absorbed 
is dissipated nonradiatively as heat, and only the absorp- 
tion of the light is detected. The absorption of ultraviolet 
and visible light (in the range between 200 and 2000 nm) 
produces transitions among electronic energy levels, and 
the result is a spectrum of ultraviolet or visible absorp- 
tion that has maxima the energies of which correspond 
to the energies of electronic transitions in the molecule. 

If, however, the energy levels of the ground state 
and the excited state overlap weakly, the excited electron 
can become trapped in the lowest vibrational energy 
level of the excited state long enough (>10° s) to reenter 
the ground state with a bang rather than a whimper. The 
reentry of the excited electron into the ground state in 
such a single step requires that the energy it loses be 
emitted as a photon. This emission is either fluorescence 
or phosphorescence. 

If it came from a covalent bond or a lone pair of 
electrons, then at the instant of excitation, the excited 
electron entering the new orbital has a spin opposite to 
the spin of the partner it left behind in its previous 
orbital. As it relaxes into the lowest vibrational energy of 
the excited state and as the excited state rearranges, the 
excited electron remains coupled to its old partner, and 
the excited state remains a singlet excited state. From a 
singlet excited state, the electron can rapidly return to 
the ground state because the excited electron can readily 
reenter its old orbital with a spin compatible with the 
single electron still there (process F in Figure 12-5). The


reentry is spin-allowed and rapid (<10” s), and the emit- 
ted photon is fluorescence. 

The energy of a photon of fluorescent light is nec- 
essarily less than the energy of the photon absorbed by 
the electron during excitation because of the nonradia- 
tive relaxations of the excited state that have occurred. 
Fluorescent light is light of a longer wavelength (usually 
visible light) emitted shortly after (within 10° s) a mole- 
cule has absorbed light of a shorter wavelength (usually 
ultraviolet light). The spectrum of the light absorbed is 
the absorption spectrum; the spectrum of the light emit- 
ted is the emission spectrum. Fluorescent light, as with 
scattered light and for the same reasons, is emitted in all 
directions relative to the incident light unless intramole- 
cular interference occurs. It is usually measured perpen- 
dicular to the direction of the incident light. It can be 
measured under continuous excitation, or the excitation 
can be a flash (<10~ s in length), and the rate of decay of 
the fluorescence following the flash, indicative of its life- 
time, can be measured. 
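
One common way to extract a lifetime from the decay that follows a flash is to assume that the intensity falls as a single exponential, I(t) = I(0) exp(−t/τ), and to fit a straight line to ln I against t. The sketch below makes that assumption explicitly; the text itself does not specify the form of the decay. The data are synthetic, the lifetime of 4 ns is a made-up value, and the function name is hypothetical.

```python
import math


def lifetime_from_decay(times_ns, intensities):
    """Lifetime (ns) from a least-squares line through ln(intensity) against time.

    Assumes a single-exponential decay, so the slope of the line is -1/tau.
    """
    logs = [math.log(value) for value in intensities]
    n = len(times_ns)
    mean_t = sum(times_ns) / n
    mean_log = sum(logs) / n
    sxy = sum((t - mean_t) * (y - mean_log) for t, y in zip(times_ns, logs))
    sxx = sum((t - mean_t) ** 2 for t in times_ns)
    slope = sxy / sxx
    return -1.0 / slope


if __name__ == "__main__":
    true_lifetime_ns = 4.0  # hypothetical value used to generate the data
    times = [0.5 * k for k in range(1, 21)]
    signal = [1000.0 * math.exp(-t / true_lifetime_ns) for t in times]
    print(f"fitted lifetime = {lifetime_from_decay(times, signal):.3f} ns")
```

Real decays are noisy and are often sums of several exponentials, in which case this simple fit would return only an apparent lifetime.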

If the electronically excited state is structured in 
such a way that the excited electron can become 
unpaired with the electron it left behind, it can enter a 
triplet excited state by intersystem crossing. The ground 
state of the triplet state is usually of lower energy than 
that of the singlet state. Once the triplet excited state has 
been occupied, the electron can return to the ground 
state only through a spin-disallowed process that is very 
slow (on the order of microseconds to seconds). The 
emitted light, or phosphorescence (process P in Figure 
12-5), emerges from the solution over a relatively long 
period of time and has an even longer wavelength than 
the rapidly emitted fluorescence. Fluorescence is light 
emitted from singlet excited states; phosphorescence, 
from triplet excited states. 

Both the direct and the Raman infrared spectra of 
proteins display absorptions of energy resulting from 
transitions between the quantized energy levels of 
molecular vibrations. The intensities of these absorp- 
tions are determined by selection rules that govern 
which transitions in vibrational energy are permitted and 
to what extent these transitions are able to absorb 
infrared light. Because the selection rules for direct 
absorption are different from the selection rules govern- 
ing the Raman effect, direct infrared absorption spectra 
provide data that are complementary to Raman infrared 
spectra. 

The most obvious and securely assigned absorptions
in the infrared or Raman spectrum of a protein arise
from excitations of the vibrations of the amides in the
polypeptide backbone. A secondary amide such as
N-methylacetamide, a simple model for the peptide
bond, absorbs infrared energy of wavelength around
6000 nm (1650 cm⁻¹) into its C=O stretching vibration
and of wavelengths around 6500 nm (1540 cm⁻¹) and
around 8000 nm (1250 cm⁻¹) into a coupled C-N stretching
and N-H bending vibration. These three peaks of
absorbance are referred to as the amide I band, the
amide II band, and the amide III band, respectively. The 
amide I band of absorbance is the strongest of the three 
and is the only one that is located in a region of the spec- 
trum that does not contain significant absorptions from 
other groups in a protein (Figure 12-6). 

Direct infrared spectroscopy of proteins in aqueous 
solution is severely compromised by the strong absorp- 
tion of infrared light by water and other solutes. An 
infrared spectrum registered by the Raman effect, how- 
ever, avoids these drawbacks. A Raman infrared spec- 
trum is monitored as small differences in wavelength 
relative to the wavelength of the incident light. 
Consequently, the actual light registering each of the 
bands in the Raman spectrum is within the visible range, 
not the infrared range, and the problems of the absorp- 
tion of infrared light by water and other solutes and by 
the container are avoided. Although the water in the sol- 
vent also produces Raman bands in the regions of its 
absorptions, they are much weaker than their direct 
absorptions of infrared light, and the subtraction of
background from the spectrum of the protein is much 
less drastic. 

Raman infrared spectroscopy, however, has its own 
disadvantages. Two of those disadvantages are that the 
signals registering the Raman infrared spectrum are 
weak and that these small signals can be swamped by flu- 
orescence. In addition, the presence of large particles 
such as fragments of membrane, by increasing the scat- 
tering from the solution, makes Raman spectroscopy 
even more difficult. Direct infrared absorption is unaf- 
fected by this latter problem, and infrared spectra of sus- 
Figure 12-6: Direct infrared spectra in the amide I region of proteins
with different mixtures of secondary structure. (A) Human
hemoglobin with its hemes in complex with CO at 130 mg mL⁻¹ in
10 mM sodium phosphate, pH 7.4. (B) Bovine ribonuclease A at
50 mg mL⁻¹ in 1% NaCl, pH 6.5. (C) Bovine immunoglobulin G
at 50 mg mL⁻¹ in 1% NaCl, pH 6.5. In each solution, the protein
itself served to buffer the pH. The spectra were measured in a
Fourier transform infrared spectrophotometer. The spectrum of
10 mM sodium phosphate or 1% NaCl, respectively, was subtracted
from each infrared spectrum to obtain the spectrum of the protein
alone. Even at these high concentrations of protein, the absorption
of the water in the sample accounted for 96% of the total absorption
at the wavelength where each of the proteins absorbed most
strongly. The vertical dashed line at 1650 cm⁻¹ is to aid in comparing
the curves. Reprinted with permission from ref 88. Copyright
1990 American Chemical Society.
pensions of membranes can be readily measured.
Measurements of the direct infrared spectrum of solid
dehydrated protein or even a crystal of protein can also
be made.

When a solution of protein is excited with a He-Ne 
laser, the Raman infrared spectrum of the scattered light 
(Figure 12-7) displays a maximum arising from the
amide I band of the folded polypeptide with a wavenumber
of around 1650 cm⁻¹ less than the wavenumber of the
majority of the scattered light, which has the same
wavenumber as the incident light (15,802 cm⁻¹,
632.8 nm). The amide III maximum at a wavenumber
1250 cm⁻¹ less than that of the elastically scattered light
and other maxima that can be assigned to vibrational 
transitions in some of the amino acids, such as phenyl- 
alanine, tyrosine, and methionine, are also observed.” 
By using difference spectra between a selectively deuter- 
ated protein and the same protein undeuterated, absorp- 
tion bands in the Raman infrared spectrum from other 
amino acids, such as leucine, isoleucine, valine, alanine, 
glutamate, and aspartate, can be dissected out of the 
complete spectrum.” Difference Raman infrared spectra 
between proteins selectively labeled with "oxygen and 
their unlabeled counterpart have also been reported,’ 
Figure 12-7: Raman spectra of the intensity of the light scattered
from (A) a solution of ribonuclease at 200 mg mL⁻¹ and (B) a solution
of amino acids at the same ratio that they are present in
ribonuclease. The samples were excited with a He-Ne laser
(λ = 632.8 nm), and the intensity of the light scattered was measured
as a function of wavenumber (centimeter⁻¹) in the neighborhood
of the wavenumber of the elastically scattered light, which
had a wavenumber identical to that of the incident light
(15,802 cm⁻¹). The intensity of the scattered light is presented as a
function of the difference between the wavenumber of the measured
light and the wavenumber of the incident and elastically scattered
light. As in the direct infrared spectrum (Figure 12-6), the
amide I absorption is the most obvious, but other absorptions that
can be assigned to various vibrational modes of the side chains of
the amino acids, as well as the partially obscured amide III band,
are clearly seen in the spectrum. Reprinted with permission from
ref 95. Copyright 1970 Academic Press.


which can permit the observation of the vibrational tran- 
sitions for a single bond among the thousands within a 
particular protein. 
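
The Raman bands described above are displacements in wavenumber from the exciting line, so the light actually detected remains in the visible range. The following sketch, which is not part of the original text and uses a hypothetical function name, converts a displacement into the absolute wavelength of the scattered light for excitation at 632.8 nm. A positive displacement is taken here to mean light scattered with lower energy.

```python
NM_PER_CM = 1.0e7  # nanometers in one centimeter


def scattered_wavelength_nm(excitation_nm: float, raman_shift_cm: float) -> float:
    """Wavelength (nm) of light displaced by raman_shift_cm from the exciting line.

    A positive shift means the photon lost energy to the vibration;
    a negative shift means it gained energy.
    """
    excitation_wavenumber = NM_PER_CM / excitation_nm
    return NM_PER_CM / (excitation_wavenumber - raman_shift_cm)


if __name__ == "__main__":
    laser_nm = 632.8  # He-Ne laser, as in the text
    for band, shift in (("amide I", 1650.0), ("amide III", 1250.0)):
        print(f"{band}: {scattered_wavelength_nm(laser_nm, shift):.1f} nm")
```

Under these conventions the amide I band excited by a He-Ne laser appears near 707 nm and the amide III band near 687 nm, both far from the infrared wavelengths at which water absorbs strongly.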

If the protein contains a functional group that
absorbs the exciting visible light in an electronic transition,
as does, for example, the heme in hemoglobin,
bands in the Raman infrared spectrum resulting from the
absorption of energy by vibrations of the atoms within or
adjacent to that functional group will be enhanced, and
this enhanced spectrum is referred to as a resonance
Raman infrared spectrum. The maxima in a resonance
Raman infrared spectrum can often be assigned to
vibrations of particular bonds, such as an iron-dioxygen
stretching vibration in oxygenated hemoglobin, the
oxygen-oxygen stretching vibration of the peroxy form of
hemocyanin, the iron-oxygen stretching vibration of
the ferryl intermediate of cytochrome d ubiquinol oxidase,
or the copper-sulfur stretching vibrations in
halocyanin. When light of wavelength 200 or 206.5 nm,
which is in the range where peptide bonds absorb
strongly, is used to produce a resonance Raman infrared
spectrum, the amide I, amide II, and amide III bands are
selectively enhanced.

The amide I band in the direct infrared spectrum of
a solution of a particular protein registers its secondary
structure. The amide I band in the direct
infrared spectra of a protein containing mainly α helix,
for example, hemoglobin (87% α-helical), has a maximum
at around 1655 cm⁻¹; that of a protein rich in
β sheet, for example, immunoglobulin G (67% β structure),
has a maximum around 1635 cm⁻¹; and that of a
protein with a mixture of these two secondary structures,
for example, ribonuclease A (23% α helix and 46%
β structure), has a spectrum that seems to register this
mixture (Figure 12-6). Various algorithms have been
derived for estimating the percentages of α helix, β structure,
and β turn in a protein from the shape of the amide I
band in its direct infrared spectrum. The amide I
band in the direct infrared spectrum of a coiled coil of
α helices also has a characteristic shape, diagnostic of
this structure.

Unfortunately, as mentioned above, the amide I 
band falls in a region of the direct infrared spectrum 
where water absorbs strongly, and this strong absorp- 
tion by the solvent and its vapor must be subtracted to 
obtain only the amide I band of the protein. Deuterium
oxide does not absorb strongly between 1700 and
1600 cm⁻¹, and direct infrared spectra of proteins in deuterium
oxide rather than water display a readily measured
amide I band. Unfortunately, it is difficult to
exchange the protons with deuteriums on the amido
nitrogens of the peptide bonds through the entire protein,
because those in the interior are inaccessible to
solvent. As a result of this incomplete exchange and the
fact that the amide I bands of deuterated peptide bonds
are shifted by at least 5 cm⁻¹ relative to those of
undeuterated peptide bonds, the resulting spectrum
of the mixture of exchanged and unexchanged peptide
bonds is difficult to dissect accurately into the components
arising from the different secondary structures.

An infrared spectrum registered by the Raman effect 
avoids these drawbacks, and the amide I band in such a 
spectrum also registers secondary structure. The wavelengths
of the absorptions, however, differ from those in
a direct infrared spectrum. α-Helical proteins have
amide I bands with maxima around 1645 cm⁻¹ while proteins
composed entirely of β structure have amide I bands
with maxima around 1670 cm⁻¹. In resonance Raman
infrared spectra produced by excitation at 205.6 nm, the
amide III band is prominent and also registers secondary
structure (λmax for α helix at 1299 cm⁻¹ and λmax for β sheet
at 1235 cm⁻¹). Because of the various drawbacks of
infrared spectroscopy with aqueous solutions of proteins, 
however, circular dichroism is more widely used to esti- 
mate percentages of secondary structure. 

Circular dichroism is a consequence of the absorp- 
tion of visible or ultraviolet light by a chiral solute such as 
a protein. As such, it relies on the excitation of electrons 
from occupied molecular orbitals into unoccupied 
molecular orbitals. The most widely used absorptions in 
spectroscopic studies of proteins by circular dichroism 
are the electronic absorptions of the amides of the 
polypeptide backbone between wavelengths of 180 and 
240 nm. In this region, two electronic transitions account 
for the absorption of light. One is a transition (n → π*)
in which an electron leaves one of the lone pairs on the
acyl oxygen (designated by n) and enters the vacant
antibonding π molecular orbital of the amide (designated by
π*). This orbital is the one of highest energy of the three
molecular orbitals of the π molecular orbital system
(Figure 2-3). This n → π* transition is responsible for the
absorption of light at a wavelength of about 220 nm. The
other transition (π⁰ → π*) is one in which an electron
leaves the highest occupied nonbonding π molecular
orbital of the amide (designated by π⁰) and enters the
vacant antibonding π molecular orbital (Figure 2-3). This
π⁰ → π* transition is responsible for the absorption of
light at a wavelength of about 200 nm.

Plane-polarized light is light characterized by an 
electric vector that oscillates within a plane. One way to 
describe plane-polarized light mathematically is to 
assume that it is produced by the sum of two electric vec- 
tors of equal amplitude emanating from a single point 
that is traveling at the speed of light in a straight line. 
While propagating, these two electric vectors spin in 
opposite directions, clockwise and counterclockwise, at
the same frequency (in revolutions second⁻¹) as the
frequency of the light (Figure 12-8A). The sum of these
two spinning vectors produces an electric vector that 
oscillates in a plane containing the line of propagation to 
produce the plane-polarized light. These two compo- 
nents that spin in opposite directions around the axis 
defined by the line of propagation are called circular 
polarizations. 
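
The decomposition just described can be checked numerically. The following is a minimal sketch, not taken from the text: the function names, the unit amplitude, and the choice of the vertical axis as the starting direction are all assumptions made for illustration. It adds two vectors of equal amplitude spinning in opposite senses and shows that the horizontal component of the sum vanishes while the vertical component oscillates.

```python
import math


def circular_component(amplitude, turns, handedness):
    """Electric vector (x, y) of one circular polarization.

    turns: elapsed time in periods of the light.
    handedness: +1 or -1, the sense of rotation (the labels are arbitrary here).
    The vector points along the vertical (y) axis at turns = 0.
    """
    angle = handedness * 2.0 * math.pi * turns
    return (amplitude * math.sin(angle), amplitude * math.cos(angle))


def summed_vector(turns, amplitude=1.0):
    """Sum of two counter-rotating components of equal amplitude."""
    x1, y1 = circular_component(amplitude, turns, +1)
    x2, y2 = circular_component(amplitude, turns, -1)
    return (x1 + x2, y1 + y2)


if __name__ == "__main__":
    # Sample one full period; x stays at zero while y swings between +2 and -2.
    for step in range(9):
        turns = step / 8.0
        x, y = summed_vector(turns)
        print(f"t = {turns:5.3f} periods   x = {x:+.3f}   y = {y:+.3f}")
```

The sum is confined to the vertical axis at every instant, which is the plane-polarized ray of which the two spinning vectors are the components.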

Figure 12-8: Principles of circular dichroism and optical rotation.
(A) View down a ray of light plane-polarized in the vertical
direction looking from the source. The plane-polarized light can be
considered to be the sum of two electric vectors, respectively, of
right (R) and left (L) circularly polarized light. The electric vector of
the plane-polarized light (M) remains fixed in orientation but
oscillates in amplitude with a maximum at the point A. The electric
vectors of each of the circular polarizations of the ray remain fixed in
amplitude but circle in opposite directions at the same angular
velocity. (B) Alteration of plane-polarized light during its passage
through a solution containing a chiral solute. If the index of
refraction of the solution for the left circular polarization (L′) of the
plane-polarized light is greater than that for the right circular
polarization (R′), the left component will have a slower angular velocity
than the right component, and the plane of the polarized light will
be rotated to the right by an angle α. If the right circular
polarization (R′) of the plane-polarized light is absorbed more than the left
circular polarization component (L′), the plane of the polarized
light will broaden into an ellipse because when the two electric
vectors are at 180° to each other (at B′), they no longer cancel. The
ellipse is created by a composite electric vector (M′) that rotates to
the left if the absorption of the right circular polarization is greater
and to the right if the absorption of the left circular polarization is
greater. The maximum of the amplitude of the harmonic oscillation
of this electric vector within the rotated plane is at the point A′
and its maximum in the dimension at 90° to the rotated plane is at
the point B′. The angle θ (Equation 12-33) is indicated. Adapted
with permission from ref 112. Copyright 1967 Marcel Dekker.


When these two circular polarizations encounter a 
chiral object such as one of the functional groups of a 
protein in a solution, they are absorbed and retarded 
unequally. If only the amplitude of one component were 
decreased more than the amplitude of the other while 
the two components remained in phase, the plane of 
polarization of the emerging light would retain the same 
orientation, but its electric vector, which is the sum of the 
two components, would trace in cross section an ellipse 
rather than a flat, linear segment (Figure 12-8B). If
only the phase between the two components were
shifted while their amplitudes remained the same, the
plane of polarization of the emerging light would rotate
by an angle α, but the electric vector would still trace in
cross section a flat, linear segment (Figure 12-8B). The
first effect is circular dichroism; the second effect, optical
rotation. Both effects are required to occur simultaneously
in any circumstance, and as polarized light is
rotated, it necessarily becomes elliptical and vice versa.
This obligatory connection permits the spectrum of optical
rotation as a function of wavelength to be calculated
from the spectrum of circular dichroism as a function of
wavelength and vice versa.
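
The two limiting cases described above can be reproduced with a short numerical sketch. Everything in it is an illustrative assumption rather than part of the text: the function names, the sign convention (angles measured from the vertical, clockwise positive), and the sample amplitudes and phase lag. It traces the summed electric vector over one period and reads off the ratio of the axes of the traced figure and the tilt of its long axis.

```python
import math


def emerging_vector(turns, amp_left, amp_right, lag_left):
    """Sum of the two circular components after passage through the sample.

    amp_left, amp_right: surviving amplitudes of the two components.
    lag_left: angle (radians) by which the left component has been retarded.
    Angles are measured from the vertical axis, clockwise positive.
    """
    phi = 2.0 * math.pi * turns
    angle_right = phi                 # right component turns clockwise
    angle_left = -(phi - lag_left)    # left component turns the other way, delayed
    x = amp_right * math.sin(angle_right) + amp_left * math.sin(angle_left)
    y = amp_right * math.cos(angle_right) + amp_left * math.cos(angle_left)
    return x, y


def trace_shape(amp_left, amp_right, lag_left, samples=3600):
    """Return (ellipticity, rotation) in degrees for the traced figure."""
    points = [emerging_vector(i / samples, amp_left, amp_right, lag_left)
              for i in range(samples)]
    lengths = [math.hypot(x, y) for x, y in points]
    major = max(lengths)
    minor = min(lengths)
    x, y = points[lengths.index(major)]
    rotation = math.degrees(math.atan2(x, y))   # tilt of the long axis from vertical
    if rotation > 90.0:                         # an axis has no direction, so fold
        rotation -= 180.0                       # the angle into (-90, 90]
    elif rotation <= -90.0:
        rotation += 180.0
    ellipticity = math.degrees(math.atan2(minor, major))
    return ellipticity, rotation


if __name__ == "__main__":
    print("unequal absorption only:", trace_shape(1.0, 0.8, 0.0))
    print("unequal retardation only:", trace_shape(1.0, 1.0, math.radians(20.0)))
```

In this sketch, unequal amplitudes alone give a nonzero ellipticity with no tilt, and a phase lag alone tilts the line by half the lag without opening it into an ellipse.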

The degree to which the emerging light has become
elliptical can be measured, and it is expressed as an angle

$$\theta = \tan^{-1}\left(\frac{OB'}{OA'}\right) \tag{12-33}$$

where the ratio OB′/OA′ is the ratio of the minor and
major axes of the ellipse (Figure 12-8B). The molar
ellipticity at a given wavelength λ, [θ]_λ, is defined by the
relationship

$$[\theta]_{\lambda} = \frac{\theta}{l\,[\text{chromophore}]} \tag{12-34}$$

where [chromophore] is the molar concentration of the
functional group absorbing the light, referred to as the
chromophore, and l is the path length of the sample
chamber. By convention, the units of [θ]_λ are chosen to
be degrees centimeter² (decimole of chromophore)⁻¹. A
circular dichroic spectrum is a display of the amplitude
of [θ]_λ as a function of wavelength.
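
As a worked illustration of Equations 12-33 and 12-34, the sketch below converts an axial ratio into an ellipticity and an observed ellipticity into a molar ellipticity in the conventional units. The function names and the sample numbers are hypothetical. The only step not spelled out in the text is the unit conversion, which assumes that the concentration is supplied in moles liter⁻¹ and the path length in centimeters.

```python
import math


def ellipticity_degrees(minor_axis, major_axis):
    """Equation 12-33: the angle whose tangent is the minor/major axial ratio."""
    return math.degrees(math.atan2(minor_axis, major_axis))


def molar_ellipticity(theta_degrees, path_length_cm, molar_concentration):
    """Equation 12-34 in degrees centimeter^2 decimole^-1.

    Assumed unit conversion: 1 mol liter^-1 = 10 dmol per 1000 cm^3
    = 0.01 dmol cm^-3.
    """
    conc_dmol_per_cm3 = molar_concentration * 0.01
    return theta_degrees / (path_length_cm * conc_dmol_per_cm3)


if __name__ == "__main__":
    # Hypothetical reading: 0.010 degree in a 0.1 cm cell at 1 mM chromophore.
    print(molar_ellipticity(0.010, 0.1, 1.0e-3))
```

With these assumed units the expression reduces to 100θ/(l[chromophore]), so the hypothetical reading corresponds to about 10,000 degrees centimeter² decimole⁻¹.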

The optical rotation α (Figure 12-8B) produced by
the sample can be registered with a spectropolarimeter.
It can also be normalized by the concentration of the
chromophore responsible for it to produce the specific
rotation [α]_λ. A spectrum of the optical rotatory dispersion
is simply the amplitude of [α]_λ plotted as a function
of the wavelength of the polarized light. Because optical 
rotation arises from a shift in the relative phases of the 
two circularly polarized components (Figure 12-8B), it is 
proportional to the derivative with respect to wavelength 
of the electronic absorption from which it arises. This has 
the practical disadvantages of both turning each peak of 
absorption into two peaks, a positive one and a negative 
one distributed around the wavelength of maximum 
absorption, and broadening the signal. In a spectrum of 
optical rotatory dispersion arising from several maxima 
of absorption, the individual components are difficult to 
resolve. 

A circular dichroic spectrum, however, is usually 
simpler to interpret. Unless excitonic coupling between 
two apposed chromophores of similar wavelengths of 
absorption is occurring, the individual bands in a circular
dichroic spectrum of a protein are unsplit peaks that
coincide with absorption maxima in the absorption spectrum
of the same protein. In uncomplicated situations,
the circular dichroic spectrum simply registers the optical
activity of each chiral contributor to the absorption
spectrum. For example, most of the peaks in the absorption
spectrum of cytochrome c₁ have only one corresponding
negative or positive peak at the same
wavelength in its circular dichroic spectrum (Figure
12-9). The peak of absorption from the pyridoxal phosphate
in glycine hydroxymethyltransferase at 422 nm corresponds
to a prominent peak of positive polarization at
the same wavelength and of the same width in the circular
dichroic spectrum, and the corresponding peaks in the
two spectra shift to 343 nm upon the addition
of the substrate serine to the solution. Because
adjacent bands of absorption often have different polarities,
the circular dichroic spectrum can often reveal
details in the absorption spectrum. For example, the two
overlapping peaks at 480 and 530 nm in the absorption
spectrum of cytochrome-c oxidase from Thermus thermophilus
correspond to peaks at the same wavelengths
of positive polarization and negative polarization,
respectively. If excitonic coupling between two or more
chromophores is occurring, however, the resulting bands in
the circular dichroic spectrum will each be split into two
or more components of both positive and negative amplitude,
and this splitting complicates the situation.

A polypeptide folded entirely as an α helix has a circular
dichroic spectrum that is distinct from that of a
polypeptide folded entirely in β structure. Both of these
spectra are distinct from that of a polypeptide unfolded
as a random coil (Figure 12-10). In the circular
dichroic spectrum of a polypeptide folded as an α helix,
the amido π⁰ → π* transition at about 200 nm is split into
a positive component (λmax = 191 nm) and a negative
component (λmax = 205 nm). This splitting arises from the
fact that each amide is held in the same orientation relative
to the axis of the α helix. There is also the additional
band of negative ellipticity at 225 nm from the
n → π* transition of the peptide bond, which, in combination
with the band of negative ellipticity at 205 nm,
gives the circular dichroic spectrum of the α helix its
characteristic double minimum. The π⁰ → π* transition
from a polypeptide in either β structure or random coil
is unsplit.

The tyrosines, phenylalanines, and tryptophans in
a polypeptide absorb light of wavelength between 180
and 240 nm and have characteristic circular dichroic
spectra. The contributions of the tyrosines, phenylalanines,
and tryptophans in a protein to its circular
dichroic spectrum can be numerically subtracted to
reveal the circular dichroic spectrum of the amides of
the polypeptide backbone alone. Because the polypeptide is
the main contributor to the circular dichroic spectrum
between wavelengths of 180 and 240 nm, the unit in
which the molar ellipticity is usually presented is
decimolarity of peptide bonds (Figure 12-10).

Figure 12-9: Correlation between an optical absorption spectrum
(A) and the corresponding circular dichroic spectrum (B).
Cytochrome c₁ was purified to homogeneity from bovine heart
muscle. Solutions of the hemoprotein were prepared in its oxidized
Fe(III) and reduced Fe(II) forms. (A) Optical absorption
(absorbance) of solutions of the oxidized and reduced proteins as a
function of wavelength (nanometers). In each case, the absorbance
at wavelengths greater than 300 nm is due entirely to the heme of
the hemoprotein. The intense bands of absorption at 400-420 nm
are characteristic of hemes. (B) Molar ellipticities (θ x 10%) of the
two solutions of the same two forms of the cytochrome c₁, oxidized
and reduced, as a function of wavelength. The molarities of the
solutions were expressed as moles of peptide bond in each liter of
solution, which seems inappropriate since the majority of the
absorption arises from the heme. Nevertheless, the units for molar
ellipticities are degrees centimeter² (decimole of peptide bond)⁻¹.
For each band in the absorption spectrum, each of which has a
positive value, there is a corresponding band in the circular dichroic
spectrum, which is either positive or negative. For example, the
band of absorption at 350 nm in the spectrum of the absorbance of
the reduced cytochrome c₁ corresponds to a band of negative
molar ellipticity in the circular dichroic spectrum. Reprinted with
permission from ref 113. Copyright 1971 Academic Press.


The experimentally measured circular dichroic
spectrum of the folded polypeptide in a protein can
always be resolved numerically into three component
spectra as similar as possible to those of pure α helix,
pure β structure, and pure random coil. If it is assumed
that these component spectra, obtained only by numerical
analysis, do, nevertheless, accurately represent the
contributions of α helix, β structure, and random meander
to the entire spectrum, their relative amplitudes
should provide the relative amounts of these three
components in the actual molecule of protein.

Figure 12-10: Circular dichroic spectra that are used as reference
spectra for α helix (dotted line), β structure (dashed line), and
random meander (solid line). Molar ellipticity, [θ] x 10 [degree
centimeter² (decimole of peptide bond)⁻¹], is presented as a function
of wavelength (nanometers). Myoglobin from Physeter
catodon, dissolved in 0.1 M NaF at pH 7, is a protein that is almost
entirely α-helical (Figure 4-18). It was used as a reference compound
for α helix (dotted line). Poly(Lys-Leu-Lys-Leu) in 0.5 M NaF
at pH 7 was used as a polypeptide that is purely β structure (dashed
line). Poly(Pro-Lys-Leu-Lys-Leu) in a salt-free solution is completely
structureless because of the prolines and the strong repulsions
of the lysines. It is not, however, a typical random coil
because of both of these features. Nevertheless, it was used as a
model for a polypeptide that is purely random meander (solid line).
Reprinted with permission from ref 117. Copyright 1980 Academic
Press.

This simple expectation is diminished by several 
difficulties. Small peptides that assume particular types 
of β turn show significantly different circular dichroic
spectra, each distinct from that of a random coil,
and dissecting out the contribution of each type of β turn
to the spectrum of a particular protein is difficult, if not
impossible. A related shortcoming is the confounding of
random coils and random meanders. A random coil is an
unfolded polypeptide continuously changing its structure
by rotation around its various covalent bonds.
Random meander is the path assumed by the backbone
of a folded polypeptide that is neither an α helix, a
β structure, nor a β turn. Random meander is static and
identical in all of the folded polypeptides in
a solution of a given protein. The random coil of an
unfolded polypeptide used as the standard in circular
dichroism, unlike the α helix or β structure used as the
standard, bears no relationship to the random meander
in a particular folded polypeptide. The random meander
in a particular protein will produce a specific circular 
dichroic spectrum that is distinct from the common cir- 
cular dichroic spectrum produced by all random coils 
and also distinct from the unique spectra produced by 
random meanders in other proteins. 

Nevertheless, a least-squares method has been 
developed to fit the experimental circular dichroic spec- 
trum of a protein, from which the contributions of tyro- 
sine, tryptophan, and phenylalanine have been 
subtracted, to a calculated spectrum (Figure 12-11). 
The parameters of the fitting procedure are the fraction 
of α helix, the fraction of β structure, the fraction of
random meander, and the fraction of each type of β turn.
Because each measurement of molar ellipticity is based 
on decimoles of peptide bonds, it is assumed that the sum 
of these fractions is unity, that each point on the experi- 
mental spectrum is the sum of the molar ellipticity at that 
wavelength of the appropriate reference spectrum for the 
respective secondary structure multiplied by the fraction 
for that particular secondary structure, and that the ref- 
erence spectrum for random meander is that of random 
coil. The least-squares procedure gives the respective 
values for the fractions of the four types of secondary 
structure that produce a calculated curve most closely 
reproducing the experimental curve. The fractions for 
each type of secondary structure estimated in this way 
for a set of proteins agreed quite closely with the fractions 
for each type of secondary structure in the respective 
crystallographic molecular models of these proteins. 
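
The fitting procedure described in this paragraph can be sketched in a few lines. What follows is only an illustration under stated assumptions: the reference spectra are invented Gaussian combinations, not measured data; the function names are hypothetical; and the constraint that the fractions sum to unity is imposed by eliminating the last fraction, which is one of several ways to do it and is not claimed to be the published method. The fractions quoted for glyceraldehyde-3-phosphate dehydrogenase in Figure 12-11 are used to fabricate a noise-free "observed" spectrum that the fit should then recover.

```python
import math


def gaussian(wavelength, center, width, height):
    """Hypothetical band shape, used only to build toy reference spectra."""
    return height * math.exp(-((wavelength - center) / width) ** 2)


def toy_references(wavelengths):
    """Invented stand-ins for the four reference spectra (NOT measured data)."""
    helix = [gaussian(w, 192, 6, 70) - gaussian(w, 208, 6, 33)
             - gaussian(w, 222, 7, 35) for w in wavelengths]
    beta = [gaussian(w, 196, 6, 30) - gaussian(w, 217, 8, 18) for w in wavelengths]
    turn = [gaussian(w, 205, 7, 20) - gaussian(w, 226, 6, 5) for w in wavelengths]
    meander = [gaussian(w, 218, 7, 4) - gaussian(w, 198, 6, 40) for w in wavelengths]
    return [helix, beta, turn, meander]


def solve_linear(matrix, vector):
    """Solve a small square system by Gauss-Jordan elimination with pivoting."""
    n = len(vector)
    rows = [list(matrix[i]) + [vector[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(n):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][n] / rows[i][i] for i in range(n)]


def fit_fractions(observed, references):
    """Least-squares fractions constrained to sum to 1.

    The last fraction is eliminated with the constraint, and the remaining
    ones are obtained from the normal equations of the reduced problem.
    No bound keeps the fractions between 0 and 1 in this sketch.
    """
    last = references[-1]
    basis = [[r - l for r, l in zip(ref, last)] for ref in references[:-1]]
    target = [o - l for o, l in zip(observed, last)]
    normal = [[sum(a * b for a, b in zip(bi, bj)) for bj in basis] for bi in basis]
    rhs = [sum(a * t for a, t in zip(bi, target)) for bi in basis]
    free = solve_linear(normal, rhs)
    return free + [1.0 - sum(free)]


if __name__ == "__main__":
    wavelengths = list(range(190, 241))
    refs = toy_references(wavelengths)
    true_fractions = [0.31, 0.30, 0.22, 0.17]   # values quoted in Figure 12-11(A)
    observed = [sum(f * ref[i] for f, ref in zip(true_fractions, refs))
                for i in range(len(wavelengths))]
    print([round(f, 3) for f in fit_fractions(observed, refs)])
```

With real spectra the recovered fractions are only as meaningful as the reference spectra, which is the caveat the surrounding discussion of random meander raises.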

One of the more important and informative uses of
circular dichroism is to provide evidence that the structure
of the protein has changed under particular circumstances.
A conformational change is a change in the
structure of the protein between two states of similar
stability. For example, the conformational change of aspartate
carbamoyltransferase that occurs upon the binding
of its substrates and that is detected both crystallographically
and as a change in sedimentation coefficient is also
accompanied by significant changes in the circular
dichroic spectrum of the protein. Such changes in circular
dichroic spectra coincident with a conformational
change of a protein are commonly encountered. This fact
increases the concern over the accuracy of secondary
structural dissections by numerical analysis of circular
dichroic spectra because crystallographic descriptions of
conformational changes of proteins rarely involve significant
changes in the content of α helix, β structure,
β turns, or random meander or changes in their disposition
over the sequence of the folded polypeptide. The
changes in the circular dichroic spectrum of
Na⁺/K⁺-transporting ATPase during a conformational
change caused by binding of its substrates are consistent
with the transformation of 7% of its amino acids from
α helix into β structure. When crystallographic molecular
models of the two conformations between which
the homologous conformational change occurs in
Ca²⁺-transporting ATPase are examined, the content
of α helix does change in the correct direction, but
only by 2-3%. This rather small change in the amount of
α helix is more consistent with the absence of a measurable
shift in the amide I absorption in the infrared spectrum
under the same circumstances.

Figure 12-11: Circular dichroic spectra of (A) glyceraldehyde-
3-phosphate dehydrogenase (phosphorylating) in 0.1 M NaF,
pH 7, and (B) subtilisin in 0.2 M NaF, pH 7. Molar ellipticities,
[θ] x 107 [degree centimeter² (decimole of peptide bond)⁻¹], are
presented as a function of wavelength (nanometers). The spectra
were either directly measured (solid lines) or duplicated (dotted
lines) by adding together spectra for α helix, β structure, β turn,
and random meander (Figure 12-10). In the procedure used to
duplicate the experimental spectrum, it was assumed that the
proteins contain only α helix, β structure, β turn, and random meander.
If f_α, f_β, f_t, and f_rm are the fractions of each of these secondary
structures, it is assumed that the sum of these four numbers is 1
and that f_α(θ°_α) + f_β(θ°_β) + f_t(θ°_t) + f_rm(θ°_rm) is equal to the measured
value of θ at every wavelength, where the θ° values are the molar
ellipticities of the standard curves (Figure 12-10) at the same
wavelength. A least-squares method was used to obtain the best values
for f_α, f_β, f_t, and f_rm, and these four values were then used to
construct the calculated curves presented in the panels. Note that f_α, f_β,
f_t, and f_rm are parameters determined only by the structure of the
protein and must have the same values for all wavelengths. For the
spectrum of glyceraldehyde-3-phosphate dehydrogenase
(phosphorylating), the best values of f_α, f_β, f_t, and f_rm were 0.31, 0.30, 0.22,
and 0.17; for the spectrum of subtilisin, 0.30, 0.21, 0.21, and 0.28.
Reprinted with permission from ref 117. Copyright 1980 Academic
Press.



As noted previously, most short peptides are structureless
in water. The formation of α helices by those
peptides synthesized to promote this secondary structure
is routinely monitored by circular dichroism. It is
also possible, by difference circular dichroic spectroscopy,
to follow the assumption of a fixed structure by
an otherwise structureless peptide when it binds to a
protein.

Ultraviolet absorption spectra of proteins at wavelengths
greater than 240 nm are dominated by the
absorption of phenylalanine (λmax = 258 nm; ε₂₅₈ = 197),
cystine (ε = 280), tyrosine (λmax = 275 nm; ε₂₇₅ = 1420),
and tryptophan (λmax = 280 nm; ε₂₈₀ = 5600).
Tryptophan has the largest extinction coefficient and
longest wavelength of maximum absorbance. Because of
the strong absorption of tryptophan, the spectra of most
proteins between 260 and 310 nm have the same shape
as the spectrum of tryptophan alone with its maximum
at 280 nm and its pronounced shoulder at 289 nm.
Proteins with little or no tryptophan, however, have
maxima of absorption shifted toward or coincident with
the 275 nm maximum characteristic of tyrosine. The
absorption of a protein at its particular maximum,
somewhere between 275 and 280 nm, when properly
corrected for the absorption due to the scattering of light by
the solution, can be used as a rapid measurement of its
concentration. Proteins that are posttranslationally
modified with chromophores such as flavin, pyridoxal
phosphate, or heme, or that noncovalently bind
chromophores such as flavin, heme, metallic cations,
coenzyme B₁₂, chlorophyll, pheophytin, or carotenoid, display
absorption spectra that are characteristic of those
chromophores (Figure 12-9). If one or more of the accessible
tyrosines on the protein have been nitrated, their absorption
spectra are shifted into the visible range (λmax =
430 nm) and their acid dissociation constants are
increased (pKa = 6.5), so that at neutral pH they are
present mainly as the nitrophenolate, which absorbs
strongly.
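
The use of absorbance as a rapid measurement of concentration can be illustrated with the Beer-Lambert relationship, A = εlc. The sketch below is a rough estimate under loudly stated assumptions: the extinction coefficients quoted above are taken to be in liter mole⁻¹ centimeter⁻¹ (the text does not state the units), the extinction coefficient of the protein is approximated as the simple sum of the contributions of its tryptophans and tyrosines even though the two values are quoted at slightly different wavelengths, and the contributions of phenylalanine and cystine are ignored. The function names and the example protein are hypothetical.

```python
# Extinction coefficients quoted in the text; units of liter mole^-1 centimeter^-1
# are an assumption, because the text gives the numbers without units.
EPSILON_TRP = 5600.0   # tryptophan, at 280 nm
EPSILON_TYR = 1420.0   # tyrosine, at 275 nm


def approximate_extinction(n_trp, n_tyr):
    """Crude additive estimate of the extinction coefficient of a protein."""
    return n_trp * EPSILON_TRP + n_tyr * EPSILON_TYR


def concentration_molar(absorbance, n_trp, n_tyr,
                        path_length_cm=1.0, scattering=0.0):
    """Beer-Lambert estimate of concentration in moles liter^-1.

    scattering: apparent absorbance due to scattering of light, to be
    subtracted before the calculation (zero if already corrected).
    """
    epsilon = approximate_extinction(n_trp, n_tyr)
    if epsilon <= 0.0:
        raise ValueError("the protein must contain tryptophan or tyrosine")
    return (absorbance - scattering) / (epsilon * path_length_cm)


if __name__ == "__main__":
    # Hypothetical protein with 2 tryptophans and 4 tyrosines, A = 0.50 in a 1 cm cell.
    print(concentration_molar(0.50, n_trp=2, n_tyr=4))
```

For an actual measurement, an experimentally determined extinction coefficient for the particular protein would replace the additive estimate.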

Either tryptophan or nitrotyrosine can be used as a
spectral reporter group, the spectrum of which registers
its environment or can monitor a conformational change
of the protein. For example, the absorption spectrum
of nitrated Tyrosine 115 in micrococcal nuclease indicates
that it is in a nonpolar environment in the absence
of substrate but a polar environment in the presence of
substrates. This change in environment is also reflected
in its accessibility to nitration by tetranitromethane. The
conformational change of aspartate carbamoyltransferase
that occurs on the binding of substrates can be
detected by an upfield shift in the wavelength of the
absorption of tryptophans in the protein or of nitrated
tyrosine side chains in its regulatory β subunits.

In addition to absorbing ultraviolet light, trypto- 
phan is also both fluorescent and phosphorescent. The 
wavelength of the maximum emission of fluorescence 
from tryptophan varies from 300 to 350 nm. The emission
of fluorescence from indole itself varies between
these limits systematically as a function of the polarity of
the solvent; the more polar the solvent, the longer the
wavelength of the emission, and the tryptophans in a
protein that are the more buried display shorter wavelengths
of maximum emission. Consequently, the
wavelength of its emission is used as a measure of the
degree to which a tryptophan is buried within a protein.
In the case of the absorption spectrum of tryptophan as
opposed to its emission spectrum, the situation is
reversed. The most buried tryptophans, in the most nonpolar
environments, have been found to absorb light of
the longest wavelength, on the red edge of the absorption
band for tryptophan in the ultraviolet.

If a protein contains no posttranslationally or 
experimentally added chromophore, the emission of flu- 
orescence from the protein will be dominated by that of 
its tryptophans. By the systematic removal of its trypto- 
phans through site-directed mutation, the contribution 
of each of them to the total emission of fluorescence 
from the protein can be ascertained. A tryptophan
can also be inserted into a particular location in a protein 
by site-directed mutation to monitor local changes in 
conformation. 

The fluorescence from each of the tryptophans in a
protein displays a characteristic wavelength of maximum
emission and a characteristic intensity. When a
protein is unfolded, its emission of fluorescence usually
shifts to longer wavelengths as its buried tryptophans
become exposed, but the intensity of the fluorescence
can either increase or decrease. These
changes can be followed for individual tryptophans in
appropriate mutants. Because all of the tryptophans in
a protein are fully exposed to solvent upon unfolding,
this observation indicates that the enclosing of a tryptophan
by the native structure can either quench or enhance its
fluorescence. Consequently, it is the particularity of the
local environment around each tryptophan in the native
protein that governs the intensity of its emission. For
example, Tryptophan 94 in folded, native ribonuclease
from Bacillus amyloliquefaciens has very little emission
of fluorescence because one of its immediate neighbors
is Histidine 18, which strongly quenches it. The lowest
intensity of emission (by a factor of greater than 3-fold)
from the three tryptophans in lysozyme from T4
bacteriophage is that of Tryptophan 158, which is surrounded
by a cystine and two methionines, the sulfurs of which
are also efficient quenchers. The amides of glutamine
and asparagine are also efficient quenchers. This
sensitivity to the particularity of the surroundings explains
why tryptophans with the shortest wavelengths of
maximum emission are not always the ones with the lowest
intensity of emission.

If intersystem crossing is not significant, the excited
state of a fluorescent functional group such as tryptophan
can decay to the ground state by at least four separate
pathways:

$$\text{excited state} \xrightarrow{k_I} \text{ground state} \tag{12-35}$$

$$\text{excited state} \xrightarrow{k_F} \text{ground state} + h\nu \tag{12-36}$$

$$\text{excited state} + \text{quencher} \xrightarrow{k_Q} \text{ground state} + \text{excited quencher} \tag{12-37}$$

$$\text{excited state} + \text{acceptor} \xrightarrow{k_T} \text{ground state} + \text{excited acceptor} \tag{12-38}$$

where the k_i are the rate constants for the respective
processes. Equation 12-35 describes a radiationless 
decay of the energy of excitation through migration 
among rotational and vibrational energy levels or other 
piecemeal transfers to its surroundings as heat. Equation 
12-36 describes the release of a portion of the energy of 
excitation as a photon of fluorescent light. Equation 
12-37 describes the transfer of the energy of excitation to 
another molecule, the quencher. Although there are 
some molecules or ions, in particular those containing 
an unpaired electron, that can quench at distances 
beyond their van der Waals radii, most quenchers must 
collide with the fluorescent functional group when it is in 
the excited state to quench it. Equation 12-38 describes 
the radiationless transfer of the energy of excitation 
through space by resonance between the excited state 
and a nearby functional group capable of absorbing the 
energy. The excited electron is the donor, and the func- 
tional group to which the excitation is transferred is the 
acceptor. 

The quantum yield, Q_0, of a fluorescent chromophore
is the number of photons appearing as fluorescence
for every photon absorbed. When neither
quencher nor acceptor is present,

$$Q_0 = \frac{k_F}{k_I + k_F} = k_F \tau_0 \tag{12-39}$$

The time over which 50% of the excited state disappears,
or the half-life of the excited state, would be
(ln 2)(k_I + k_F)⁻¹, but the lifetime of the fluorescence, τ_0, is
defined as (k_I + k_F)⁻¹, the time in which the intensity
decreases to exp(-1) of the initial intensity.

If a collisional quencher is added to the solution, it
affects the quantum yield and lifetime of the excited state
because every time a molecule of quencher collides with
a molecule of excited state, there is a specific probability
that the excitation energy will be transferred from the
excited state to the quencher. Each quencher has a
unique efficiency for quenching a particular fluorescent
chromophore. The result of this transfer of energy upon
collision is that the quantum yield of the fluorescent
chromophore in the presence of the quencher, Q_Q, is
decreased:

$$Q_Q = \frac{k_F}{k_I + k_F + k_Q[\text{quencher}]} \tag{12-40}$$

and

$$\frac{Q_Q}{Q_0} = \frac{1}{1 + \tau_0 k_Q[\text{quencher}]} \tag{12-41}$$

The observed lifetime of the quenched fluorescence, τ_Q, is

$$\tau_Q = \frac{\tau_0}{1 + \tau_0 k_Q[\text{quencher}]} \tag{12-42}$$

Phosphorescence, which is simply fluorescence from a
triplet state, is also subject to quenching.
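
The relationships among Equations 12-39 through 12-42 can be verified with a few lines of arithmetic. The sketch below is illustrative only: the symbols k_i, k_f, and k_q stand for the rate constants of radiationless decay, fluorescence, and quenching, the function names are hypothetical, and the numerical values in the example are invented (rate constants in nanosecond⁻¹ and liter mole⁻¹ nanosecond⁻¹, concentration in moles liter⁻¹).

```python
import math


def unquenched(k_i, k_f):
    """Quantum yield, lifetime, and half-life with no quencher (Equation 12-39)."""
    total = k_i + k_f
    return {
        "quantum_yield": k_f / total,
        "lifetime": 1.0 / total,
        "half_life": math.log(2.0) / total,
    }


def quenched(k_i, k_f, k_q, quencher_molar):
    """Quantum yield and lifetime with a collisional quencher present.

    The pseudo-first-order term k_q * [quencher] adds to the other rate
    constants for loss of the excited state (Equations 12-40 and 12-42).
    """
    total = k_i + k_f + k_q * quencher_molar
    return {"quantum_yield": k_f / total, "lifetime": 1.0 / total}


if __name__ == "__main__":
    # Invented values chosen only to exercise the equations.
    free = unquenched(k_i=0.30, k_f=0.05)
    bound = quenched(k_i=0.30, k_f=0.05, k_q=5.0, quencher_molar=0.10)
    ratio = free["quantum_yield"] / bound["quantum_yield"]
    print("Q0/QQ            =", ratio)
    print("1 + tau0 kQ [Q]  =", 1.0 + free["lifetime"] * 5.0 * 0.10)
    print("tau0/tauQ        =", free["lifetime"] / bound["lifetime"])
```

The three printed quantities coincide, which is the content of Equations 12-41 and 12-42: the quantum yield and the lifetime are reduced by the same factor.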

The ratio F_Q/F_0 is the ratio between the fluorescence
observed in the presence of the quencher and the
fluorescence observed in its absence. This ratio is necessarily
equal to Q_Q/Q_0. The ratio F_Q/F_0 can be readily
measured with a fluorometer. When its reciprocal, F_0/F_Q,
is plotted as a function of [quencher], a linear relationship
is obtained, the slope of which is equal to k_Q τ_0
(Figure 12-12). The bimolecular rate constant k_Q for
the collision of the quencher with the fluorescent functional
group on a protein is the slope of this line divided
by the lifetime τ_0 of the fluorescence of the unquenched
excited state.
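
The extraction of the bimolecular rate constant from such a plot, commonly called a Stern-Volmer plot, amounts to fitting a straight line. The following is a minimal sketch with synthetic, noise-free data. The function names, the lifetime of 4 ns, and the concentrations are hypothetical, and the rate constant of 0.3 liter mole⁻¹ nanosecond⁻¹ used to generate the data was chosen only because it lies in the intermediate range discussed in the text.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept of y against x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def quenching_rate_constant(quencher_molar, f0_over_fq, lifetime_ns):
    """Return (k_q, intercept): the slope of F0/FQ against [quencher]
    divided by the unquenched lifetime, and the fitted intercept
    (expected to be close to 1)."""
    slope, intercept = fit_line(quencher_molar, f0_over_fq)
    return slope / lifetime_ns, intercept


if __name__ == "__main__":
    lifetime_ns = 4.0      # hypothetical unquenched lifetime
    k_q_true = 0.3         # liter mole^-1 ns^-1, used to make synthetic data
    concentrations = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
    ratios = [1.0 + k_q_true * lifetime_ns * c for c in concentrations]
    print(quenching_rate_constant(concentrations, ratios, lifetime_ns))
```

With experimental data, an intercept that departs from 1 or a plot that curves would indicate that the simple collisional model does not describe the system.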

The fluorescence and phosphorescence of trypto- 
phan can be quenched by large inorganic anions, such as 
I and NOT: by molecular oxygen; by unsaturated 
amides, such as acrylamide; and by ketones, such as 
2-oxobutane. The bimolecular rate constant
k_Q for the quenching reflects the accessibility of the
quencher to the tryptophans. If the quencher is a polar
molecule confined to the aqueous phase, then the
greater the rate constant for quenching, the more
exposed is the tryptophan to that phase. On the basis of
the observed rate constants k_Q for polar quenchers, the
tryptophans in most proteins can be divided into three
classes.

The tryptophans in the first class are fully accessible
to the aqueous phase, and their fluorescence is readily
quenched. The rate constants for the collisions
between their singlet excited states and various polar
quenchers are 2-10 M⁻¹ ns⁻¹, as expected from a
diffusion-controlled process.

The fluorescence of the tryptophans in the third
class cannot be quenched ($k_Q$ < 0.01 M⁻¹ ns⁻¹) because
no collisions with the quenchers can occur within the
lifetime of their excited states. Presumably, this is
because they are buried. This conclusion follows
from the fact that such unquenchable tryptophans are
observed in proteins that contain buried tryptophans in
their crystallographic molecular models. For example,
Tryptophan 138 in lysozyme from bacteriophage T4 is
poorly quenched ($k_Q$ = 0.009 M⁻¹ ns⁻¹), and it is well
buried in the molecular model of the protein. Also,
such unquenchable tryptophans are optimally excited by
light of a longer wavelength. The phosphorescence of
these buried tryptophans, however, has a sufficiently
long lifetime ($\tau_0$ = 1 s) that it can be quenched. The
bimolecular rate constants $k_Q$ for the quenching of the
phosphorescence of the tryptophans in the buried class
are relatively small (<0.001 M⁻¹ ns⁻¹), and the quenching
registered by these rate constants seems to result from
extensive and momentary unfoldings of the folded
polypeptide that occasionally provide access to the
interior, but only for a short time. These observations
provide support for the concept that most parts of a protein
are conformationally active and continuously expand
and contract.

The intermediate, second class of tryptophans, and
probably the most numerous, are those that are partially
buried and have intermediate rate constants for
quenching (0.01-2 M⁻¹ ns⁻¹). Examples are Tryptophan
59 in ribonuclease T1 from Aspergillus oryzae
($k_Q$ = 0.3 M⁻¹ ns⁻¹), Tryptophan 126 from lysozyme of
T4 bacteriophage ($k_Q$ = 0.3 M⁻¹ ns⁻¹), and Tryptophan
333 of phosphoglycerate kinase from Saccharomyces
cerevisiae ($k_Q$ = 0.8 M⁻¹ ns⁻¹).
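
The three classes can be expressed as a simple rule on the rate constant for a polar quencher. The sketch below uses the ranges quoted above; the function name and the assignment of a value of exactly 2 M⁻¹ ns⁻¹ to the exposed class are assumptions made only for illustration.

```python
def classify_tryptophan(k_q):
    """Assign an accessibility class from k_Q in M^-1 ns^-1."""
    if k_q >= 2.0:
        return "exposed"        # first class, 2-10 M^-1 ns^-1
    if k_q >= 0.01:
        return "intermediate"   # second class, 0.01-2 M^-1 ns^-1
    return "buried"             # third class, < 0.01 M^-1 ns^-1


# Rate constants quoted in the text for individual tryptophans.
examples = {
    "Trp 138, T4 lysozyme": 0.009,
    "Trp 59, ribonuclease T1": 0.3,
    "Trp 126, T4 lysozyme": 0.3,
    "Trp 333, phosphoglycerate kinase": 0.8,
}
for name, k_q in examples.items():
    print(f"{name}: k_Q = {k_q} M^-1 ns^-1 -> {classify_tryptophan(k_q)}")
```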

Oxygen provides an interesting exception to the 
behavior of most quenchers. The difference in its ability 
to quench accessible and buried tryptophans is much 
less than that observed with larger, more polar quenchers
(Figure 12-12). On the basis of this observation, it
has been proposed that oxygen is small enough to insin- 
uate its way through a molecule of protein in liquidlike 
diffusion among the tightly packed amino acids. 

Changes in the accessibility of tryptophans to polar 
quenchers dissolved in the aqueous phase have been 
used to monitor conformational changes in the structure 
of a protein. In the case of succinate-CoA ligase
(ADP-forming) from E. coli, the binding of ATP to the α subunit
of the enzyme causes significant decreases in the
accessibility of the tryptophans in the β subunit to acrylamide
dissolved in the solution. This suggests that a
conformational change propagated throughout the whole
protein occurs upon the binding of ATP. The implication
that both the α and β subunits change their structure in
concert when ATP binds is consistent with the observa- 
tion that they are intimately associated in the oligomeric 
structure of the protein (Figure 8-22). To follow the
conformational change that occurs when the
carboxy-terminal portion of colicin E1 from E. coli inserts into a
membrane, tryptophans were inserted at various
positions in its amino acid sequence by site-directed
mutation, and changes in the rate constants of quenching for
these tryptophans were measured before and after
insertion.

Figure 12-12: Collisional quenching of the fluorescence of
tryptophans in several proteins by oxygen.¹²⁹ Solutions of the
various proteins were placed in cuvettes in a fluorometer and
excited with light of wavelength 280 nm. Fluorescence at 90° to
the exciting beam was monitored at the wavelength of maximum
emission for each protein (325-350 nm). The high concentrations
of oxygen (molar) were produced by enclosing the cuvette in a
chamber that could be pressurized to 105 kg cm⁻² O₂ gas and
allowing the gas to equilibrate with the solution at various
pressures. The proteins used were bovine α-chymotrypsin, rabbit
fructose-bisphosphate aldolase, bovine immunoglobulin G, and
bovine serum albumin. A solution of tryptophan was used as an
example of a fully exposed side chain. Lines were drawn on the
basis of the expectation that $F_0/F_Q$ as a function of
[quencher] would be linear (Equation 12-41). Reprinted with
permission from ref 129. Copyright 1973 American Chemical Society.

It is also possible for the energy of the relaxed 
excited state in excess over the energy of the ground 
state, which otherwise would be emitted as fluorescence, 
to be transferred intact through space by resonance to 
another chromophore in a radiationless process 
(Equation 12-38). This fluorescence resonance energy 
transfer (FRET) discharges the electronically excited 
state of the functional group that originally absorbed the
photon, the donor, and produces an electronically
excited state in the functional group that receives the 
energy, the acceptor. Because this transfer of energy 
between donor and acceptor occurs by resonance, there 
must be a matching of energy and a matching of orienta- 
tion between donor and acceptor. The energies are 
matched in the region of overlap between the absorption 
spectrum of the acceptor and the emission spectrum of 
the donor; the greater the overlap, the greater the proba- 
bility that the energy will be transferred. The orientations 
are matched in the coincidence between the orientation 
of the transition dipole of the donor and the transition 
dipole of the acceptor; the greater the coincidence, the 
greater the probability that the energy will be transferred. 
If the new excited state of the acceptor created by 
this transfer normally returns to its respective ground 
state by radiationless processes—in other words, if its 
quantum yield is zero—no fluorescent photon is emitted, 
and the only observations made are that the fluorescence 
of the donor is quenched and its lifetime is decreased. If 
the acceptor is also a fluorescent functional group, its 
excited state will release a fluorescent photon, consistent 
with its quantum yield. The photon released from the 
acceptor, however, will be of even longer wavelength 
than the one that would have been released from the 
donor (Figure 12-13)'* because the excited state of the 
acceptor immediately following the transfer relaxes to a 
stable excited state, the energy of which, relative to the 
ground state of the acceptor, is less than the energy that 
was passed from donor to acceptor during the transfer. 
At those wavelengths in the emission spectrum of the 
donor that do not overlap the emission spectrum of the 
acceptor and are therefore not contaminated by the flu- 
orescent emission of the acceptor, the fluorescence 
measured in the presence of the acceptor will be less 
than the fluorescence measured in its absence. The 
observed, uncontaminated fluorescence of the donor 
will be quenched in the presence of the acceptor by the 
ratio $Q_A/Q_0$.

Suppose that a molecule of protein has a fluores- 
cent chromophore that can act as a donor covalently 
attached or noncovalently bound to a particular location 
in its tertiary structure and a different chromophore that 
can act as an acceptor covalently attached or noncova- 
lently bound to a different location. The various fluores- 
cent compounds used to modify rhodopsin (Figure 
12-14)' are typical of the donors and acceptors that are 
covalently or noncovalently attached to specific loca- 
tions in a protein. In such a situation, the rate of the 
transfer of energy from the excited donor to the acceptor 
by resonance (Equation 12-38) is equal to $k_T f_A$[excited
donor], where $f_A$ is the fraction of the sites on the protein
for the acceptor that are occupied and [excited donor] is 
the molar concentration of the excited donor. As the 
acceptors are fixed to the molecules of protein that con- 
tain an excited donor at a constant fractional occupancy, 
the pseudo-first-order rate constant for the decay in the
concentration of excited donor through the transfer of
energy to the acceptor is $k_T f_A$.

Figure 12-13: Absorption spectra and emission spectra (inset) of
1-acetyl-4-(1-naphthyl)semicarbazide (solid lines), a typical
fluorescent donor for observing transfer of energy by resonance,
and a matched acceptor, dansyl-L-prolylhydrazide (dashed lines),
both dissolved in ethanol.¹⁴⁴ The measurements of absorbance are
made by monitoring the intensity of the light of continuously
varied wavelength that passes through each solution. The
measurements of emission are made by following the intensity of
the light emitted at 90° to an incident beam as a function of
wavelength while the chromophore is excited with light of
wavelength equal to that of its maximum of absorption. The
spectrum of the amount of light absorbed is expressed as an
extinction coefficient (centimeter² micromole⁻¹) as a function of
wavelength (nanometers). The amount of light emitted is expressed
as fluorescence (in relative units) as a function of wavelength
(nanometers). When donor and acceptor are located near each
other, a portion of the excited states of the donor would have
their energy of emission at 350 nm transferred radiationlessly by
resonance to the overlapping absorption band of the acceptor, and
this transfer would quench the fluorescence of the donor. The
transferred energy would be emitted as fluorescence at 540 nm
from the excited acceptors. Adapted with permission from ref 144.
Copyright 1967 National Academy of Sciences.

The efficiency of transfer $E_T$ is defined as

$$E_T = \frac{k_T f_A}{k_F + k_D + k_T f_A} \tag{12-43}$$


This is the fraction of the decay of the excited state due to 
transfer of energy by resonance, or the ratio between the 
quanta transferred and the quanta absorbed by the 
donor. The quantum yield $Q_A$ of the fluorescence of
the donor in the presence of the acceptor is governed by
a relationship analogous to Equation 12-40

where $F_A/F_0$ is the ratio between the fluorescence
observed in the presence and that observed in the
absence of the acceptor.

Figure 12-14: Fluorescent, electrophilic reagents used to modify,
covalently or noncovalently, three sites on rhodopsin.
N-[(Iodoacetamido)ethyl]-1-aminonaphthalene-5-sulfonate anion 12-1
(λabs = 350 nm; λemit = 495 nm),
N-[(iodoacetamido)ethyl]-1-aminonaphthalene-8-sulfonate anion 12-2
(λabs = 350 nm; λemit = 495 nm), and 5-(iodoacetamido)salicylate
anion 12-3 (λabs = 323 nm; λemit = 405 nm) were used to modify a
particular cysteine in the protein by alkylation.
N,N′-Bis[1-(dimethylamino)naphthalene-5-sulfonato]-L-cystine 12-4
(λabs = 350 nm; λemit = 520 nm) and
N,N′-bis[fluoresceinyl(isothiocarbamido)]cystamine 12-5
(λabs = 495 nm; λemit = 518 nm) were used to modify a different
cysteine in the protein by disulfide exchange. 9-Hydrazinoacridine
12-6 (λabs = 440 nm; λemit = 470 nm) and proflavin 12-7
(λabs = 470 nm; λemit = 512 nm) were used as ligands for a
particular site on the protein with a high affinity for aromatic
cations. All wavelengths (λ) are wavelengths of maximum absorption
or maximum emission. In each instance the fluorescent functional
group selectively attached to the protein was used as a donor of
resonant energy to 11-cis-retinal, a natural, covalent
posttranslational modification (Table 3-1) of the protein that
absorbs maximally at 500 nm.

The lifetime of the excited state $\tau_A$ in the presence of
acceptor is governed by a relationship analogous to
Equation 12-42

The efficiency of transfer can be assessed by measuring
either the decrease in steady-state fluorescence ($F_A/F_0$;
Equation 12-45) or the decrease in the lifetime of the
excited state produced by the acceptor ($\tau_A/\tau_0$; Equation
12-47), but the latter measurement is more accurate than 
the former. The efficiency is calculated from the emis- 
sion or the lifetime of the donor in the presence of the 
acceptor and the emission or the lifetime of the donor in 
the protein that has not been modified with the acceptor 
or in which the acceptor has been bleached.’ 
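
The relationships described as analogous to Equations 12-40 and 12-42 can be sketched from the rate constants already defined. What follows is a reconstruction, assuming that the excited donor decays only by fluorescence ($k_F$), radiationless decay ($k_D$), and transfer to the acceptor ($k_T f_A$); the forms and ordering are not necessarily those of the displayed Equations 12-44 through 12-47.

```latex
\begin{align*}
Q_0 &= \frac{k_F}{k_F + k_D}, &
\tau_0 &= \frac{1}{k_F + k_D}, \\
Q_A &= \frac{k_F}{k_F + k_D + k_T f_A}, &
\tau_A &= \frac{1}{k_F + k_D + k_T f_A}
        = \frac{\tau_0}{1 + \tau_0 k_T f_A}, \\
\frac{F_A}{F_0} &= \frac{Q_A}{Q_0}
        = \frac{k_F + k_D}{k_F + k_D + k_T f_A} = 1 - E_T, &
\frac{\tau_A}{\tau_0} &= 1 - E_T .
\end{align*}
```

On these assumptions either ratio gives the efficiency of Equation 12-43 directly, as $E_T = 1 - F_A/F_0 = 1 - \tau_A/\tau_0$.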

The efficiency of the transfer of energy, $E_T$, is
determined by the distance $r$ between the center of the
transition dipole of the donor and the center of the transition
dipole of the acceptor by the relationship

$$E_T = \frac{f_A r^{-6}}{f_A r^{-6} + R_0^{-6}} \tag{12-48}$$


where $R_0$ is the distance at which the efficiency of
transfer would be 50% if the site for the acceptor
were fully occupied. Consequently, the distance
between the centers of the dipoles of the donor and the
acceptor is

$$r = R_0\left[\frac{f_A(1 - E_T)}{E_T}\right]^{1/6} \tag{12-49}$$

The distance $R_0$ is given by

$$R_0^6 = \frac{9(\ln 10)\,\kappa^2 Q_0 J}{128\pi^5 n^4 N_A} \tag{12-50}$$

where $Q_0$ is the quantum yield of the donor (dimensionless),
$n$ is the refractive index of the medium between the
donor and the acceptor (dimensionless), and $N_A$ is
Avogadro’s number (mole⁻¹). The overlap integral $J$
(centimeters⁶ mole⁻¹) is that between the fluorescence
emission spectrum $F(\lambda)$ of the donor (in relative
units) and the spectrum of the extinction coefficient $\varepsilon(\lambda)$
of the acceptor (in liters mole⁻¹ centimeter⁻¹), normalized
by the total fluorescence of the donor

$$J = \frac{\int F(\lambda)\,\varepsilon(\lambda)\,\lambda^4\,d\lambda}{\int F(\lambda)\,d\lambda} \tag{12-51}$$

where $\lambda$ is the wavelength.* This integral quantifies the
match between the energies of the donor and acceptor 
required for the resonance. The integral J is calculated 
numerically from the absorption spectrum of the accep- 
tor and the emission spectrum of the donor (Figure 
12-13), which should be gathered from donor and accep- 
tor when they are in solution attached individually to the 
protein. 
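
A numerical evaluation of the overlap integral of this kind can be sketched with the trapezoidal rule. Everything specific here is an assumption for illustration: the function names, the Gaussian spectra, the quantum yield of 0.5, the refractive index of 1.4, and the use of the standard Förster expression for $R_0$ in the form without the factor of 1000. Because bare numbers carry no units, the code converts liters to cubic centimeters explicitly so that $J$ is returned in centimeters⁶ mole⁻¹.

```python
import math

AVOGADRO = 6.02214076e23  # mol^-1


def overlap_integral(wavelength_cm, emission, epsilon):
    """Trapezoidal J = int(F eps lambda^4 dlambda) / int(F dlambda).

    wavelength_cm : wavelengths in centimeters, ascending
    emission      : donor fluorescence F(lambda), relative units
    epsilon       : acceptor extinction coefficient, L mol^-1 cm^-1
    Returns J in cm^6 mol^-1 (1 L = 1000 cm^3).
    """
    numerator = 0.0
    denominator = 0.0
    for i in range(len(wavelength_cm) - 1):
        width = wavelength_cm[i + 1] - wavelength_cm[i]
        g0 = emission[i] * epsilon[i] * wavelength_cm[i] ** 4
        g1 = emission[i + 1] * epsilon[i + 1] * wavelength_cm[i + 1] ** 4
        numerator += 0.5 * (g0 + g1) * width
        denominator += 0.5 * (emission[i] + emission[i + 1]) * width
    liters_to_cm3 = 1000.0
    return liters_to_cm3 * numerator / denominator


def forster_distance_cm(j_cm6_per_mol, q0, kappa2=2.0 / 3.0, n=1.4):
    """R_0 in centimeters from the standard Forster expression."""
    r0_sixth = (9.0 * math.log(10.0) * kappa2 * q0 * j_cm6_per_mol
                / (128.0 * math.pi ** 5 * n ** 4 * AVOGADRO))
    return r0_sixth ** (1.0 / 6.0)


# Hypothetical Gaussian spectra on a 1 nm grid from 300 to 450 nm.
wavelengths_nm = list(range(300, 451))
wavelengths_cm = [w * 1e-7 for w in wavelengths_nm]
emission = [math.exp(-((w - 350.0) / 20.0) ** 2) for w in wavelengths_nm]
epsilon = [4000.0 * math.exp(-((w - 340.0) / 25.0) ** 2)
           for w in wavelengths_nm]

j = overlap_integral(wavelengths_cm, emission, epsilon)
r0_nm = forster_distance_cm(j, q0=0.5) * 1e7
print(f"J = {j:.3e} cm^6 mol^-1, R_0 = {r0_nm:.2f} nm")
```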

The orientation factor $\kappa^2$ (dimensionless) is
defined by

$$\kappa^2 = (\cos\theta_T - 3\cos\theta_D\cos\theta_A)^2 \tag{12-52}$$

where $\theta_T$ is the angle between the transition dipoles of
donor and acceptor and $\theta_D$ and $\theta_A$ are the angles between
the transition dipoles of the donor and acceptor,
respectively, and the vector between the centers of those
dipoles. The transfer of energy is between these
transition dipoles of the donor and acceptor, and this factor
quantifies the match between the orientations of the
donor and acceptor required for the resonance. If the
orientations of the transition dipoles are not fixed
because the chromophores are free to adopt a number of
different orientations, $\kappa^2$ is the average of Equation
12-52 over those orientations.

* In the equation presented by Latt et al., there is a misleading
factor of 1000 that serves to correct liters mole⁻¹ to centimeters³
mole⁻¹, a correction that would automatically be made during the
cancellation of units. This is an excellent example of the absolute
necessity of including units and making sure that they cancel
properly whenever any calculation is performed in the physical
sciences.
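
Equation 12-52 and its average over orientations can be checked numerically. The sketch below evaluates the factor for a few fixed geometries and then estimates the average for donor and acceptor dipoles that are each free to point in any direction. The function names, the number of samples, and the seed are assumptions for illustration, and the separation vector is placed along the z axis purely for convenience.

```python
import math
import random


def orientation_factor(theta_t, theta_d, theta_a):
    """kappa^2 from the three angles of Equation 12-52, in radians."""
    return (math.cos(theta_t)
            - 3.0 * math.cos(theta_d) * math.cos(theta_a)) ** 2


def random_unit_vector(rng):
    """Uniformly distributed direction built from three normal deviates."""
    while True:
        x, y, z = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        norm = math.sqrt(x * x + y * y + z * z)
        if norm > 1e-12:
            return x / norm, y / norm, z / norm


def isotropic_average(samples=100_000, seed=1):
    """Monte Carlo mean of kappa^2 when both dipoles reorient freely."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        d = random_unit_vector(rng)
        a = random_unit_vector(rng)
        cos_t = d[0] * a[0] + d[1] * a[1] + d[2] * a[2]
        # Separation vector along z: cos(theta_D) = d_z, cos(theta_A) = a_z.
        total += (cos_t - 3.0 * d[2] * a[2]) ** 2
    return total / samples


half_pi = math.pi / 2.0
print("collinear dipoles:     ", orientation_factor(0.0, 0.0, 0.0))
print("parallel, side by side:", orientation_factor(0.0, half_pi, half_pi))
print("mutually perpendicular:", orientation_factor(half_pi, half_pi, half_pi))
print("isotropic average:     ", round(isotropic_average(), 3))
```

The fixed geometries span the full range of the factor, and the sampled average should lie close to 2/3.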

The first requirement for every application of the 
transfer of energy by resonance is to have a protein in 
which a donor of energy and an acceptor of that energy 
are both located at defined positions within its structure. 
It turns out that measuring the transfer of energy is easy 
but placing the donors and acceptors at unique and 
exclusive locations on the protein is difficult. Often, 
either an intrinsic donor or an intrinsic acceptor or 
both, placed by evolution either covalently or noncova- 
lently at a unique location on the protein, is relied upon 
to circumvent at least half of the difficulty. 

Examples of such evolutionarily positioned donors 
and acceptors are fluorescent substrates or fluorescent 
analogues of substrates that bind to the active site of an 
enzyme, fluorescent ligands that bind to a specific site on 
a protein, and posttranslationally incorporated functional
groups such as coenzymes that happen to be fluorescent
or transition metal ions the complexes of which
absorb at convenient wavelengths. Tryptophans are
often used as donors of resonant energy. Those found
naturally in the protein can be used one by one as unique
donors by preparing the respective site-directed
mutants, each of which retains only one of them.
Nitrotyrosine or kynurenine, a covalent modification
of tryptophan, can be used as acceptors of resonant
energy from a tryptophan. 

Often cysteines in the protein are used as conven- 
ient nucleophiles to be covalently modified by fluores- 
cent electrophilic reagents (Figure 12-14). Sometimes, 
one of the cysteines in a protein, because of its peculiar 
reactivity, can be selectively modified with one fluores- 
cent reagent and another cysteine can then be modified 
with another.'”*'* Cysteines can be placed at specific 
positions in a protein by site-directed mutation and then 
selectively modified with appropriate fluorescent 
reagents.” A fluorescent reagent can be attached to a 
specific glutamine on the surface of a protein by use of 
the enzyme protein-glutamine γ-glutamyltransferase,
which exchanges the ammonia of the glutamine with a
primary amine on the reagent. It is also possible to
use synthetically produced fluorescent α-amino acids
and in vitro systems for incorporating unnatural amino 
acids at specific positions in its amino acid sequence to 
produce a protein with a fluorescent functional group 
located at a single, designated point in its native struc- 
ture, 157-188 

The bilayer of phospholipid in which membrane- 
bound proteins are embedded also provides a location in 
which to locate a fluorescent donor or acceptor. The 
bilayer can be turned into a sheet of fluorescent donors 
or fluorescent acceptors by dissolving hydrophobic
fluorophores in the liquid hydrocarbon at its center or
by covalently attaching fluorophores to the
phospholipids from which it is formed. Because the bilayer
forms a sheet of hydrocarbon and because the molecules
of a particular protein all float at the same depth within 
this sheet of hydrocarbon, the molecules of donor or 
acceptor dissolved within it end up in a fixed location rel- 
ative to the rest of the protein. A matched acceptor or 
donor, respectively, can then be attached covalently to a 
specific location on the protein, and transfer of energy 
between the molecules of donor or acceptor within the 
bilayer and the acceptor or donor on the protein can be 
monitored. 

The orientation factor $\kappa^2$ is the most uncertain
parameter in Equation 12-50. In any given situation, $\kappa^2$
has a specific numerical value but its value cannot be
measured directly. If both donor and acceptor were free
enough to assume all possible relative orientations with
equal probability, $\kappa^2$ would be 2/3. If one of the two were
fixed and the other could assume all possible relative
orientations, $\kappa^2$ would have a value between 1/3 and 4/3. If
both donor and acceptor, however, are fixed in their
relative orientations, for example, both rigidly bound to a
molecule of protein, $\kappa^2$ can have a value anywhere between 0
and 4.0. Because $R_0$, and hence $r$, depends for its value on
$(\kappa^2)^{1/6}$, the uncertainty of $\kappa^2$ affects the value of $r$ by more
than +12% only when it is greater than % but far more 
dramatically when it is less than 16. 
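
Because the calculated distance scales with the sixth root of the orientation factor, the error introduced by assuming a value of 2/3 can be tabulated directly. In this sketch the function name and the particular values of the orientation factor are assumptions chosen for illustration; the ratio printed is the distance calculated with the assumed value divided by the true distance.

```python
def distance_ratio(kappa2_true, kappa2_assumed=2.0 / 3.0):
    """Distance calculated with the assumed kappa^2 over the true distance."""
    return (kappa2_assumed / kappa2_true) ** (1.0 / 6.0)


for kappa2 in (0.01, 1.0 / 6.0, 1.0 / 3.0, 2.0 / 3.0, 4.0 / 3.0, 4.0):
    ratio = distance_ratio(kappa2)
    print(f"true kappa^2 = {kappa2:5.3f}: calculated r / true r = "
          f"{ratio:.3f} ({100.0 * (ratio - 1.0):+.0f}%)")
```

The error stays near 12% or less while the true value lies between 1/3 and 4/3, grows only to about 26% at the upper limit of 4, and increases without bound as the true value approaches zero.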

The more freedom the donor, the acceptor, or both 
of them have to assume different orientations by rotation 
around unhindered bonds between them and the rigid 
portion of the protein, the closer the value of $\kappa^2$ comes to
2/3. An estimate of the orientational freedom of donor or
acceptor can be made from the rate and extent of the 
depolarization of its fluorescent emission. If either the 
donor or acceptor is excited with linearly polarized light, 
the light emitted as fluorescence immediately, that is, 
before the chromophore has had time to reorient, will 
also be polarized. The polarity of the emitted light, how- 
ever, will decay over the lifetime of the excited state as it 
reorients. The rate of this decay and the final residual 
polarity of its fluorescence provide an estimate of the ori- 
entational freedom of the donor or acceptor. 

From these estimates of orientational freedom, a
distribution of the probability for particular values of $\kappa^2$
can be calculated. For example, from the depolarization
of the fluorescence from the
5-(N,N-dimethylamino)naphthalenesulfonyl group attached to rhodopsin
as a donor to the retinal rigidly fixed in the center of the
protein, the value for $\kappa^2$ could be estimated to fall
between 0.08 and 1.8 with a confidence of 90%, and
from the depolarization of the fluorescence from the
pyrene group attached covalently as a donor to the active
site of acetylcholinesterase and the depolarization of the
fluorescence from the propidium bound as an acceptor
at another site on the protein, the value for $\kappa^2$ could be
estimated to fall between 0.25 and 3.3. For donor or
acceptor or both to display depolarization, however, they
must be reorienting fairly freely anyway, and such
estimates of ranges for $\kappa^2$ may not be significantly more
accurate than simply using 2/3 for the value of $\kappa^2$. For
example, predicted distances between four pairs of
donors and acceptors positioned on specific amino acids
in phosphoglycerate kinase from yeast, a protein for
which a crystallographic molecular model is available,
were no more reliable when $\kappa^2$ was estimated from
depolarizations than when 2/3 was used for $\kappa^2$.

If both donor and acceptor are rigidly bound by the
protein at a fixed orientation relative to each other, then
neither will have any orientational freedom and no limits
can be placed on $\kappa^2$ other than from 0 to 4. For example,
in the crystallographic molecular model of
deoxyribodipyrimidine photo-lyase from Anacystis nidulans, the
angle between the transition dipoles of the flavin adenine
dinucleotide and the 8-hydroxy-5-deazaflavin
bound to the protein is 36°. From this angle and the
angles of the dipoles to the vector between the two
chromophores (Equation 12-52), a value of $\kappa^2$ of 1.6 could be
calculated. For the same protein from E. coli, however,
the angle between the transition dipoles of the flavin
adenine dinucleotide and the methylene tetrahydrofolic
acid, which takes the place of the
8-hydroxy-5-deazaflavin in the latter crystallographic molecular
model, is almost 90°, causing $\kappa^2$ to be almost 0. Even
though the distances between the chromophores in
these two proteins are the same (1.7 nm), the efficiency
of the transfer of energy by resonance for the protein
from A. nidulans is 97% while that from E. coli is 62%.
When a $\kappa^2$ of 2/3 was used to calculate the distance between
the flavin adenine dinucleotide and the tetrahydrofolic
acid in the protein from E. coli, in the absence of a
crystallographic molecular model, the value obtained was
2.2 nm instead of the actual distance of 1.7 nm.

The efficiency of the transfer of energy by resonance
between Tyrosine 14 and Tyrosine 55 in steroid
Δ-isomerase from Pseudomonas testosteroni is less than
25% even though these tyrosines are only 0.6 nm apart in
the crystallographic molecular model. Consequently, $\kappa^2$
must be less than 0.003, a fact from which it was
concluded that these two tyrosines were held rigidly by the
protein in a relative orientation incompatible with
efficient transfer even over such a small distance. Had a
value of 2/3 been used for $\kappa^2$ in the calculation, the
distance between these two tyrosines would have been
estimated to be greater than 1.5 nm.
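
The two examples above can be checked against the sixth-root dependence of the calculated distance on the orientation factor. The sketch assumes that the orientation factor is the only source of error in $R_0$; the function names are hypothetical, and the value of the orientation factor inferred for the photo-lyase from E. coli is a back-calculation made here, not a value reported in the text.

```python
ASSUMED_KAPPA2 = 2.0 / 3.0


def apparent_distance(r_true, kappa2_true, kappa2_assumed=ASSUMED_KAPPA2):
    """Distance that would be calculated if kappa2_assumed were used."""
    return r_true * (kappa2_assumed / kappa2_true) ** (1.0 / 6.0)


def implied_kappa2(r_true, r_apparent, kappa2_assumed=ASSUMED_KAPPA2):
    """kappa^2 that reconciles an apparent distance with the true one."""
    return kappa2_assumed * (r_true / r_apparent) ** 6


# Tyrosines 0.6 nm apart with kappa^2 at its upper bound of 0.003.
print(f"steroid isomerase: apparent r = "
      f"{apparent_distance(0.6, 0.003):.2f} nm")

# Photo-lyase from E. coli: 2.2 nm calculated, 1.7 nm in the crystal.
print(f"photo-lyase: implied kappa^2 = {implied_kappa2(1.7, 2.2):.2f}")
```

The first result reproduces the estimate of about 1.5 nm quoted for the two tyrosines; the second indicates how far below 2/3 the orientation factor would have to be to account for the discrepancy in the photo-lyase.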

The original enthusiasm for measurements of the
transfer of energy by resonance arose from the potential
they offered for measuring the distance between two
locations in a protein for which a crystallographic molecular
model is not available. For example, if the donor
were attached by the unique stoichiometric covalent 
modification of a particular amino acid in the sequence 
of the protein and the acceptor were specifically attached 
by the unique modification of another amino acid in the 
sequence, it would be possible to estimate the distance 
between the donor and acceptor in the folded
polypeptide and hence the distance between the two modified
amino acids. In support of this intention, Equations
12-48 and 12-50 have been shown to be consistent with 
the observed transfers of resonant energy between a 
donor and an acceptor at the two ends of short synthetic 
peptides of proline.¹⁴⁴ The distance between donor and
acceptor was varied by varying the number of prolines in
the peptides to demonstrate that the efficiency
depended upon the inverse sixth power of the distance. If $\kappa^2$ was
assumed to be 2/3, the calculated distances between
donor and acceptor agreed fairly well (within 25%) with 
the distances measured from molecular models of these 
modified peptides. Many estimates of distances between 
locations in proteins have been made from measure- 
ments of the transfer of energy by resonance. 

One way to evaluate the reliability of such esti- 
mates is to compare a distance estimated in this way 
with the distance observed in a subsequently obtained 
crystallographic molecular model. The distance between 
Cysteine 199 and Cysteine 343 in cyclic AMP-dependent 
protein kinase was estimated to be 3.1-5.2 nm on the 
basis of the transfer of energy by resonance between two 
different pairs of donor and acceptor,'” but in the sub- 
sequently reported crystallographic molecular model!” 
the distance between the sulfurs of these two cysteines is 
only 2.12 nm. The distance between the two Cysteines 
283 in dimeric creatine kinase from rabbit muscle was 
estimated to be 4.8-6.0 nm from measurements of the 
efficiency of transfer for five different pairs of donor and 
acceptor,” but in the subsequently reported crystallo- 
graphic molecular model of the protein,’™ these cys- 
teines are only 3.33 nm apart. The distance between 
Lysine 84 on one of the subunits in an α₃ catalytic trimer
and the closest Lysine 84 on a subunit in the other α₃
catalytic trimer in aspartate carbamoyltransferase (Figure
9-37) was estimated to be 3.3 nm on the basis of transfer 
of energy by resonance between a pyridoxamine phos- 
phate and a pyridoxal phosphate attached to the respec- 
tive side chains.'””!”! This estimate conveniently splits 
the difference between the distances of 2.1 and 3.8 nm 
observed in subsequent crystallographic molecular 
models of the two respective conformations of the pro- 
tein. '”* The distance between the binding site for 
acetylcholine on acetylcholine receptor and the bilayer 
of phospholipids of the membrane in which it is located 
was estimated to be 3.0-4.0 nm from measurements of 
the transfer of energy by resonance between a donor 
covalently attached to choline and two different accep- 
tors dissolved in the hydrocarbon of the bilayer,‘ and 
the distance to the closest surface of the bilayer esti- 
mated crystallographically is 3.0 nm.'” 

A more suspect evaluation of the reliability of
distances estimated from the transfer of energy by
resonance is a comparison of them with those observed in a
crystallographic molecular model available at the time
the measurements were made. The distances estimated
from the transfer of energy to chloramphenicol bound at 
the active site of chloramphenicol O-acetyltransferase
from Tryptophan 86 and Tryptophan 152, respectively,
were both 1.5 nm when $\kappa^2$ was set at 2/3, and the
distances in the crystallographic molecular model are 1.72
and 1.66 nm, respectively. The distances estimated
from the transfer of energy between the amino terminus
and Lysines 15, 26, 41, and 46 in bovine pancreatic
trypsin inhibitor were 3.4, 2.2, 2.1, and 2.3 nm,
respectively, and the distances in the crystallographic
molecular model are 3.17, 1.68, 1.80, and 2.17 nm,
respectively. The distance between Tyrosine 99 and
Tyrosine 138 in the complex between calmodulin and 
four calcium ions was estimated from the transfer of 
energy to be between 1.4 and 1.9 nm," and the distance 
in the crystallographic molecular model is 1.2 nm.” The 
distance between a cysteine substituted for 
Phenylalanine 239 and Cysteine 343 in cyclic-AMP 
dependent protein kinase was estimated from the trans- 
fer of energy to be 4.1 nm,” and the distance in the crys- 
tallographic molecular model is 3.7 nm.!” 

Aside from the uncertainty of the values of $\kappa^2$, one
of the main difficulties in measuring distances by trans- 
fer of energy is that the donors and acceptors are often 
attached covalently to the protein by using reagents that 
end up placing the chromophore on a flexible tether a 
significant distance from the amino acid to which it is 
attached. For example, the reagents used to modify 
rhodopsin (Figure 12-14) place the centers of the
chromophores 0.4-1.2 nm away from the electrophilic
carbon or sulfur that is directly attached to the
nucleophilic amino acid that has been modified. The
fluorescent (5-sulfonaphthalen-1-yl)amino group and the
fluorescent (7-nitrobenz-2-oxa-1,3-diazol-4-yl)amino
group that were used as donor and acceptor, respec- 
tively, in estimates of distances between locations in the 
complex between DNA, deoxymononucleotide, and the 
Klenow fragment of DNA-directed DNA polymerase 
from E. coli, were attached to various atoms in the com- 
plex by tethers that were each about 1.2 nm in length.’® 
If both donor and acceptor are attached through such 
long tethers, the distance between them can be signifi- 
cantly different from the actual distance between the two 
amino acids to which they are attached. One approach to 
adjusting the estimates of distance for these added 
lengths is to assume that the fluorescent functional 
group and its tether extend unrestrained outwards from 
the surface of the protein and correct for this extra dis- 
tance geometrically.'®' The difficulty with this approach 
is that the fluorescent functional group may adsorb to 
the surface of the protein or insert into a crevice on the 
surface.’ 

Many of the failures of the measurements of distance 
to agree very closely with subsequent or even prior crys- 
tallographic determinations may result from technical 
shortcomings. Too frequently the necessary parameters 
such as quantum yield and spectral overlap are not meas- 
ured directly but are based on prior published values. 
Measurements in the ultraviolet are often compromised
by contaminants in the solutions. It has already been
mentioned that steady-state measurements of fluorescence are
much less accurate than direct measurements of lifetimes 
of the fluorescence. Nevertheless, because of the uncer- 
tainties concerning the orientations of the dipoles of donor 
and acceptor, the degree of their orientational freedom, 
and the relationship of the distance between them and the 
distance between their points of attachment, because of 
the modest success of such estimates, and because crys- 
tallographic molecular models have become far more 
common, measurements of the transfer of energy by res- 
onance are used infrequently to estimate distances. They 
are, however, still widely used for other purposes, because 
they have the advantage of providing information about a 
protein while it is in solution. 

The transfer of energy by resonance is used to detect 
conformational changes in a protein. For example, the effi- 
ciency of the transfer of energy by resonance between 
Tryptophan 133 and a (5-sulfonaphthalen-1-yl)amino 
group attached through a tether to Cysteine 93 in dolichyl- 
phosphate β-D-mannosyltransferase increased from 42%
to 66% upon the binding of the substrate dolichyl phos- 
phate.’ From this observation, it was concluded that a 
conformational change occurs in the protein upon the 
binding of the substrate, and from the magnitude of the 
change in the efficiency of the transfer of energy, it was esti- 
mated that the distance between Tryptophan 133 and 
Cysteine 93 decreased about 0.3 nm during this confor- 
mational change. From similar observations, conforma- 
tional changes producing shifts in the apparent positions 
of donors and their acceptors of 0.3, 0.3, 0.5, and 1.2 nm 
have been observed for the binding of DNA to transcription
factor AP-1, the exchange of Na⁺ for K⁺ in the active
site of Na⁺/K⁺-exchanging ATPase, the exchange of Ca²⁺
cations for Mg²⁺ cations in troponin, and the binding of
a bisubstrate analogue to adenylate kinase, respectively.
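
The step from a change in efficiency to an apparent change in distance can be sketched numerically. The sketch below assumes the usual Förster relation, E = R₀⁶/(R₀⁶ + r⁶), and a purely hypothetical Förster distance of 2.0 nm; no value of R₀ is quoted here for the tryptophan and (5-sulfonaphthalen-1-yl)amino pair, and the function names are illustrative only.

```python
def distance_from_efficiency(efficiency: float, r0_nm: float) -> float:
    """Apparent donor-acceptor distance (nm) from E = R0^6 / (R0^6 + r^6)."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return r0_nm * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


def apparent_shift(e_before: float, e_after: float, r0_nm: float) -> float:
    """Apparent change in distance (nm); negative when the pair moves closer."""
    return (distance_from_efficiency(e_after, r0_nm)
            - distance_from_efficiency(e_before, r0_nm))


if __name__ == "__main__":
    R0_ASSUMED_NM = 2.0  # hypothetical Foerster distance, not taken from the text
    shift = apparent_shift(0.42, 0.66, R0_ASSUMED_NM)
    print(f"apparent change in distance: {shift:+.2f} nm")
```

Because the distance depends on the sixth root of (1/E − 1), the apparent shift scales linearly with whatever R₀ is assumed, which is one reason that the parameters entering R₀ should be measured directly.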

For all of the same reasons, associating a change in 
the distance between two locations on a protein with 
the change in the transfer of energy by resonance is just 
as uncertain as estimating a distance between them. 
Upon the binding of the β(1→4) trimer of N-acetylglucosamine
to lysozyme from chicken, the distance
between a kynurenine at position 62 and Tryptophan 108
appeared to decrease by 0.5 nm when calculated from
the change in the efficiency of transfer. It is unlikely,
however, that such a large conformational change occurs
on the binding of the oligosaccharide because no such
difference (<0.04 nm) in the distance between these two
amino acids is observed between the crystallographic
molecular models of unliganded lysozyme and lysozyme
to which the β(1→4) tetramer of N-acetylglucosamine is
bound. It is possible that crystal packing constrains
both the liganded and unliganded conformations to the 
same structure in the crystal even though their structures 
are so different in solution. It is more likely, however, that 
the relative orientations of the donor or the acceptor or 
both are changed upon the association of the ligand, not 
the distance between them. In the case of the sliding
clamp of bacteriophage T4 DNA-directed DNA poly- 
merase, however, it is thought that the ring of subunits 
composing the protein must split open so that a mole- 
cule of DNA can enter the hole in its center, and changes 
in the efficiency of the transfer of energy by resonance 
between donors and acceptors on different subunits 
equivalent to changes in distances of up to 1.5 nm are 
thought to reflect real changes in distance upon the 
opening of the ring and its subsequent intimate embrace 
of the DNA.’ 

The transfer of energy by resonance is also used to 
monitor the association between a molecule of protein 
modified with a donor and another molecule of protein 
modified with an acceptor. The catalytic α subunit of
cyclic AMP-dependent protein kinase has been modified 
with a tethered fluorescein and the regulatory D subunit 
with a tethered rhodamine. During the heterologous 
association of the two subunits, the fluorescence of the 
fluorescein decreases by about 30% as the rhodamine, an 
acceptor of the energy of its excited state, is brought into 
its vicinity.'®” Such assays based on the decrease in the 
fluorescence of a donor produced by an acceptor upon 
formation of a complex have been used to follow the association
of cytochrome c and cytochrome-c oxidase and
the association of myosin and actin.’ 

Changes in the efficiency of the transfer of energy 
by resonance resulting from changes in the molar con- 
centrations of the participants can also be used to meas- 
ure the dissociation constant for a complex between two 
proteins or the complex between a protein and a nucleic 
acid. The dissociation constant between cytochrome c 
and cytochrome-c oxidase'” and the dissociation con- 
stant between Rho-GDP dissociation inhibitor and GTP- 
binding protein Cdc42'” have been determined by 
monitoring changes in the efficiency of the transfer of 
energy, as has the equimolar stoichiometry of the complex
between the sliding clamp and the clamp holder of
DNA-directed DNA polymerase from bacteriophage T4.
Both the dissociation constant and the
kinetics of the association between DNA and transcription
factor AP-1 have been monitored by changes in
the transfer of energy by resonance.
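
How a dissociation constant is extracted from such a titration can be sketched as follows. The sketch assumes a simple 1:1 complex and assumes that the observed change in the efficiency of transfer is directly proportional to the fraction of the donor-labeled protein that is in the complex; the concentrations, the grid of candidate values, and the function names are hypothetical.

```python
import math


def complex_concentration(a_total: float, b_total: float, kd: float) -> float:
    """Equilibrium [AB] for A + B <-> AB; all arguments share one molar unit."""
    s = a_total + b_total + kd
    return (s - math.sqrt(s * s - 4.0 * a_total * b_total)) / 2.0


def fraction_of_a_bound(a_total: float, b_total: float, kd: float) -> float:
    """Fraction of the donor-labeled protein A that is in the complex."""
    return complex_concentration(a_total, b_total, kd) / a_total


def fit_kd(a_total, b_totals, observed_fractions, kd_candidates):
    """Pick the candidate Kd with the smallest sum of squared residuals."""
    def sum_sq(kd):
        return sum((fraction_of_a_bound(a_total, b, kd) - f) ** 2
                   for b, f in zip(b_totals, observed_fractions))
    return min(kd_candidates, key=sum_sq)


if __name__ == "__main__":
    candidates = [k * 1e-9 for k in range(1, 201)]   # 1 to 200 nM, hypothetical
    titrant = [5e-9, 20e-9, 50e-9, 100e-9, 400e-9]   # hypothetical totals of B
    # Synthetic, noise-free "observations" generated with Kd = candidates[49].
    data = [fraction_of_a_bound(10e-9, b, candidates[49]) for b in titrant]
    print(fit_kd(10e-9, titrant, data, candidates))
```

The quadratic expression is used rather than the simpler hyperbolic form because, at the low concentrations typical of fluorescence measurements, the concentration of free titrant cannot be assumed to equal its total concentration.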

Dissociation constants are also measured by monitoring
changes in fluorescence that do not involve transfer
of energy by resonance. For example, the
enhancement of the fluorescence of
poly(deoxy-1,N⁶-ethenoadenylic acid), a fluorescent analogue of
poly(adenylic acid), upon the association of RecA protein
from E. coli has been used to determine the dissociation
constant and the kinetics of the association between the
protein and the nucleic acid.

The transfer of energy by resonance has also been 
used to monitor changes in the structure of DNA pro- 
duced by a protein. By modifying the immediately adja- 
cent 3’ end of one strand and 5’ end of the other strand of 
a molecule of double-stranded DNA with a donor and an 
acceptor, respectively, the increase in fluorescence of the
donor when the two strands are dissociated has been
used to monitor the unwinding of DNA catalyzed by
ATP-dependent DNA helicase Rep from E. coli and the
exchange of one strand in a duplex of DNA for another
catalyzed by RecA protein from E. coli. By labeling
short, double-stranded molecules of DNA with a donor at 
one end and an acceptor at the other, the decrease in the 
distance between donor and acceptor during the bend- 
ing of the DNA caused by the binding of high mobility 
group protein Z from Drosophila melanogaster could be 
monitored.’ 

Another manifestation of fluorescence that can be 
used to assess the proximity of two sites in a molecule of 
protein is the formation of an excited-state dimer or 
excimer. If upon its formation, the excited state of a suit- 
able chromophore, for example, a pyrenyl group, is 
immediately adjacent to another one of the same chro- 
mophore, for example, another pyrenyl group, the excited 
state will form a dimer with its unexcited twin. Such a 
dimeric excited state, because of the greater opportuni- 
ties for dissipation of the energy of the excited state radi- 
ationlessly, emits light of longer wavelength. For example, 
a pyrenyl group when excited at 344 nm emits light of 
wavelengths 378, 398, and 417 nm, but its excimer emits 
light of wavelength 470 nm. The observation of excimer 
fluorescence from a protein modified at two different 
amino acids with an appropriate chromophore is evi- 
dence that in the native structure of the protein those two 
amino acids are immediately adjacent to each other, even 
though they are distant from each other in its
sequence.


Problem 12-10: Hepatitis B virus produces severe
inflammation of the liver. The mature, infectious virion is 
composed of three spherical, concentric layers. The 
outer layer is a continuous envelope of membrane the 
phospholipid bilayer of which came from the plasma 
membrane of the cell out of which the virion budded and 
the protein of which was encoded by viral DNA. The 
inner layer is a sphere of condensed, double-stranded 
viral DNA (3.2 kb) encoding the viral genome. The 
middle layer is a capsid enclosing the viral DNA. This 
viral capsid is an oligomeric protein composed of multi- 
ple copies of the same folded polypeptide, 183 amino 
acids in length. The amino acid sequence of this 
polypeptide is 


MDIDPYKEFGATVELLSFLPSDFFPSVRDLLDTAAALYRD 
ALESPEHCSPHHTALRQATLCWGDLMTLATWVGTNLEDPA 
SRDLVVSYVNINVGLKFRQLLWFHISCLTFGRETVLEYLV 
SFGVWIRTPPAYRPPNAPILSTLPETTVVRRRGRSPRRRT 
PSPRRRRSQSPRRRRSQSRESQC 


The first 145 amino acids of the polypeptide fold to form 
the compact subunit that creates the viral capsid, and the 
last 40 amino acids interact with the viral DNA to provide 
counterions for the phosphodiesters of the DNA and 
assist in condensing and packaging it. 

A recombinant form of the gene encoding the viral 
capsid has been constructed for studies of its assembly. 
This recombinant gene, carried on the plasmid 
ptacHBc144, is under the control of the lacUV5 promoter 
and directs the expression in Escherichia coli of a polypeptide
154 amino acids long. The amino-terminal sequence
of this polypeptide is TMITDSLEFH-, and the carboxy- 
terminal sequence is -IS. Between these two terminal 
sequences, which result from the cloning strategy, is the 
sequence of the polypeptide of the viral capsid from 
Isoleucine 3 to Proline 144. The expressed recombinant 
polypeptide lacks the carboxy-terminal portion of the 
subunit that interacts with the viral DNA. Nevertheless, it 
folds as it is being expressed in E. coli, and the resulting 
subunit then assembles, also within the bacterium, to 
form an oligomer similar to the viral capsid in the intact, 
infectious virion. This recombinant, empty, unenveloped 
viral capsid can be purified, and the purified protein is 
composed of only the one polypeptide, which is not post- 
translationally modified. This native, empty oligomeric 
protein is referred to as the HBe viral capsid.” 

The molar mass of the polypeptide of the HBe viral 
capsid at a net charge number of zero, calculated from 
the amino acid sequence of the expressed protein, is 
17,369.9 g mol⁻¹.


(A) The actual molar mass of a folded polypeptide differs
from its molar mass at zero net charge depending
on the conditions. Give reasons other than
posttranslational modification for this difference.


(B) Estimate the magnitude of the effect of these factors
(plus or minus how many grams mole⁻¹) on
the molar mass of the subunit of the HBe viral 
capsid. What would be the appropriate choice of 
significant figures for expressing the molar mass 
of the subunit? 


The HBe viral capsid was submitted to sedimentation
equilibrium at 2800 rpm, 20 °C, in 0.15 M NaCl
and 50 mM tris(N-hydroxymethyl)aminomethane hydrochloride,
pH 7.0 (ρsol = 1.00494 g cm⁻³).


(C) Why was the sodium chloride present? 


At sedimentation equilibrium, the absorbance at
290 nm (A₂₉₀) within the sample chamber showed the
following dependence on the radial distance (r) from
the center of rotation.²⁰¹ Reprinted with permission from
ref 201. Copyright 1995 American Chemical Society.


(D) Calculate the molar mass of the HBe viral capsid.
In this calculation, use the approximation of
Equation 8-21. The partial specific volume of the
protein, calculated from its amino acid composition,
is 0.743 cm³ g⁻¹. Note that


$$\frac{d\ln c_{\mathrm{prot}}}{dr^{2}} = \frac{d\ln A_{290}}{dr^{2}} - \frac{d\ln \varepsilon_{290}}{dr^{2}} = \frac{d\ln A_{290}}{dr^{2}}$$


where ε₂₉₀ is the extinction coefficient for the protein at
290 nm.


(E) On what measurements of the concentration 
of protein does this calculation of molar mass 
rely? 


A similar recombinant version of the viral capsid 
of hepatitis B virus has been embedded in 
amorphous ice, and electron micrographs of that pro- 
tein have been submitted to image reconstruction. The 
following are two views of that reconstruction.²⁰²
Reprinted with permission from ref 202. Copyright 1994 
Elsevier B.V. 


(F) What is the symmetry of the capsid of hepatitis B 
virus? How many folded polypeptides are in each 
protomer? 


(G) Exactly how many folded polypeptides are there 
in the HBe viral capsid? What is its exact molar 
mass? 


(H) The standard sedimentation coefficient at 20 °C
in water, s°₂₀,w, of the HBe viral capsid is 44 S.
Calculate its frictional coefficient.


(I) On the basis of this frictional coefficient, if the
viral capsid were a smooth, unhydrated sphere,
what would be its radius?


The HBe viral capsid was examined by scanning 
transmission electron microscopy. The inset in the figure 
is a selected field from one of these micrographs containing
three HBe viral capsids.²⁰¹ Reprinted with permission
from ref 201. Copyright 1995 American Chemical
Society.


[Figure: radial distribution of density (kDa nm⁻³) as a function of radius (nm), 0 to 16 nm; inset: micrograph of three HBe viral capsids.]

Each image obtained in this procedure is a projection of 
the three-dimensional molecule onto a two-dimensional 
plane. The image is in negative contrast so the protein 
appears as a light object on a dark background, and the 
degree of contrast at any point in the plane is directly 
proportional to the amount of protein contributing to 
that projected point. These images can be scanned, and 
the two-dimensional distribution of contrast can be con- 
verted into a radial distribution of density if it is assumed 
that the viral capsid is spherically symmetric. A graphical 
representation of this radial distribution of density in 
kilodaltons nanometer⁻³ is presented in the figure.


(J) Why does the density of protein decrease gradually
beyond a radius of 12 nm rather than
abruptly as it would if the protein were a smooth
sphere?


(K) On the basis of this distribution of protein density,
what is the maximum value for the radius of the 
HBe viral capsid? 


(L) What would be the frictional coefficient for a 
spherical molecule of protein with this radius? 


(M) What reasons could explain why the actual fric- 
tional coefficient of the HBe viral capsid is larger 
than this maximum theoretical frictional coeffi- 
cient? 


The following is the circular dichroic spectrum of 
the HBe viral capsid and a list of the values of the molar 
ellipticity [degrees centimeter² (decimole of peptide
bond)⁻¹] at three selected wavelengths.²⁰¹ Reprinted
with permission from ref 201. Copyright 1995 American
Chemical Society.
The values for the molar ellipticities at these same three
wavelengths for the three standard curves in Figure
12-10, which represent pure α helix, pure β structure,
and pure random meander, are


λ (nm)    [θ]α      [θ]β      [θ]rm
          (10⁴ deg cm² dmol⁻¹)
223       −2.75     −0.97     +0.31
210        2.67      0.78      0.78
198       +3.85     +3.85     −3.96


(N) Assume that the folded polypeptide of the HBe
viral capsid contains only α helix, β structure, and
random meander, and formulate a set of three
simultaneous equations relating the three
unknowns fα, fβ, and frm, the respective fractions
of each type of secondary structure in the protein.
Do not use as an equation the assumption that the
sum of these three fractions is 1.


(O) Solve this set of equations for fα, fβ, and frm.


(P) Why is it so surprising that the values of fα, fβ, and
frm add up to 1 anyway?


(Q) On the basis of these numerical values of fα, fβ, and
frm, why is one required to conclude that the viral
capsid of southern bean mosaic virus (Figure
9-27) and the viral capsid of hepatitis B virus do
not share a common ancestor even though their
overall structures are similar?


Problem 12-11: The circular dichroic spectra of several
proteins are shown to the left.¹¹⁷ The solid lines are the
observed spectra; the dotted lines are theoretical fits to
the data. Reprinted with permission from ref 117.
Copyright 1980 Academic Press.


(A) Using the letters in each panel to designate each
protein, rank them in order from the one with the
most α helix and the least β structure to the one
with the most β structure and least α helix.


(B) Consider the protein with the most α helix. What
would be the maximum percentage of α helix it
could have?


(C) Consider the protein with the least α helix. What
would be the maximum percentage of β structure
it could have?


Problem 12-12: Suppose that R₀ = 1.7 nm for the transfer
of energy by resonance between a donor and an
acceptor on a protein and that the efficiency of the 
energy transfer between donor and acceptor is 0.79. 
When a ligand that binds to the protein is added, the effi- 
ciency of energy transfer decreases to 0.64. What is the 
apparent change in the distance between donor and 
acceptor that occurs upon binding of the ligand? 


Nuclear Magnetic Resonance

Many atomic nuclei display rotational motion known as 
nuclear spin. Because nuclear spin is quantized, its 
angular velocities can assume only those magnitudes 
dictated by spin quantum numbers. Among many other 
atomic nuclei, those of ¹H, ¹³C, ¹⁵N, ¹⁹F, and ³¹P have only
two spin quantum numbers, +½ and −½. These dictate
two specific angular velocities of the same magnitude 
but of opposite polarity. These two angular velocities are 
the two spin states of these nuclei. As any one of these 
nuclei is a charged particle by virtue of its protons, either 
of these angular velocities creates a magnetic field of the 
respective polarity aligned with the axis of the nuclear 
spin. When such a nucleus is placed in an external, 
homogeneous magnetic field of a given polarity, its axis 
tends to align with the direction of the applied field, and 
its spin states, because they are of opposite polarity to 
each other, become different in energy. This difference 
in energy, ΔE, is directly proportional to the magnetic
flux density Bᵢ (tesla) at the location of nucleus i; and as
in optical spectroscopy, the difference in energy determines
the frequency νᵢ (hertz) of electromagnetic energy
that is absorbed by nucleus i:


$$\Delta E = h\nu_i = \frac{\gamma_i h B_i}{2\pi} \tag{12-53}$$


where h is Planck’s constant and γᵢ is the magnetogyric
ratio (radians tesla⁻¹ second⁻¹) for nucleus i, which is
determined only by the type of nucleus, ¹H, ¹³C, ¹⁵N, ¹⁹F,
or ³¹P, that it is (Table 12-2). The frequency νᵢ at which
nucleus i absorbs in an applied external field is its
Larmor frequency.

At readily accessible magnetic flux densities
(<25 T), the difference in energy between the two spin
states of one of these nuclei is less than 0.5 J mol⁻¹, which
is the energy contained in a photon of wavelength greater
than 3 × 10⁸ nm and frequency less than or equal to
1000 MHz. This is in the radiofrequency range of
electromagnetic energy.
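
These magnitudes can be checked with a short calculation. The sketch below assumes that the Larmor frequency is the magnetogyric ratio multiplied by the flux density and divided by 2π (Equation 12-53), takes the magnetogyric ratios from Table 12-2, and uses standard values for the physical constants; the function names are illustrative.

```python
import math

PLANCK_J_S = 6.62607015e-34
AVOGADRO_PER_MOL = 6.02214076e23
SPEED_OF_LIGHT_M_PER_S = 2.99792458e8

# Magnetogyric ratios (rad T^-1 s^-1) as listed in Table 12-2.
MAGNETOGYRIC_RATIO = {
    "1H": 2.675e8,
    "13C": 0.673e8,
    "15N": -0.271e8,
    "19F": 2.517e8,
    "31P": 1.083e8,
}


def larmor_frequency_hz(nucleus: str, flux_density_t: float) -> float:
    """Magnitude of the Larmor frequency: |gamma| * B / (2 * pi)."""
    return abs(MAGNETOGYRIC_RATIO[nucleus]) * flux_density_t / (2.0 * math.pi)


def energy_gap_j_per_mol(nucleus: str, flux_density_t: float) -> float:
    """Difference in energy between the two spin states, per mole of nuclei."""
    return PLANCK_J_S * AVOGADRO_PER_MOL * larmor_frequency_hz(nucleus, flux_density_t)


def wavelength_nm(frequency_hz: float) -> float:
    """Wavelength in nanometers of radiation of the given frequency."""
    return SPEED_OF_LIGHT_M_PER_S / frequency_hz * 1e9


if __name__ == "__main__":
    for name in MAGNETOGYRIC_RATIO:
        print(name, round(larmor_frequency_hz(name, 1.0) / 1e6, 3), "MHz at 1 T")
    print(round(energy_gap_j_per_mol("1H", 25.0), 3), "J/mol for 1H at 25 T")
```

The frequencies computed at 1 T agree with the second column of Table 12-2 to within the rounding of the tabulated magnetogyric ratios.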

A solution of molecules contains discrete popula- 
tions of atomic nuclei in which each and every nucleus 
is chemically identical. For example, if the naturally present
deuterium is ignored, a solution of p-xylene


[Structure 12-8: p-xylene]


uniformly and completely labeled with ¹³carbon would
contain one discrete population of ¹hydrogen nuclei
composed from the four hydrogens attached to the
phenyl ring in each molecule of p-xylene, a discrete population
of ¹hydrogen nuclei composed from the six
hydrogens attached to the methyl groups in each molecule
of p-xylene, a discrete population of ¹³carbon nuclei
composed from the carbons of the methyl groups, a discrete
population of ¹³carbon nuclei composed from the
para carbons of the ring, and a discrete population of
¹³carbon nuclei composed from the meta carbons of the
ring. The nuclear spin of each nucleus in a given population
of nuclei can be represented as a vector of unit
length parallel to the axis of its spin. The net magnetiza- 
tion of a given population of nuclei is the vector sum of 
its individual nuclear spins. In an applied magnetic field, 
although the individual nuclear spins show only a ten- 
dency to align with the field, the net magnetization of 
each population of nuclei is aligned exactly with the 
direction of the field. In an applied magnetic field, each 
of these discrete populations of nuclei in the solution has 
a corresponding Larmor frequency for the nuclear mag- 
netic resonance absorption associated with it. 

When a population of chemically identical funda- 
mental particles, such as electrons or atomic nuclei, is 
exposed to electromagnetic radiation of a wavelength 
equivalent in energy to the difference between two 
energy levels accessible to the particles, the electromag- 
netic radiation catalyzes the movement of the members 
of that population of particles between these two energy 
levels. The reason is that the electromagnetic radiation is 
in resonance with the transition between the two energy 
levels. Photons with this energy will be absorbed during 
this process only if, at the time of irradiation, the population
of particles occupying the lower energy level is
greater than the population occupying the higher energy 
level. The absorption of photons, however, necessarily 
increases the population in the higher energy level at the 
expense of the population in the lower energy level. 
When the populations in the two resonating energy 
levels become equal to each other, absorption can no 
longer occur, and a state of saturation is reached.* 

In the electronic transitions and vibrational transi- 
tions of electrons in atomic and molecular orbitals 
(Figure 12-5), the energy levels are sufficiently different 
that almost all unexcited electrons are in the state of 
lower energy, and relaxation back to the state of lower 
energy is sufficiently rapid (>10 ns⁻¹) that absorption of
a particular wavelength of light by a given population of 
chemically identical electrons rarely displays saturation. 
In nuclear magnetic resonance, however, the energy dif- 
ference between the two spin states that can be achieved 
with the available magnetic flux densities is so small 
(<0.5 J mol⁻¹) that the equilibrium constant Ksp between
the two spin states for a population of identical nuclei at 
normal temperatures (300 K) is very close to 1 (1 < Ksp < 
1.0002). This means that the difference in the concentra- 
tions of the nuclei in the two spin states at equilibrium 
will be less than 200 ppm. The difference between the 
populations in the two energy levels set in resonance is 
small enough and the rate of relaxation of a nucleus in 
the level of higher energy back to the level of lower 
energy is slow enough (1 s⁻¹) that saturation occurs
readily. This causes the amplitude of the observed 
absorption of the electromagnetic energy in nuclear 
magnetic resonance spectroscopy to be sensitive to the 
rate of relaxation of the populations of individual nuclei 
from the state of saturation to the state of equilibrium. 
The faster the population relaxes, the more energy it can 
absorb. For this relaxation to occur, the excess energy 
that has been absorbed has to be dissipated. 
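
The size of the equilibrium constant between the two spin states can be reproduced from the Boltzmann distribution. The sketch below assumes only that the ratio of the populations is exp(ΔE/RT) with ΔE expressed per mole; the function names are illustrative.

```python
import math

GAS_CONSTANT_J_PER_MOL_K = 8.314462618


def spin_state_ratio(energy_gap_j_per_mol: float, temperature_k: float = 300.0) -> float:
    """Equilibrium ratio (lower state / upper state) from the Boltzmann distribution."""
    return math.exp(energy_gap_j_per_mol / (GAS_CONSTANT_J_PER_MOL_K * temperature_k))


def excess_ppm(energy_gap_j_per_mol: float, temperature_k: float = 300.0) -> float:
    """Excess of the lower state over the upper state, in parts per million."""
    return (spin_state_ratio(energy_gap_j_per_mol, temperature_k) - 1.0) * 1e6


if __name__ == "__main__":
    print(spin_state_ratio(0.5))  # upper limit of the energy gap quoted in the text
    print(excess_ppm(0.5))
```

At the upper limit of 0.5 J mol⁻¹ and 300 K, the excess of nuclei in the lower state is about 200 ppm, so only a very small fraction of the population can contribute to net absorption before saturation sets in.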

* Because the absorption of electromagnetic energy is the consequence
of resonance and because the resonance is so much closer
to equilibrium in nuclear magnetic resonance spectroscopy than in
optical spectroscopy, nuclear magnetic resonance spectroscopists
often use the word “resonance” in place of the word “absorption”.

The chemical shift of the nuclear magnetic resonance
absorption of a population of nuclei is a measure
of the frequency of the electromagnetic energy at which
the absorption appears in the spectrum. The chemical
shift of the absorption of a particular population of
nuclei is determined by the chemical environment of the
chemically identical nuclei that compose the population.
The electrons surrounding a given nucleus circulate in
response to the applied magnetic field as a current would
in a copper coil. This current decreases the local magnetic
flux density Bᵢ experienced by the nucleus and
establishes its characteristic Larmor frequency (Equation
12-53). The chemical shift δᵢ of a nuclear magnetic resonance
absorption from a population of chemically identical
nuclei i is the normalized difference between its
Larmor frequency νᵢ, the frequency at which it absorbs,
and the frequency νstd at which the population of a standard
nucleus absorbs:


$$\delta_i = \frac{\nu_i - \nu_{\mathrm{std}}}{\nu_{\mathrm{std}}} = \frac{B_i - B_{\mathrm{std}}}{B_{\mathrm{std}}} \tag{12-54}$$


The units used for chemical shift are parts per million 
(ppm) relative to the absorption of the standard nucleus 
in a particular reference compound because the local dif- 
ferences in magnetic flux density are never greater than 
about 0.0002 (200 ppm) of the applied field. Chemical 
shift cannot be expressed in absolute units of energy 
because the energy difference between a particular 
absorption and that of the standard varies with the mag- 
nitude of the applied field (Equation 12-53). The magni- 
tude of the chemical shift provides chemical information 
about the disposition of the electrons in the environment 
surrounding the nucleus, in other words, the molecular 
structure in its vicinity. 
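
The reason for normalizing can be made concrete with a small calculation. The sketch below takes the difference as the Larmor frequency of the population minus that of the standard, a sign convention that is an assumption here, and uses hypothetical frequencies; the function names are illustrative.

```python
def chemical_shift_ppm(larmor_hz: float, standard_hz: float) -> float:
    """Normalized difference between two Larmor frequencies, in ppm."""
    return (larmor_hz - standard_hz) / standard_hz * 1e6


def offset_hz(shift_ppm: float, standard_hz: float) -> float:
    """Absolute difference in frequency that corresponds to a given chemical shift."""
    return shift_ppm * 1e-6 * standard_hz


if __name__ == "__main__":
    # Hypothetical population absorbing 1800 Hz away from a standard at 600 MHz.
    shift = chemical_shift_ppm(600.0018e6, 600.0e6)
    print(shift)
    # The same chemical shift at a lower field, where the standard absorbs at 300 MHz.
    print(offset_hz(shift, 300.0e6))
```

The difference in hertz halves when the field is halved, whereas the chemical shift in parts per million is unchanged, which is why spectra recorded on different instruments can be compared directly.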

Nuclear magnetic resonance spectroscopy meas- 
ures the same phenomenon as optical spectroscopy. In 
an external magnetic field every nucleus of spin ½ has
two energy levels. Depending on the flux density of
the applied magnetic field and the type of nucleus,
electromagnetic energy of a discrete wavelength (frequency)
somewhere between 2 × 10¹⁰ nm (15 MHz) and
3 × 10⁸ nm (1000 MHz) will be absorbed by a particular
population of identical nuclei in the process of exciting 
nuclei in the population of the spin state with lower 
energy to the spin state with higher energy. In theory, 
these absorptions of energy by each discrete population 
could be recorded as a function of wavelength to obtain 
a spectrum, as is done with an optical absorption spec- 
trum. Continuous wave (CW) nuclear magnetic spec- 
trometers approximate this ideal. They measure the 
absorption of energy of a fixed frequency as the flux den- 
sity of the applied magnetic field is varied slowly and 
continuously, and they record the variation in the inten- 
sity of the radiofrequency signal as it is absorbed by the 
sample. This measurement produces a scan of absorp- 
tion as a function of the flux density of the magnetic field. 
The direct proportionality between magnetic flux density 
and frequency (Equation 12-53) permits the spectrum of 
absorption to be presented as a function of frequency. 
Maxima of absorption appear in the spectrum at the 
Larmor frequencies of the different populations of nuclei 
in the sample. Nuclei of the different elements differ dra- 
matically in their ability to absorb radiofrequency energy 
(Table 12-2) and hence the intensities of their maxima. 

Almost all instruments used today, however, are 
Fourier transform (FT) nuclear magnetic resonance 
spectrometers. In such a spectrometer there is a radio- 
transmitter that generates radiowaves of a set frequency, 
for example 600 MHz, referred to as the carrier frequency
ν. The radiowaves generated by the radiotransmitter
are propagated in a direction x perpendicular to
the direction z of the fixed magnetic field so that the
magnetic fields of the radiowaves oscillate in the dimension
y at the carrier frequency. The net magnetization of
a given population of nuclei is precisely aligned with the
z axis.


Table 12-2: Nuclear Properties of Nuclei with Spin Quantum Number ½


nucleus   magnetogyric ratio    frequency of maximum      relative amplitude of
          (10⁸ rad T⁻¹ s⁻¹)     absorption at 1 T (MHz)   absorption at constant field
¹H         2.675                42.577                    1.000
¹³C        0.673                10.705                    0.016
¹⁵N       −0.271                 4.315                    0.001
¹⁹F        2.517                40.055                    0.834
³¹P        1.083                17.231                    0.066

The magnetic field of the instrument is adjusted 
(Equation 12-53) so that the span of the Larmor frequen- 
cies of the various populations of nuclei to be examined, 
for example, all of the populations of ¹hydrogen in the
sample, is centered on the carrier frequency. The sample 
is excited with a strong (50 W) pulse of radiowaves at the 
carrier frequency. The pulse, however, is so short 
(5-50 μs) that a set of radiowaves is produced of almost
equal intensity within a continuous range of frequencies 
encompassing the Larmor frequencies of the various 
populations of nuclei. If the proper length of the pulse is 
chosen by trial and error, the net magnetization of each 
of the populations of nuclei will be diverted from its 
alignment with the applied field in the z direction to an 
alignment with the oscillating magnetic field of the pulse 
in the y direction. A pulse of the proper duration is a 90° 
pulse, referring to the fact that it has moved all of the net 
magnetizations by 90°. The 90° pulse is strong enough to 
saturate by resonance each population of nuclei as well 
as divert the alignment of its net magnetization. 

At the end of the pulse, when only the applied field 
in the z direction remains, the net magnetization of each 
population of nuclei begins to precess in the x,y plane 
around the z axis at its Larmor frequency. As they lose 
their saturation, each of the various populations of these 
precessing nuclei emits a radio signal at its Larmor fre- 
quency, just as a population of excited electrons emits 
fluorescent light of a frequency equivalent to the differ- 
ence in energy between the excited state and the ground 
state upon its relaxation. Because this emission decays as 
the saturation is lost both through the emission itself and 
through normal relaxation, it is referred to as a free 
induction decay. The free induction decay, which is reg- 
istered from the conclusion of the 90° pulse until it has 
relaxed to nothing, is the only direct measurement made 
by the spectrometer and must contain all of the data necessary
to produce a spectrum. The free induction decay
is registered by a radio receiver that is tuned to the car- 
rier signal so that the output produced by the receiver is 
the modulations of the frequency of the carrier signal 
that each population of nuclei produces. Consequently, 
the signal that is registered from each population by the 
receiver is at the frequency of the difference between its 
Larmor frequency and the carrier frequency, which hap- 
pens to be proportional to its chemical shift from the car- 
rier frequency (Equation 12-54). After it has been 
submitted to a suitable linear combination of its real and 
imaginary terms, the Fourier transform of the total free 
induction decay from the receiver, which is the sum of all 
the free induction decays from all of the populations of 
nuclei, has positive maxima at the chemical shifts of each 
of the populations of nuclei relative to the carrier fre- 
quency. This Fourier transform reproduces a continuous 
wave spectrum of absorption from the same sample. 
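
The relationship between a free induction decay and its spectrum can be illustrated with a synthetic signal. The sketch below is not the processing performed by any particular spectrometer: it assumes that each population contributes an exponentially decaying complex oscillation at the difference between its Larmor frequency and the carrier frequency, and it applies a direct discrete Fourier transform. The offsets, amplitudes, decay time, and sampling interval are hypothetical.

```python
import cmath
import math


def free_induction_decay(offsets_hz, amplitudes, decay_time_s, dwell_s, n_points):
    """Sum of decaying complex oscillations sampled every dwell_s seconds."""
    signal = []
    for k in range(n_points):
        t = k * dwell_s
        envelope = math.exp(-t / decay_time_s)
        signal.append(sum(a * envelope * cmath.exp(2j * math.pi * f * t)
                          for f, a in zip(offsets_hz, amplitudes)))
    return signal


def discrete_fourier_transform(signal):
    """Direct O(n^2) discrete Fourier transform; adequate for a short example."""
    n = len(signal)
    return [sum(signal[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
            for m in range(n)]


def frequency_axis_hz(n_points, dwell_s):
    """Offset from the carrier for each bin; the upper half of the bins is negative."""
    return [(m if m < n_points // 2 else m - n_points) / (n_points * dwell_s)
            for m in range(n_points)]


if __name__ == "__main__":
    fid = free_induction_decay([100.0, -248.0], [1.0, 0.5], 0.05, 1.0 / 1024, 256)
    spectrum = [z.real for z in discrete_fourier_transform(fid)]
    axis = frequency_axis_hz(256, 1.0 / 1024)
    strongest = max(range(256), key=lambda m: spectrum[m])
    print(axis[strongest])
```

The real part of the transform shows a positive maximum at each offset from the carrier, with a height that reflects the amplitude assigned to that population.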

The absorption of a population of chemically iden- 
tical nuclei is usually split into a series of peaks produc- 
ing a symmetrical pattern around its mean resonant 
frequency. This splitting is due to spin-spin coupling, 
also referred to as J coupling. It arises from the fact that 
any adjacent spinning nucleus A, covalently linked to the 
nucleus X being observed, acts as a small magnet that 
induces the electrons between it and the nucleus X to cir- 
culate. This induced current alters the local magnetic 
field Bᵢ at the nucleus X. Within the whole population of
molecules, the various nuclei A assume both of their spin 
states randomly and almost equivalently, but each par- 
ticular nucleus X in the population can be spin-coupled 
to a particular nucleus A in only one of those two spin 
states. This divides the population of nuclei X into two 
different groups, each group having each of its corre- 
sponding nuclei A in only one of its two available spin 
states. 

The spin-spin coupling constant, JAX, is the magnitude
of the magnetic effect of nucleus A on nucleus X.
Because spin-spin coupling is a function only of the 
intrinsic magnetic fields of the neighboring nuclei, cou- 
pling constants are not a function of the magnitude of 
the applied field, and their values (which are invariant 
differences in energy) are expressed quantitatively as the 
number of hertz by which the magnetic field of nucleus A 
splits the frequency of the absorption of nucleus X into 
two peaks of different frequency. Because spin-spin cou- 
pling is relayed by the electrons in the covalent bonds 
connecting the nuclei, it decreases in magnitude with the 
number of bonds separating them. For example, the
values of J for the spin-spin coupling of ¹hydrogen nuclei
through two bonds can be as large as 15 Hz, and for
coupling of ¹hydrogen nuclei through three bonds, as
large as 12 Hz, but for coupling of ¹hydrogen nuclei
through four bonds, the values are less than 1 Hz.
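
The statement that a coupling constant is fixed in hertz, whatever the applied field, can be illustrated with a short sketch. It assumes the simplest case, one nucleus A splitting the absorption of nucleus X into two peaks placed symmetrically about the mean frequency; the function names and the numerical values are hypothetical and chosen only for illustration.

```python
def doublet_positions_hz(center_hz, j_hz):
    """Frequencies (Hz) of the two peaks produced when a coupling constant
    j_hz splits an absorption centered at center_hz symmetrically."""
    return (center_hz - j_hz / 2.0, center_hz + j_hz / 2.0)


def separation_ppm(j_hz, carrier_mhz):
    """Separation of the two peaks expressed in parts per million of the
    carrier frequency (Hz divided by MHz gives ppm)."""
    return j_hz / carrier_mhz


# A three-bond coupling of 12 Hz (illustrative value from the range in the text).
low_peak, high_peak = doublet_positions_hz(1000.0, 12.0)

# The separation in Hz is the same at any field, so in ppm it shrinks
# as the carrier frequency rises.
ppm_at_300 = separation_ppm(12.0, 300.0)
ppm_at_600 = separation_ppm(12.0, 600.0)
```

Under these assumptions the two peaks stay 12 Hz apart, while their separation on the chemical shift axis halves when the carrier frequency doubles.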

In addition to being determined by the number of 
bonds separating the two coupled nuclei, the value of the
spin-spin coupling constant J also depends on the angles
at which two nuclei are held with respect to each other.
For example, for two ¹hydrogen nuclei coupled through
two bonds, the range of values is 10-15 Hz if the bond
angle is close to the sp³ tetrahedral angle of 109°, but the
value of the coupling constant falls to 2-3 Hz at the sp²
angle of 120°. In three-bond coupling, the value of the
spin-spin coupling constant depends on the dihedral 
angle between the two nuclei along the bond connecting 
the two neighboring atoms to which they are attached: 


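[Structure 12-9: nuclei X and A attached to two neighboring atoms, showing the dihedral angle between them about the connecting bond]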


The coupling constant J_AX is at its maximum when the
dihedral angle is 0° or 180° and at its minimum when
the dihedral angle is 90° or 270°. At these latter angles,
the spin-spin coupling constant can be almost zero. The
maxima at 0° and 180° for such coupling between
the nuclei of two ¹hydrogens are about 10 Hz, and between
the nuclei of a ¹hydrogen and a ¹³carbon, about 8 Hz.
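
The dependence of a three-bond coupling constant on the dihedral angle is commonly summarized by a Karplus-type relation, J = A cos²θ + B cos θ + C. The sketch below is only an illustration: the relation is not named in the text, and the default coefficients (A = 10 Hz, B = C = 0) are assumptions chosen to reproduce the behavior described above, maxima of about 10 Hz at 0° and 180° and values near zero at 90° and 270°. Coefficients fitted to real data differ from these and depend on the nuclei and substituents.

```python
import math


def karplus(theta_deg, a=10.0, b=0.0, c=0.0):
    """Karplus-type estimate (Hz) of a three-bond coupling constant at the
    dihedral angle theta_deg. The coefficients are illustrative assumptions."""
    theta = math.radians(theta_deg)
    return a * math.cos(theta) ** 2 + b * math.cos(theta) + c


# Tabulate the coupling constant every 30 degrees around the bond.
curve = [(angle, round(karplus(angle), 2)) for angle in range(0, 360, 30)]
```

With a nonzero B the two maxima become unequal, which is one reason measured coupling constants cannot always be converted into a single dihedral angle.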

Two populations of nuclei, X and A, can also be cou- 
pled by a nuclear Overhauser effect. A nuclear 
Overhauser effect of the population of nuclei A on the 
population of nuclei X is any change in the net spin state 
of the population of nuclei X produced by a change in the 
net spin state of the population of nuclei A. For example, 
a nuclear Overhauser effect can be the consequence of 
either an alteration in the relaxation rate between the 
two spin states accessible to the members of the popula- 
tion of nuclei X or the consequence of an alteration in the 
levels of occupation of the two spin states within the 
population of nuclei X caused by a change in the net spin 
state of the population of nuclei A. A change in the net 
spin state of the population of one nucleus produced by 
a change in the net spin state of the population of 
another nucleus results from dipolar interactions 
between the two respective nuclei. A dipolar interaction 
is a function of, among other things, the distance
between the two nuclei, r, and its magnitude is proportional
to r⁻³. The change in the spin state of nucleus X by
nucleus A caused by a dipolar interaction is a second-order
perturbation, and hence it is proportional to r⁻⁶.
The transfer of energy between two transition dipoles by 
resonance is also a dipolar interaction and has the same 
dependence on distance (Equation 12-48). Because of 
the inverse dependence on the sixth power of the dis- 
tance, nuclear Overhauser effects are significant only if 
the nucleus X and the nucleus A in a particular molecule 
are close to each other. 
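
How steeply the inverse sixth power limits the range of a nuclear Overhauser effect can be seen in a small calculation. This is an idealized sketch that considers distance alone; the function name and the reference distance are hypothetical, and the distances may be in any unit so long as both use the same one.

```python
def relative_noe(r, r_ref=2.0):
    """Idealized nuclear Overhauser effect at distance r, relative to the
    effect at the reference distance r_ref, assuming only the inverse
    sixth-power dependence on distance."""
    return (r_ref / r) ** 6


# Doubling the distance reduces the idealized effect 64-fold.
at_reference = relative_noe(2.0)
at_double = relative_noe(4.0)
at_triple = relative_noe(6.0)
```

Because other factors also contribute to the dipolar interaction, such ratios are best read as rough guides to proximity rather than as measurements of distance.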

Because a nuclear Overhauser effect is not trans- 
mitted by changes in the static, local magnetic field of 
nucleus X brought about by a change in the spin state of
nucleus A, there is no requirement that the two nuclei be
associated by covalent bonds as there is with spin-spin 
coupling. Nuclear Overhauser effects can indicate that 
the two nuclei involved are adjacent to each other in the 
tertiary structure of a protein even though they may be 
distant from each other in its primary structure. As with 
the transfer of energy by resonance, however, there are 
factors other than the distance between the nuclei asso- 
ciated with the dipolar interactions producing nuclear 
Overhauser effects, causing the intensity of a nuclear 
Overhauser effect not to be directly proportional to the 
inverse of the sixth power of this distance.

Because nuclear Overhauser effects are manifested 
only as changes in the net spin state of the population of 
one nucleus under the influence of a change in the net 
spin state of the population of the other, no change 
occurs in the chemical shift of either nucleus involved in 
the nuclear Overhauser coupling as there was with 
spin-spin coupling. Rather, a nuclear Overhauser effect 
is registered as a change in the intensity of the absorp- 
tion for nucleus X. For example, if the net rate of relax- 
ation of the population of nucleus X is increased by the 
change that has occurred in the net spin state of the pop- 
ulation of nuclei A, then the amplitude of the absorption 
for nucleus X will increase. If the occupation of the two 
spin states available to the members of the population of 
nuclei X is caused to become more equal by the change 
that has occurred in the net spin state of the population 
of nuclei A, then the amplitude of the absorption for 
nucleus X will decrease. 

In large, relatively rigid macromolecules such as 
proteins, nuclear Overhauser effects between ¹hydrogen
nuclei are usually a consequence of the transfer of saturation.
Transfer of saturation is the transfer of a portion
of the saturation of one population of nuclei,
nuclei A, to another population of nuclei, nuclei X. For
transfer of saturation to occur, each of the individual
nuclei X must be adjacent in space to a nucleus A.
Transfer of saturation results from the summation of a 
large number of individual exchanges of spin state 
between a nucleus A and a nucleus X. The two adjacent 
nuclei simultaneously and reciprocally exchange their 
spin states in opposite directions with essentially zero 
change in the total energy of the two exchanging nuclei. 
During each exchange, the spin state of that particular 
nucleus X becomes what was the spin state of the partic- 
ular nucleus A and vice versa. As a result of a large 
number of such exchanges of spin state at the atomic 
level, a portion of the saturation of the population of 
nuclei A is transferred to the population of nuclei X. The 
driving force is the resulting increase in entropy. A 
sequence of such transfers of saturation among a 
number of populations of adjacent nuclei can cause the 
saturation of one population of nuclei to spread outward 
over populations of nearby nuclei. 

Spin diffusion is this outward spread of saturation 
from the saturated population of nuclei A. For spin diffusion
to occur, the populations of the nuclei in the vicinity
of the nuclei A must be unsaturated so they are able to
assume the saturation transferred from the population of 
nuclei A. It is the saturation of only the population of 
nuclei A that permits the diffusive force to be observed, 
just as the creation of a gradient of concentration permits 
the diffusive force to be observed. 

Spin diffusion by transfer of saturation can be 
observed in an experiment analogous to the transfer of 
energy by resonance. The population of nuclei A is irra- 
diated at the radio frequency with which its spins res- 
onate and with sufficient amplitude to saturate its 
absorption, which equalizes the number of nuclei in its 
two spin states. The stimulating radiation is then turned 
off. The population of nuclei A will slowly relax back to its 
equilibrium distribution by losing the excess energy it 
has gained. One of the ways the population of nuclei A 
may relax is by transferring saturation to the population 
of nuclei X, if within the molecule nucleus X is close to 
nucleus A. If the absorption of the population of nuclei X 
is measured after a time, tm, sufficient for some of the sat- 
uration in the population of nuclei A to be transferred to 
the population of the nuclei X, the absorption of the pop- 
ulation of nuclei X will have decreased relative to its 
absorption in the absence of transfer of saturation 
because the population of nuclei X will have been moved 
closer to saturation. 
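
The experiment just described can be sketched with a deliberately simplified two-population model. It assumes that saturation is exchanged between nuclei A and nuclei X at a single rate constant sigma and that both populations lose saturation to their surroundings at a single rate constant rho. Both constants, the function names, and the numerical values are hypothetical; the model is an illustration, not a description of any particular protein.

```python
import math


def saturation_of_x(t_m, sigma, rho):
    """Fractional saturation of population X at time t_m after population A
    has been fully saturated and the irradiation turned off.

    Solves ds_A/dt = -rho*s_A - sigma*(s_A - s_X)
           ds_X/dt = -rho*s_X - sigma*(s_X - s_A)
    with s_A(0) = 1 and s_X(0) = 0. The sum s_A + s_X decays at rate rho and
    the difference s_A - s_X decays at rate rho + 2*sigma."""
    return 0.5 * math.exp(-rho * t_m) * (1.0 - math.exp(-2.0 * sigma * t_m))


def absorption_of_x(t_m, sigma, rho):
    """Absorption of population X relative to its absorption in the absence
    of any transfer of saturation."""
    return 1.0 - saturation_of_x(t_m, sigma, rho)


# Illustrative values: exchange five times faster than loss to the surroundings.
buildup = [(t, round(absorption_of_x(t, sigma=1.0, rho=0.2), 3))
           for t in (0.0, 0.25, 0.5, 1.0, 2.0, 5.0)]
```

In this model the absorption of nuclei X first falls as saturation arrives from nuclei A and then recovers as both populations relax, which is why the interval before measurement must be chosen with some care.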

An example of a nuclear Overhauser effect that was 
the consequence of spin diffusion was observed during a 
spectroscopic study of cytochrome c from Katsuwonus
pelamis. The heme in cytochrome c, as it is a large aromatic
ring (2-4), produces a substantial toroidal magnetic
field when its π electrons circulate as a ring current
in the presence of the applied magnetic field. The
δ₁ methyl group of Leucine 68 (Figure 7-9C) resides adjacent
to the heme in the region of this local field that is
opposed to the applied field, and the chemical shift
(-2.7 ppm) of the absorption of its three equivalent
¹hydrogens is even less than that of the reference absorption.
This substantial displacement isolates this peak of
absorption from the absorptions of the rest of the methyl
¹hydrogens in the protein. When the population of the δ₁
methyl ¹hydrogens on Leucine 68 was saturated by preirradiation
at the frequency of its chemical shift, the
absorptions of four other populations of ¹hydrogens
were found to decrease. These were assigned to ¹hydrogens
neighboring the δ₁ methyl group of Leucine 68 in
the crystallographic molecular model of the protein. 

The initial nuclear magnetic resonance spectra of 
proteins were of ¹hydrogen nuclei in molecules dissolved
in [²H]H₂O. They were one-dimensional spectra of
absorption as a function of chemical shift. Even a small
protein of 100 amino acids has more than 700 hydrogens
in it, most of them unique and most of their absorptions
split by spin-spin coupling. It is not surprising, therefore,
that such spectra contain, by and large, several broad,
unresolved absorptions, each resulting from the overlap
of hundreds of individual absorptions. The ranges in
which these overlapping absorptions occur can be
assigned to particular classes of nuclei: those of methyl
¹hydrogens on leucines, isoleucines, valines, alanines,
and threonines (δ = 0.9-1.5 ppm); methylene ¹hydrogens
(δ = 1.5-3.5 ppm); α ¹hydrogens on each amino acid
(δ = 3.5-5.5 ppm); the ¹hydrogens on the peripheries of
the aromatic rings of tryptophans, phenylalanines, histidines,
and tyrosines (δ = 6.4-7.4 ppm); and the unexchanged
amido ¹hydrogens of the peptide bonds and
glutamines and asparagines (δ = 7.0-9.0 ppm).
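
The ranges just listed can be written as a small lookup, which also makes plain why a one-dimensional chemical shift alone seldom identifies a nucleus: the ranges are broad and some of them overlap. The short class names and the function are hypothetical conveniences; the numerical ranges are those given in the text.

```python
# (class of ¹hydrogen, lower limit in ppm, upper limit in ppm)
SHIFT_RANGES_PPM = [
    ("methyl", 0.9, 1.5),
    ("methylene", 1.5, 3.5),
    ("alpha", 3.5, 5.5),
    ("aromatic", 6.4, 7.4),
    ("amido", 7.0, 9.0),
]


def candidate_classes(shift_ppm):
    """Return every class of ¹hydrogen whose range includes shift_ppm."""
    return [name for name, low, high in SHIFT_RANGES_PPM
            if low <= shift_ppm <= high]


# An absorption at 7.2 ppm could be aromatic or amido.
ambiguous = candidate_classes(7.2)
```

A shift that falls outside every range, such as the -2.7 ppm of a methyl group shifted by a ring current, returns an empty list and marks an absorption worth individual attention.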

The central difficulty in nuclear magnetic reso- 
nance spectroscopy of even a small protein is that in a 
one-dimensional spectrum of absorption as a function of 
chemical shift, regardless of the nucleus chosen, the 
peaks of absorption from the individual nuclei overlap 
and cannot be distinguished from each other, let alone 
assigned. It has been possible, however, to dissect the 
nuclear magnetic resonance spectra of small molecules
of protein into their individual components by
using the two-dimensional spectroscopy that has been
developed by Ernst and his colleagues and the
three-dimensional spectroscopy that has been devel- 
oped by Bax and his colleagues (Table 12-3). 

All of the techniques of multidimensional nuclear 
magnetic resonance spectroscopy rely upon the tech- 
nique of frequency labeling, which is a direct elabora- 
tion of Fourier transform nuclear magnetic 
spectroscopy. To label the absorption of a population of 
nuclei i with its Larmor frequency, two successive 90° 
pulses are used. Following the first 90° pulse applied in 
the x direction, the net magnetization of the population 
of nuclei i is aligned with the y axis and then begins to 
precess in the xy plane around the z axis at its Larmor frequency.
After a period of time t₁, a second 90° pulse in
the x direction is applied. The second pulse is in phase at
the carrier frequency ν₀ with the first pulse to ensure that

\[ \nu_0 t_1 = n \tag{12-55} \]

where n is an integer. In this way, the carrier frequency
acts as an internal clock. This second 90° pulse diverts
only the y component of the precessing net magnetization
at that instant into the z direction but leaves the
x component at that instant in the xy plane.
Consequently, the amplitude of the remaining net magnetization
in the xy plane after the second 90° pulse is
equal to Mᵢ sin(2πνᵢt₁), where Mᵢ is the amplitude of the net
magnetization from the population of nuclei i before the
second pulse and νᵢ is its Larmor frequency. Because

\[ M_i \sin(2\pi\nu_i t_1) = M_i \sin[2\pi(\nu_i t_1 - n)] = M_i \sin[2\pi(\nu_i - \nu_0)t_1] \tag{12-56} \]

after the second 90° pulse, the amplitude of the net
magnetization of the population of nuclei i precessing at
its Larmor frequency in the xy plane has become a function
of the length of the interval t₁ between the pulses.
Furthermore, as t₁ is varied, this amplitude will vary
harmonically with a frequency νᵢ − ν₀, which is directly
proportional to the chemical shift (Equation 12-54) of
the population of nuclei i if the applied magnetic flux
density B_app is such that the carrier frequency ν₀ is equal
to the frequency at which the standard nuclei absorb,
ν_std.

Table 12-3: Couplings Giving Rise to a Peak in a Three-Dimensional or Four-Dimensional Nuclear Magnetic Resonance Spectrum

Boxes enclose the three or four coupled atoms that both produce the peak or peaks and determine the three or four chemical shifts, one for each of the three or four atoms, in the three or four respective dimensions.
The three-dimensional peaks from the α ¹³carbon and the β ¹³carbon are separate peaks on the same field, each coupled respectively to the same amido ¹⁵nitrogen and amido ¹hydrogen or to the same combination of acyl ¹³carbon and α ¹hydrogen, respectively, in the other two dimensions.
The three dimensions are ¹³carbon, ¹³carbon, and ¹hydrogen. The ¹hydrogens coupled to one or the other of the two ¹³carbons appear on the three-dimensional field at the chemical shifts of the other ¹³carbon and their own ¹³carbon.
The coupling between the amido ¹hydrogen and the α ¹hydrogen is the usual strong three-bond J coupling between adjacent ¹hydrogens. This coupling can be relayed through the α ¹³carbon or can be enhanced by using HOHAHA.
The coupling between the α ¹³carbon and amido ¹⁵nitrogen is relayed through the acyl ¹³carbon.
The three-dimensional peaks from the α ¹hydrogen and the β ¹hydrogens are separate peaks each coupled respectively to the same amido ¹⁵nitrogen and amido ¹hydrogen through the acyl ¹³carbon.
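
The frequency labeling expressed in Equation 12-56 can be sketched numerically. The example below samples the amplitude remaining in the xy plane at evenly spaced values of t₁ and then recovers the frequency of its modulation with a plain discrete Fourier transform. The offset of 500 Hz, the increment of 100 microseconds, the number of increments, and the function names are all assumptions made for illustration; relaxation and noise are ignored.

```python
import cmath
import math


def remaining_amplitude(m_i, offset_hz, t1):
    """Amplitude left in the xy plane after the second 90° pulse
    (right-hand side of Equation 12-56), where offset_hz is the
    difference between the Larmor and carrier frequencies."""
    return m_i * math.sin(2.0 * math.pi * offset_hz * t1)


def dominant_frequency(samples, dt):
    """Frequency (Hz) of the strongest component in evenly spaced samples,
    found with a direct discrete Fourier transform."""
    n = len(samples)
    best_bin, best_magnitude = 0, 0.0
    for m in range(1, n // 2):
        coefficient = sum(s * cmath.exp(-2j * math.pi * m * k / n)
                          for k, s in enumerate(samples))
        if abs(coefficient) > best_magnitude:
            best_bin, best_magnitude = m, abs(coefficient)
    return best_bin / (n * dt)


offset = 500.0   # hypothetical value of the frequency difference, in Hz
dt = 1.0e-4      # hypothetical increment in t1, in seconds
amplitudes = [remaining_amplitude(1.0, offset, k * dt) for k in range(200)]
recovered = dominant_frequency(amplitudes, dt)
```

The recovered frequency equals the offset that was put in, which is the sense in which the amplitude of the peak carries a label of its own Larmor frequency.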

Immediately following the second 90° pulse, the 
free induction decay of the excited sample is gathered in 
the usual way. The Fourier transform of the output of the 
radio receiver produces a nuclear magnetic spectrum. 
The amplitude of each peak in the spectrum, however, 
because it was derived only from the net magnetization 
remaining in the xy plane, has become a harmonic function
of t₁ with a frequency νᵢ − ν₀. If spectra are gathered
at systematically increasing values of t₁, the amplitude of
each peak will vary with a frequency proportional to its 
chemical shift (Figure 12-15). Each peak has become 
labeled with its own Larmor frequency, and this label- 
ing is manifested in the amplitude modulation of its 
peak in the spectrum with a frequency equal to the dif- 
ference between its Larmor frequency and the carrier 
frequency. In a spectrometer with a carrier signal of 
600 MHz, the values of νᵢ − ν₀ for ¹hydrogens are less than
6000 Hz, and in a spectrometer with a carrier signal of
150 MHz, the values of νᵢ − ν₀ for ¹³carbon are less than
15,000 Hz, so the systematically increasing lengths of the
intervals t₁ must be in the range of tens of microseconds
to milliseconds to produce reliable modulations of the
amplitudes.

Figure 12-15: Amplitude modulation of the absorption of a
nucleus produced during correlated spectroscopy. The absorption
of a particular population of identical nuclei produces a peak
in the spectrum of absorption as a function of the chemical shift δ₂
after f(t₁, t₂) has been submitted to Fourier transformation only in
the second dimension. Each trace is this spectrum of absorption as
a function of the chemical shift δ₂ at a different t₁. The amplitude of
the peak of absorbance records the harmonic precession of the
nuclear spin relative to the carrier frequency during the interval t₁.
Reprinted with permission from ref 203. Copyright 1995 John Wiley
and Sons Ltd.
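
The arithmetic behind these limits can be sketched as follows. The conversion from a range of chemical shift to a range of frequency follows from the definition of parts per million. The spectral widths of 10 ppm for ¹hydrogen and 100 ppm for ¹³carbon are assumptions chosen because they reproduce the 6000 Hz and 15,000 Hz quoted in the text, and the sampling criterion (at least two increments per cycle of the fastest modulation) is the standard Nyquist condition rather than something stated in the text.

```python
def offset_hz(shift_ppm, carrier_mhz):
    """Frequency difference (Hz) corresponding to a chemical shift in ppm
    at a carrier frequency given in MHz (ppm times MHz gives Hz)."""
    return shift_ppm * carrier_mhz


def max_t1_increment(max_offset_hz):
    """Largest increment in t1 (seconds) that still samples the fastest
    modulation at least twice per cycle."""
    return 1.0 / (2.0 * max_offset_hz)


hydrogen_width = offset_hz(10.0, 600.0)    # assumed 10 ppm at 600 MHz
carbon_width = offset_hz(100.0, 150.0)     # assumed 100 ppm at 150 MHz

hydrogen_step = max_t1_increment(hydrogen_width)   # seconds
carbon_step = max_t1_increment(carbon_width)       # seconds
```

Under these assumptions the largest usable increments are a few tens of microseconds, at the short end of the range given above; the longest intervals are set by how finely the modulation frequencies must be distinguished.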

A simple two-dimensional spectrum is an extension
of this procedure of frequency labeling. A series of free
induction decays from the sample is gathered at systematically
increasing intervals t₁. The time dimension of
each free induction decay is designated as t₂. The complete
set of this series of free induction decays defines a
function that is two-dimensional in time, f(t₁, t₂). The
information in the first dimension of this function is
encoded in the modulations of the amplitudes (AM) of
the signals from the individual populations of nuclei, and
the information in the second dimension is encoded in
the modulations of the frequency (FM) of the free induction
decays. A two-dimensional Fourier transform of this
function extracts the frequencies of these modulations in
the two dimensions. The two-dimensional Fourier transform
of the function f(t₁, t₂) is a two-dimensional function
in frequency, which when divided by the carrier frequency
(Equation 12-54) is a two-dimensional function
in chemical shift, f(δ₁, δ₂).

If none of the populations of nuclei in the sample is 
spin-spin-coupled to any other, the amplitude of the 
signal from each population of nuclei is modulated only 
by its own Larmor frequency, and f(δ₁, δ₂) has peaks only
when δ₁ = δ₂. In such a case, the diagonal of the
two-dimensional spectrum replicates the one-dimensional
nuclear magnetic resonance spectrum, and nothing has 
been gained. If, however, one population of nuclei is 
spin-spin-coupled to another population of nuclei, the 
modulations of the amplitudes of their precessions are 
transferred between themselves during the second 90° 
pulse, and each of their precessions becomes labeled not 
only with its own Larmor frequency but also with the 
Larmor frequency of the other population of nuclei to 
which it is spin-spin-coupled. 

The Fourier transform picks out these coupled fre- 
quencies, and on the two-dimensional field, in addition 
to the one-dimensional spectrum along the diagonal, 
there are off-diagonal cross-peaks. Each of these cross-peaks
is located on the field at a chemical shift δ₁ of one
population of nuclei and a chemical shift δ₂ of another
population of nuclei to which the first population is 
spin-spin-coupled. Because spin-spin coupling is fully 
reciprocal, these cross-peaks are distributed symmetri- 
cally about the diagonal of the two-dimensional field. 
Correlated spectroscopy (COSY) is the technique that has 
just been described. A two-dimensional correlated spec- 
trum is a two-dimensional spectrum in which the off- 
diagonal cross-peaks arise from spin-spin couplings 
between different populations of nuclei and identify 
populations of spin-spin-coupled nuclei by their chemi- 
cal shifts. It is these off-diagonal cross-peaks that pull out 
individual absorptions from the one-dimensional spec- 
trum, spread them into two dimensions, and permit 
them to be observed individually.

A two-dimensional correlated spectrum (Figure
12-16) is a presentation of absorption as a function of
two values of chemical shift (in parts per million), δ₁ and
δ₂. Each off-diagonal cross-peak in the spectrum has the
same value of the chemical shift δ₁ as a peak buried in the
one-dimensional spectrum and the same value of the
chemical shift δ₂ as another peak buried elsewhere in the
one-dimensional spectrum but connected to the first by 
spin-spin coupling. The result is that two individual 
absorptions unresolved in the one-dimensional spec- 
trum are simultaneously drawn out of it and placed in 
isolation from all of the other absorptions otherwise 
overlapping them. This provides the resolution. The 
information provided by an off-diagonal cross-peak is 
that the two nuclei responsible for these two now iso- 
lated absorptions are connected through covalent bonds 
that mediate spin-spin coupling. 
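
The way in which a two-dimensional Fourier transform places diagonal peaks and off-diagonal cross-peaks on the field can be imitated with a toy calculation. This is not a simulation of spin dynamics: the amplitudes of the components are assigned by hand, each population contributes a component at its own frequency in both dimensions, and each coupled pair contributes two weaker components that carry one frequency in t₁ and the other in t₂. The frequencies (4 Hz and 10 Hz), the amplitudes, the grid of 32 by 32 points, and the function names are all hypothetical.

```python
import cmath


def dft(values):
    """Direct one-dimensional discrete Fourier transform."""
    n = len(values)
    return [sum(values[k] * cmath.exp(-2j * cmath.pi * m * k / n)
                for k in range(n))
            for m in range(n)]


def dft2(f):
    """Two-dimensional transform of f[t1 index][t2 index]."""
    rows = [dft(row) for row in f]                    # transform along t2
    cols = [dft(list(col)) for col in zip(*rows)]     # transform along t1
    return [list(row) for row in zip(*cols)]          # indexed [f1][f2]


def toy_peaks(freqs, coupled_pairs, n=32):
    """Return the set of (f1, f2) positions, in Hz, of peaks in a toy
    two-dimensional spectrum sampled at n points of 1/n second each."""
    dt = 1.0 / n
    amplitude = {(v, v): 1.0 for v in freqs}          # diagonal components
    for a, b in coupled_pairs:                        # hand-assigned transfer
        amplitude[(a, b)] = amplitude[(b, a)] = 0.5
    f = [[sum(w * cmath.exp(2j * cmath.pi * (v1 * i + v2 * j) * dt)
              for (v1, v2), w in amplitude.items())
          for j in range(n)]
         for i in range(n)]
    spectrum = dft2(f)
    threshold = 0.25 * n * n
    return {(m1, m2) for m1 in range(n) for m2 in range(n)
            if abs(spectrum[m1][m2]) > threshold}


uncoupled = toy_peaks([4, 10], [])
coupled = toy_peaks([4, 10], [(4, 10)])
```

Without coupling only the two diagonal positions appear; with coupling two additional cross-peaks appear, placed symmetrically about the diagonal at the pair of frequencies belonging to the two coupled populations.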

The off-diagonal region displayed in Figure 12-16
has a range for δ₂ (6.5-10.6 ppm) that includes the chemical
shifts for the amido ¹hydrogens of the polypeptide
backbone and a range for δ₁ (1.7-6 ppm) that includes
the chemical shifts of the ¹hydrogens on the α carbons
of the amino acids in the protein. The diagonal,
one-dimensional spectrum is just beyond the lower right
hand corner of the figure. Each cross-peak within the
panel arises from the spin-spin coupling between the
amido ¹hydrogen of one of the amino acids in the protein
and its own α ¹hydrogen. Each cross-peak has pulled the
absorption of each amido ¹hydrogen and the absorption
of its adjacent α ¹hydrogen out of the unresolved
one-dimensional spectrum so that they can be individually
observed. Each cross-peak also assigns numerical values
for the individual chemical shifts (δ₁ and δ₂) of the two
nuclei of each of these pairs of spin-spin-coupled
¹hydrogens and states that the two ¹hydrogens with these
two chemical shifts are connected to each other by three
covalent bonds. This region of a two-dimensional
(¹H-¹H) correlated spectrum is a fingerprint for the protein
because almost every amino acid in its sequence is
represented by a single cross-peak,* and the distribution
of the cross-peaks on the field is unique to that protein.


* Glycines, because they have two diastereotopic α ¹hydrogens,
usually produce two cross-peaks of the same chemical shift in the
amido ¹hydrogen dimension, and prolines, because they have no
amido ¹hydrogens, produce none.


Figure 12-16: Two-dimensional (¹H-¹H) correlated
nuclear magnetic resonance spectrum of a
20 mM solution of basic pancreatic trypsin
inhibitor (n_aa = 58) in ¹H₂O at pH 4.6 and 68 °C.
The spectrum is presented on a two-dimensional
field with axes of the two respective
chemical shifts. Each cross-peak is represented
topographically as mountains are represented
topographically on a map. Each cross-peak is a
set of closed curves within closed curves. As in a
map of electron density (Figure 4-12), each
curve connects points of equal amplitude in the
Fourier transform f(δ₁, δ₂); the more closed
curves, the greater the amplitude of the peak.
The spectrum has three dimensions, the two
chemical shifts δ₁ and δ₂, in the x and y dimensions
in the plane of the page, and the amplitude
of the Fourier transform, in the z dimension
normal to the page and represented in the contours.
The region of the spectrum presented
contains the cross-peaks created by spin-spin
couplings between the ¹hydrogen on each α carbon
(vertical axis) and the ¹hydrogen on the
respective, immediately adjacent amido nitrogen
(horizontal axis). The spectrum is a fingerprint
for the protein, and each cross-peak
assigns the respective chemical shifts of the two
coupled nuclei. Every peak in this region of the
spectrum has been assigned to one of the amino
acids in the sequence of the protein. All of the
amino acids in the sequence are represented,
with the exception of the four prolines (which
have no amido ¹hydrogens), Glycine 37, and Arginine
1. Lysine 46, because the chemical shift of its
absorption coincides with that of [¹H]H₂O and is
suppressed along with that of the [¹H]H₂O, is
also missing from the spectrum. Reprinted with
permission from ref 210. Copyright 1982 
Academic Press.

Many improvements have been made to the original
correlated spectrum. There are many sophisticated
and intricate elaborations of the sequence of the pulses
of oriented radiowaves that narrow the cross-peaks,
eliminate background, and enhance dramatically the signals
from weakly absorbing nuclei such as ¹³carbon and
¹⁵nitrogen. Each of these elaborations is identified by its
own acronym (Table 12-4). Because the nucleus of
¹²carbon has no magnetic moment and the nucleus of
¹⁴nitrogen has a spin quantum number of 1, the proteins
examined are now almost always modified so that all of
their nitrogens are ¹⁵nitrogen and all of their carbons are
¹³carbon by expressing them in bacteria grown on
[¹⁵N]NH₄⁺ as their sole source of nitrogen and on a
[¹³C]nutrient as their sole source of carbon. In this way,
cross-peaks from these enriched proteins produced by
heteronuclear spin-spin coupling can be observed. For
example, the heteronuclear spin-spin coupling between
an amido ¹⁵nitrogen and its own amido ¹hydrogen,
enhanced by an HSQC pulse sequence (Figure 12-17),
provides an alternative fingerprint of the protein in
which every amino acid is also represented. Two-dimensional
correlated spectroscopy has been expanded to
three dimensions (Table 12-3). The cross-peaks in these
spectra (Figure 12-18) are produced by spin-spin coupling
among three nuclei, usually of two or three different
elements. By expanding the dimensions, these
procedures are able to resolve absorptions that overlap
in two dimensions just as two dimensions separate
absorptions that overlap in one dimension. Each cross-peak
in such spectra assigns chemical shifts simultaneously
to three or four individual nuclei.

Each of the cross-peaks in the two-dimensional
(¹H-¹H) correlated spectrum of basic pancreatic trypsin
inhibitor (Figure 12-16) and the two-dimensional
(¹⁵N-¹H) HSQC correlated spectrum of dihydrofolate
reductase (Figure 12-17) has been labeled with the position
in the sequence of the protein of the amino acid containing
the two nuclei that produced it. The spectra
themselves do not come with labels, and each of these
assignments has been performed by tracing connections
among cross-peaks that affiliate nuclei in the polypeptide
backbone through spin-spin coupling and by
assigning the type of each amino acid from the pattern of
affiliations among cross-peaks that trace connections
out into each side chain.

Table 12-4: Methods for Improving Cross-Peaks in Two-Dimensional and Three-Dimensional Nuclear Magnetic Resonance Spectra

acronym | full name | improvement
HMQC | heteronuclear multiple-quantum coherence | increases amplitude of cross-peaks arising from coupling involving nuclei with small magnetogyric ratios such as ¹³carbon and ¹⁵nitrogen (Table 12-2)
HSQC | heteronuclear single-quantum coherence | increases amplitude of cross-peaks arising from coupling involving nuclei with small magnetogyric ratios such as ¹³carbon and ¹⁵nitrogen (Table 12-2)
HSMQC | heteronuclear single-multiple-quantum coherence | increases amplitude of cross-peaks arising from coupling involving nuclei with small magnetogyric ratios such as ¹³carbon and ¹⁵nitrogen (Table 12-2)
DQF | double quantum filtered | removes unwanted noise from spectrum
MQF | multiple quantum filtered | removes unwanted noise from spectrum
HOHAHA | homonuclear Hartmann-Hahn | increases amplitude of cross-peaks and extends, by relaying coherence, the number of bonds through which coupling can produce a cross-peak
TOCSY | total correlation spectroscopy (same method as HOHAHA) | increases amplitude of cross-peaks and extends, by relaying coherence, the number of bonds through which coupling can produce a cross-peak
HMBC | heteronuclear multiple bond correlation | extends the number of bonds through which coupling can produce a cross-peak
PS | pseudo single quantum | narrows the line widths of the cross-peaks
(none) | random fractional deuteration | replacement of 50% of the ¹hydrogens in a protein with ²hydrogens at random narrows the widths of the cross-peaks in correlated spectroscopy and increases the amplitude and the number of the detectable nuclear Overhauser effects from larger proteins
TROSY | transverse relaxation-optimized spectroscopy | narrows the line widths of the cross-peaks
CRINEPT | cross relaxation insensitive nuclei enhanced polarization transfer | increases amplitude of cross-peaks arising from coupling involving nuclei with small magnetogyric ratios such as ¹³carbon and ¹⁵nitrogen
SBC | single bond correlation | permits spin-spin coupling between ¹³carbon and ¹⁵nitrogen to be used for two-dimensional spectrum

Connections among nuclei along the polypeptide 
backbone are usually traced in a systematic
sequence of sections through three-dimensional corre- 
lated spectra (Figure 12-18). The large set of three- 
dimensional spectra available for tracing the 
connections among nuclei (Table 12-3) is redundant, so 
that when connections are obscured by the overlap of 
peaks or when a particular cross-peak is missing (for 
example, Lysine 46 in Figure 12-16), the trace can take an 
alternative path. The result of such a trace is that all of the 
nuclei (¹hydrogens, ¹³carbons, and ¹⁵nitrogens) in long
segments of polypeptide backbone are consecutively 
connected to each other and each individually assigned 
a chemical shift. 

The respective positions of these segments of con- 
nected nuclei in the amino acid sequence of the protein 
are then established by tracing connections from the 
nuclei out into each side chain (Figures 12-19 through 
12-22). Each of these paths of connections out into a 
side chain usually starts at one of the cross-peaks in a fingerprint
of the protein. For example, the relayed
spin-spin coupling among the ¹hydrogens of each side
chain registered in a two-dimensional (¹H-¹H) TOCSY
spectrum (Figure 12-19) begins at the cross-peak
between the amido ¹hydrogen and the α ¹hydrogen
(Figure 12-16). The coupling between the α ¹³carbon and
the β ¹³carbon of each side chain registered in a three-dimensional
(¹³C-¹⁵N-¹H) CBCA(CO)NH correlated spectrum
(Figure 12-20) begins at the cross-peak between
the amido ¹⁵nitrogen and the amido ¹hydrogen (Figure
12-17) of the next amino acid in the sequence (Table
12-3). Connections among ¹hydrogens of a side chain
(Figure 12-19) or ¹³carbons of a side chain (Figure 12-20)
can be extended to their own ¹³carbons or ¹hydrogens,
respectively, with two-dimensional (¹H-¹³C) HSQC correlated
spectra (Figure 12-21). When two-dimensional
spectra (Figure 12-19) become too crowded to trace connections,
they can be expanded in a third dimension
(Figure 12-22) to resolve the individual cross-peaks.

Figure 12-17: Two-dimensional (¹H-¹⁵N)
HSQC correlated nuclear magnetic resonance
spectrum of a 2 mM solution of human dihydrofolate
reductase (n_aa = 186) expressed in
E. coli grown on [¹⁵N]NH₄Cl as its sole source
of nitrogen and dissolved at pH 6.5 and
25 °C. Each of the spin-spin couplings
between an amido ¹⁵nitrogen and an amido
¹hydrogen at each of the positions in the backbone
of the polypeptide creates one of the
cross-peaks in the spectrum. The range of
chemical shift for ¹⁵nitrogen (vertical axis)
spans the chemical shifts for the amido
¹⁵nitrogens in the protein, and the range of
chemical shift for ¹hydrogen (horizontal axis)
spans the chemical shifts for amido ¹hydrogens
in the protein. The two-dimensional
spectrum is a fingerprint for the protein, and
each cross-peak assigns pairs of chemical
shifts to the respective coupled amido ¹⁵nitrogens
and amido ¹hydrogens. Each cross-peak
is labeled by the amino acid in the sequence of
the protein to which it has been assigned. The
pairs of peaks each having the same chemical
shift in the ¹⁵nitrogen dimension (connected
by horizontal lines) are those arising from the
spin-spin coupling of the two respective distinct
¹hydrogens on each amido ¹⁵nitrogen in
the primary amides of the glutamines and
asparagines in the protein. They are also
labeled by the amino acid to which they have
been assigned. Reprinted with permission
from ref 251. Copyright 1992 American
Chemical Society.

Each side chain has a characteristic pattern of
connections among nuclei of characteristic chemical shifts
that identifies it in the spectrum. For example, glutamate
has two β ¹hydrogens that have smaller chemical shifts
than its two γ ¹hydrogens (Figure 12-19). Isoleucine has
¹hydrogens on a δ methyl group and a γ methyl group as
well as two ¹hydrogens on a γ methylene, all in the
aliphatic range of chemical shifts (Figures 12-19 and
12-22). Lysine has ¹hydrogens on a β methylene, a
γ methylene, and a δ methylene in the aliphatic range
but two ¹hydrogens on an ε methylene with chemical
shifts around 3 ppm (Figures 12-19 and 12-22).
Threonine and serine have β ¹hydrogens with larger
chemical shifts (around 4 ppm), and threonine has
γ ¹hydrogens with chemical shifts characteristic of a
methyl group (Figures 12-19 and 12-22). From these
patterns, from the sequence of the connections along the
backbone (Figure 12-18), and from the amino acid
sequence of the protein, it is usually possible to identify
the long, unbroken segments of connections among the
atoms of the backbone that run through the spectra with
segments in the amino acid sequence of the protein and
thereby assign the cross-peaks to specific positions in the
amino acid sequence, much as the pattern of protrusions
from the polypeptide backbone in a map of electron
density allows segments of the amino acid sequence to be
identified.
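
The side-chain patterns just described can be turned into a crude screening rule. The following Python sketch is illustrative only: the function name, the position labels, and the numerical windows around 4 ppm and 3 ppm are assumptions chosen to mirror the examples in the text, not a published assignment procedure.

```python
def suggest_residue_types(shifts_ppm):
    """Suggest candidate residue types from side-chain 1H chemical shifts.

    shifts_ppm maps a position label ("alpha", "beta", "gamma", "delta",
    "epsilon") to a list of 1H chemical shifts in ppm taken from one
    relayed sequence of cross-peaks. The windows below are hypothetical.
    """
    candidates = []
    beta = shifts_ppm.get("beta", [])
    gamma = shifts_ppm.get("gamma", [])
    epsilon = shifts_ppm.get("epsilon", [])

    # Serine and threonine: beta hydrogens shifted to around 4 ppm.
    beta_near_4 = any(3.6 <= s <= 4.6 for s in beta)
    if beta_near_4 and any(s < 1.6 for s in gamma):
        # A methyl-like gamma hydrogen distinguishes threonine.
        candidates.append("Thr")
    elif beta_near_4 and not gamma:
        candidates.append("Ser")

    # Lysine: epsilon methylene hydrogens at around 3 ppm.
    if any(2.7 <= s <= 3.3 for s in epsilon):
        candidates.append("Lys")
    return candidates
```

Such a rule only narrows the possibilities; the actual assignment still rests on matching the backbone connections to the amino acid sequence.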

The final results of this process are that each cross- 
peak on the various two- and three-dimensional corre- 
lated spectra has been assigned to the two or three nuclei 
in the amino acid sequence of the protein that produce it 
and that the nucleus of almost every ¹hydrogen,
carbon, and nitrogen in the protein has been assigned 
a specific chemical shift. By themselves these assign- 
ments are not very informative. They are, however, an 
indispensable prelude to using nuclear magnetic reso- 
nance to provide insight into the dynamics of a protein, 
to produce its molecular model, to measure the acid dis- 
sociation constants of its side chains, and to follow the 
rates of exchange of its protons with protons in the solu- 
tion. 

When the spin state of a particular population of
chemically identical nuclei is saturated by the absorption
of electromagnetic energy at its Larmor frequency and
then allowed to relax back to its equilibrium distribution,
the rate of its relaxation contains information about the
dynamics of the structure in which each of these nuclei
is contained. For example, the relaxation of a particular
population of identical nuclei of amido ¹⁵nitrogens, each
at the same position in the polypeptide backbones of the
identical molecules of a protein in a solution, is
dominated by the dipolar interactions between the nuclei of
the ¹⁵nitrogens in that population and the nuclei of the
directly attached ¹hydrogens on each of the individual
amido nitrogens. The rate at which the bonds between
the nitrogens and the ¹hydrogens in those two particular
populations reorient relative to the magnetic field
determines the rate of the relaxation of the population of
¹⁵nitrogen nuclei. From an analysis of this rate of
relaxation, information about the rate of this reorientation
can be extracted. This information is expressed as
an order parameter S², which assumes a value of 1 when
the ¹⁵nitrogen-¹hydrogen bond is fixed rigidly in the
protein so that it reorients at the same rate at which the
entire molecule of protein reorients by its normal
rotational and translational diffusion and which assumes a
value of 0 when it is so loosely attached to the protein
that it reorients completely independently of the
reorientation of the molecule of protein. Therefore, the order
parameter S² is a measure of the flexibility of a particular
position within a molecule of the protein.

Two-dimensional spectra of proteins in which each
amido ¹⁵nitrogen in the polypeptide backbone has been
assigned its position in the amino acid sequence can be
used to measure relaxation rates of these individual
¹⁵nitrogens and calculate values of the order parameter
S² for each. Most of the values of the order parameter S²
for these ¹⁵nitrogens are near 1 (≥0.8) because most of the
polypeptide backbone of a molecule of protein is rigidly
fixed within the tertiary structure, but there are flexible
segments that can be identified by the smaller values
(0.4-0.6) of the order parameter S² of the amido
¹⁵nitrogens they contain. These segments are often flexible
loops on the surface of the molecule of protein and
correspond to segments in a crystallographic map of
electron density that are so flexible that they do not appear in
the map or to segments the atoms of which have high
B-factors. High values of these B-factors indicate that
the segment is also flexible in the crystal. The informative
exceptions are those in which the order parameters are
low but the segment appears to be rigid in the
crystallographic molecular model and its constituent atoms have
low B-factors. These exceptions indicate that the crystal
packing has confined an otherwise flexible segment of
polypeptide.
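
The comparison between order parameters and crystallographic B-factors can be sketched as a small classification routine. This is a minimal illustration under stated assumptions: the function name, the cutoff of 0.6 for a flexible position, the B-factor cutoff of 20 (in Å²), and the residue labels in the usage note are all hypothetical; only the values of 0.8 and 0.4-0.6 come from the text.

```python
def classify_backbone(order_parameters, b_factors,
                      s2_rigid=0.8, s2_flexible=0.6, b_low=20.0):
    """Label each position by its order parameter S^2.

    order_parameters: {residue: S^2 of the amido 15N-1H bond}
    b_factors: {residue: crystallographic B-factor}; entries may be missing
    b_low is an assumed cutoff below which a B-factor counts as "low".
    """
    report = {}
    for residue, s2 in order_parameters.items():
        if s2 >= s2_rigid:
            label = "rigid"
        elif s2 <= s2_flexible:
            label = "flexible"
        else:
            label = "intermediate"
        b = b_factors.get(residue)
        # Informative exception: flexible in solution, ordered in the crystal.
        if label == "flexible" and b is not None and b < b_low:
            label = "flexible in solution, possibly confined by crystal packing"
        report[residue] = label
    return report
```

For instance, a position with S² of 0.5 and a B-factor of 12 would be flagged as an informative exception, whereas the same S² with a B-factor of 45 would simply be reported as flexible.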

Order parameters can also be obtained for bonds
between ¹⁵nitrogens and ¹hydrogens in side chains. For
example, the order parameters for bonds between
¹⁵nitrogens and ¹hydrogens in the side chains of
tryptophans buried in the core of the protein are usually greater
than 0.8, but those for the bonds between ¹⁵nitrogens and
¹hydrogens in the side chains of arginines on the surface
of a protein can be as small as 0.05. Unfortunately,
there is no simple correlation between values of the
order parameter S² and the rates at which the flexible
segments or side chains are fluctuating relative to
the entire molecule, and motions slower than the
rotational diffusion of the entire molecule of protein do not
register in the order parameter.

In the rare instances in which a flexible segment of 
the folded polypeptide assumes only two significantly 
occupied conformations of about equal occupancy, each 
nucleus in the flexible segment will have a different 
chemical shift in each conformation. If these chemical 
shifts are different enough, each pair of nuclei that is 
spin-spin-coupled will produce two cross-peaks in a 
two-dimensional spectrum, each with the chemical 
shifts of the respective nuclei in the two respective
conformations. In such cases, it is possible to obtain a rate

Figure 12-18: Sequential assignment of the chemical shifts for the ¹hydrogens, ¹³carbons, and ¹⁵nitrogens in a polypeptide backbone by use
of three-dimensional nuclear magnetic resonance spectra.²¹³ The complementary DNA encoding calmodulin from D. melanogaster (naa =
148) was expressed in E. coli grown on [¹⁵N]NH₄Cl and [¹³C₆]glucose as sole nitrogen and carbon sources, so that the protein was uniformly
and completely (95%) labeled with ¹⁵nitrogen and ¹³carbon. Spectra were recorded from a 1.5 mM solution of calmodulin in a 93:7 mixture
of [¹H]H₂O to [²H]H₂O at pH 6.3 and 47 °C. Individual panels (A-H) are two-dimensional sections through a series of three-dimensional
nuclear magnetic resonance spectra (Table 12-3) of the protein. In panels A, B, and C, the sections through the respective three-dimensional
spectra are only wide enough to contain cross-peaks from ¹⁵nitrogens the chemical shifts of which are 124.4 ± 0.2 ppm, which includes the
chemical shift of the amido ¹⁵nitrogen of Lysine 21. Each of these three sections has as its vertical axis the chemical shift of ¹hydrogen between
7.4 and 8.5 ppm, the region covering the chemical shifts of ¹hydrogens on amido nitrogens. (A) Section containing the cross-peak produced
by the spin-spin coupling connecting the amido ¹hydrogen of Lysine 21, the amido nitrogen of Lysine 21, and the acyl carbon of Aspartate
20. This section has as its horizontal axis the chemical shift for ¹³carbon between 172 and 180 ppm, the region covering the chemical shifts
of acyl carbons. This cross-peak is located by the chemical shift of the acyl ¹³carbon of Aspartate 20 (177.3 ppm) and assigns the chemical
shift of the amido ¹hydrogen of Lysine 21 as 7.66 ppm. (B) Section containing the cross-peak produced by the spin-spin coupling connecting
the amido ¹hydrogen, the amido nitrogen, and the α ¹³carbon of Lysine 21. The section has as its horizontal axis the chemical shift for
¹³carbon between 47 and 64 ppm, the region covering the chemical shifts of the α carbons. The position of this cross-peak, located by the
value for the chemical shift of its amido ¹hydrogen (horizontal line), assigns the chemical shift of the α ¹³carbon of Lysine 21 as 58.5 ppm.
(C) Section containing the cross-peak produced by the spin-spin coupling connecting the amido ¹hydrogen, the amido nitrogen, and the
α ¹hydrogen of Lysine 21. This section has as its horizontal axis the chemical shift for ¹hydrogen between 3.7 and 4.9 ppm, the region covering
the chemical shifts of α hydrogens. The position of this cross-peak, located by the value of the chemical shift of its amido ¹hydrogen
(horizontal line), assigns the chemical shift of the α hydrogen of Lysine 21 as 4.01 ppm. In panels D and E, the sections through the respective
three-dimensional spectra are only wide enough to contain cross-peaks from carbons the chemical shifts of which are 58.3 ± 0.3 ppm, which
includes the chemical shift of the α ¹³carbon of Lysine 21 (assigned in panel B). Each of these two sections has as its horizontal axis the
chemical shift for ¹hydrogen between 3.7 and 4.9 ppm, the region covering the chemical shifts of α ¹hydrogens. (D) Section containing the
cross-peak produced by the spin-spin coupling connecting the α ¹hydrogen, the α ¹³carbon, and the acyl ¹³carbon of Lysine 21. This section has as
its vertical axis the chemical shift for carbon between 174 and 181 ppm, the region covering the chemical shifts of acyl ¹³carbons. The
position of this cross-peak, located by the value of the chemical shift of its α ¹hydrogen (vertical line), assigns the chemical shift of the acyl
carbon of Lysine 21 as 178.3 ppm. (E) Section containing the cross-peak produced by the spin-spin coupling connecting the α ¹hydrogen
of Lysine 21, the α ¹³carbon of Lysine 21, and the amido ¹⁵nitrogen of Aspartate 22. This section has as its vertical axis the chemical shift for
¹⁵nitrogen between 111 and 125 ppm, the region covering the chemical shifts of amido ¹⁵nitrogens. The position of the cross-peak for Lysine
21, located by the value of the chemical shift of its α hydrogen (vertical line), assigns the chemical shift of the amido ¹⁵nitrogen of Aspartate
22 as 114.0 ppm. (F-H) Sections through three-dimensional spectra containing cross-peaks for Aspartate 22 corresponding respectively to
the sections in panels A-C that contain cross-peaks for Lysine 21. The sections in panels F, G, and H are only wide enough to contain
cross-peaks from ¹⁵nitrogens, the chemical shifts of which are 114.1 ± 0.2 ppm, which includes the chemical shift of the amido nitrogen of
Aspartate 22 assigned in panel E. The value of the chemical shift of the amido ¹⁵nitrogen of Lysine 21 (124.4 ppm) that was used to set the
position of the slabs in panels A, B, and C was assigned with a section corresponding to panel E but with the section for the sequential
assignment of the chemical shifts of the nuclei in Aspartate 20 rather than Lysine 21. The cross-peak in panel F produced by spin-spin couplings
connecting the amido ¹hydrogen of Aspartate 22, the amido ¹⁵nitrogen of Aspartate 22, and the acyl ¹³carbon of Lysine 21 was located with
the chemical shift of the acyl ¹³carbon of Lysine 21 assigned in panel D. The position of the cross-peak from Lysine 21 in panel A was located
with the chemical shift of the acyl carbon of Aspartate 20 assigned in a section for the sequential assignment of the nuclei in Aspartate 20
corresponding to that in panel D for Lysine 21. Cross-peaks from Leucine 116 appear in panels A, B, and C because the chemical shift of its
amido ¹⁵nitrogen is 124.2 ppm. Cross-peaks from Leucine 32, Arginine 74, Glutamate 82, Leucine 105, Aspartate 118, and Glutamate 127
appear in panels D and E because the chemical shifts of their α ¹³carbons are 58.2, 58.1, 58.2, 58.5, 58.5, and 58.5 ppm, respectively.
Cross-peaks from Aspartate 58, Threonine 79, Aspartate 95, and Threonine 110 appear in panels F, G, and H because the chemical shifts of their
amido ¹⁵nitrogens are 113.9, 114.0, 114.2, and 114.4 ppm, respectively. Reprinted with permission from ref 213. Copyright 1990 American
Chemical Society.


constant for the exchange of the flexible segment
between the two conformations. For example, a loop
between Alanine 9 and Leucine 24 in dihydrofolate
reductase from E. coli exchanges between its two
conformations at a rate of 35 s⁻¹. The heterologous
association between two molecules of protein also causes
changes in the chemical shifts of nuclei that end up in the
interface. These changes produce pairs of cross-peaks,
one from the unassociated protein and one from the
associated protein. These changes in chemical shift
identify the amino acids involved in the interface, and rates
of exchange of the participants between free and bound
states can be calculated from such spectra.

A nuclear magnetic resonance molecular model of
a protein is produced from a list of the individual nuclear
Overhauser effects between its ¹hydrogens. The
thousands of nuclear Overhauser effects that occur between
pairs of the thousands of unique ¹hydrogens in a protein
are resolved from each other by using two-dimensional
and three-dimensional nuclear Overhauser enhanced
spectroscopy (NOESY) just as the individual absorptions
of each of the thousands of nuclei in a protein are
resolved by using two-dimensional and
three-dimensional correlated spectroscopy.

A two-dimensional nuclear Overhauser enhanced
spectrum is a two-dimensional spectrum in which the
off-diagonal cross-peaks arise from nuclear Overhauser
effects between two different populations of hydrogen
nuclei and identify by their chemical shifts those pairs of
¹hydrogen nuclei that are connected by those respective
nuclear Overhauser effects. A two-dimensional nuclear
Overhauser enhanced spectrum (Figure 12-23) is
produced in the same way as a two-dimensional
correlated spectrum, except that after the second 90° pulse
has labeled the precession of the net magnetization of
each population of nuclei with its Larmor frequency by
modulating the amplitude of its precession in the
xy plane (Figure 12-15), there is a fixed delay or time of
mixing, tm, of 50 ms to several hundred milliseconds to
allow the saturation of each population of nuclei, which
has been labeled with its own characteristic Larmor
frequency, to diffuse outward, mixing with the spin states of
nuclei in its vicinity. After this fixed delay, a third 90°
pulse initiates the collection of the free induction decay
from the sample during t₂.

In the two-dimensional spectrum that results, there 
is a cross-peak produced by the transfer of some of the 
saturation from the population of nuclei A, which has 
been labeled with its own Larmor frequency, to the population
of nuclei X, each of which is adjacent to a
nucleus A in a molecule of the protein. As a result, the 


Figure 12-19: Two-dimensional (¹H-¹H) TOCSY nuclear magnetic
resonance spectrum of a 2 mM solution of ferrocytochrome c₂ from
Rhodobacter capsulatus (naa = 116) dissolved in a 90:10 mixture of
[¹H]H₂O and [²H]H₂O, respectively, at pH 6 and 30 °C.²⁵⁶ The
spectrum contains peaks resulting from the relay of spin-spin coupling
from the amido ¹hydrogen of each amino acid out along the atoms in
its side chain. Each cross-peak is labeled with the letter of the Greek
alphabet designating the carbon of the side chain on which the
¹hydrogen producing it resides. Each sequence of cross-peaks resulting
from these relayed spin-spin couplings (vertical lines) is anchored
on the cross-peak connecting the spins of the amido ¹hydrogen and
the α ¹hydrogen of an amino acid (Figure 12-16). Each sequence
identifies cross-peaks arising from relayed coupling between the
amido ¹hydrogen and other hydrogens in the side chain of that amino
acid and assigns chemical shifts (the values of the ordinate of each
cross-peak) to each of those other hydrogens. For example, the
chemical shifts of the α, β, γ, and δ ¹hydrogens of Isoleucine 19 are 4.12,
1.42, 0.66, and -0.22 ppm, respectively, and those of the α ¹hydrogen
and the two β ¹hydrogens of Cysteine 13 are 4.90, 1.67, and 0.67 ppm,
respectively. Each cross-peak between an amido ¹hydrogen and an
α hydrogen anchoring each of the relayed connections is labeled
with the amino acid on which they reside. Reprinted with permission
from ref 256. Copyright 1990 American Chemical Society.


population of nuclei X becomes labeled with the Larmor
frequency of the population of nuclei A as well as its own
Larmor frequency, and in the two-dimensional nuclear
Overhauser enhanced spectrum that results, there is an
off-diagonal cross-peak at chemical shift δ₂ of nucleus X
and chemical shift δ₁ of nucleus A. Transfer of saturation,
however, is reciprocal, and the saturation reciprocally
and coincidentally transferred from the population of
nuclei X to the population of nuclei A, which is labeled
with the Larmor frequency of nuclei X, produces an
off-diagonal peak at chemical shift δ₂ of nucleus A and
chemical shift δ₁ of nucleus X. Each of these two
symmetrically displayed peaks connects nucleus A and
nucleus X by a nuclear Overhauser effect and identifies
the two nuclei connected by their chemical shifts. Each
of the hundreds of symmetrically displayed peaks in a
nuclear Overhauser enhanced spectrum (Figure 12-23)
makes its own respective connection between two
¹hydrogens, each identified by its chemical shift.
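
The reciprocity of the transfer means that each connection should appear twice in a list of picked peaks, once on each side of the diagonal. The following Python sketch pairs such mirror-image peaks; the function name, the representation of a peak as a pair of chemical shifts, and the matching tolerance of 0.02 ppm are assumptions made for illustration rather than part of any particular software package.

```python
def pair_symmetric_peaks(peaks, tol=0.02):
    """Find connections supported by peaks on both sides of the diagonal.

    peaks: list of (delta1, delta2) chemical shifts in ppm.
    tol:   assumed tolerance for treating two chemical shifts as equal.
    Returns one (smaller shift, larger shift) tuple per symmetric pair.
    """
    connections = []
    used = set()
    for i, (a1, a2) in enumerate(peaks):
        if i in used or abs(a1 - a2) <= tol:
            continue  # already paired, or a diagonal peak
        for j in range(i + 1, len(peaks)):
            if j in used:
                continue
            b1, b2 = peaks[j]
            # The mirror image of (a1, a2) lies at (a2, a1).
            if abs(a1 - b2) <= tol and abs(a2 - b1) <= tol:
                connections.append((min(a1, a2), max(a1, a2)))
                used.update((i, j))
                break
    return connections
```

A peak that lacks its mirror image is left out of the result, which is one simple way to set aside noise or artifacts before assignments are attempted.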

A specific example illustrates the spin diffusion
resulting from these individual transfers of saturation. A
cross section through a two-dimensional nuclear
Overhauser enhanced spectrum of bovine acrosin
inhibitor IIA, at the chemical shift δ₁ equivalent to the
Larmor frequency of the amido ¹hydrogen of Alanine 37
(8.45 ppm), contains cross-peaks at the chemical shifts
δ₂ of the amido ¹hydrogens of Asparagine 34, Cystine 36,
and Phenylalanine 38; of the α ¹hydrogens of Cystine 36
and Alanine 37; and of the β ¹hydrogens of Cystine 36 and
Alanine 37 (Figure 12-24)²⁷⁰ because the saturation
transferred to each of these populations of ¹hydrogen
nuclei retains the amplitude modulation, labeling it with
the Larmor frequency of the population of the nuclei of
the amido ¹hydrogens of Alanine 37. Consequently, in
the dimension of chemical shift δ₂, cross-peaks are
located at the chemical shifts of these other populations

Figure 12-21: Two-dimensional (¹H-¹³C) HSQC correlated
nuclear magnetic resonance spectrum of a solution of human
transforming growth factor β (homodimer of subunits 112 aa in
length) expressed in Chinese hamster ovarian cells grown on a
mixture of [¹³C]amino acids and dissolved in a 95:5 mixture of [¹H]H₂O
and [²H]H₂O, respectively, at pH 4.2 and 45 °C. The region of the
spectrum presented covers the range of chemical shift for
¹hydrogen (horizontal axis) and ¹³carbon (vertical axis) of the methyl
groups of threonine, valine, leucine, and isoleucine. Each
cross-peak is produced by the spin-spin coupling between a methyl
carbon and its three ¹hydrogens. Each is labeled with the amino
acid in the sequence of the protein to which it has been assigned
and the Greek letter designating the position of the methyl group
within that amino acid. The inset is the boxed region within the full
spectrum expanded in the dimension of the chemical shift of
¹³carbon to resolve peaks that overlapped in the full spectrum.
Reprinted with permission from ref 258. Copyright 1996 American
Chemical Society.

Figure 12-20: Strips from sections of a three-dimensional (¹³C-¹⁵N-¹H)
CBCA(CO)NH nuclear magnetic resonance spectrum of a 1 mM
solution of human interleukin-13 (naa = 113) expressed in E. coli grown on
[¹⁵N](NH₄)₂SO₄ and [¹³C]glucose as sole sources of nitrogen and carbon
and dissolved in a 90:10 mixture of [¹H]H₂O and [²H]H₂O, respectively,
at pH 6.0 and 25 °C.²⁵⁷ Each section through the three-dimensional
spectrum from which each strip is taken is centered on the chemical
shift of the amido ¹⁵nitrogen of the respective amino acid. A strip is then
taken from the resulting two-dimensional section. Each strip is aligned
vertically with its neighbors and is designated at its bottom by the amino
acid from which the pair of cross-peaks (connected by vertical lines)
arises. Each strip is centered on the chemical shift of the amido
¹hydrogen of the next amino acid in the sequence of the protein (Table 12-3)
so the horizontal axis of each strip is the chemical shift of hydrogen.
Each strip is wide enough to include the cross-peak produced by the
spin-spin coupling connecting the α carbon of the amino acid, the
amido ¹⁵nitrogen of the next amino acid, and the amido ¹hydrogen of the
next amino acid as well as the cross-peak produced by the spin-spin
coupling connecting the β ¹³carbon of the amino acid, the amido
¹⁵nitrogen of the next amino acid, and the amido ¹hydrogen of the next amino
acid. The two cross-peaks in each strip appear in the same section and
have the same value on the horizontal axis because they are both
coupled to the same respective amido nitrogen and amido ¹hydrogen on
the next amino acid. Therefore, each strip is anchored on a cross-peak
in the two-dimensional (¹H-¹⁵N) HSQC correlated nuclear magnetic
resonance spectrum (Figure 12-17) providing the fingerprint of the
protein, albeit on the cross-peak of the next amino acid, and each
cross-peak assigns a value for the chemical shifts of the α carbon and
the β carbon of the amino acid preceding the amino acid producing
the cross-peak in the fingerprint. Reprinted with permission from ref
257. Copyright 2001 Elsevier B.V.

of ¹hydrogens. The existence of these cross-peaks indicates
that all of these ¹hydrogens are in the vicinity of the
amido ¹hydrogen of Alanine 37 in the tertiary structure of
the protein. As the length of the fixed delay tm was
increased, the intensity of the cross-peaks increased as
more and more of the amplitude modulation of the
population of the nuclei of amido ¹hydrogens of Alanine 37
was transferred to the populations of neighboring nuclei.

Two problems with nuclear Overhauser effects
that are the consequence of spin diffusion are that they
are complicated by the spectral density function
inherent to the dipolar interaction and that they are usually
not confined to nuclei immediately adjacent to the
source of the diffusing spin but spread outward from the
source in rather complex pathways that cannot be
delineated unless the detailed structure of the molecule is
already known. The time tm between the second
and the third 90° pulses must be chosen by trial and


Figure 12-22: Sections from three-dimensional (¹H-¹³C-¹H)
HCCH-TOCSY nuclear magnetic resonance spectra of a 2 mM
solution of human interleukin-1β (naa = 153) expressed in E. coli grown
on [¹⁵N]NH₄Cl and [¹³C₆]glucose as the sole sources of nitrogen and
carbon and dissolved in [²H]H₂O at pH 5.4 and 36 °C. Each section
through the three-dimensional spectrum is 0.6-0.8 ppm in width
centered on the carbon chemical shift noted in its upper left corner.
These chemical shifts serve to identify each section. Each cross-peak
is produced by the spin-spin coupling of two hydrogens on the side
chain of a particular amino acid often relayed through multiple
bonds. One of the two hydrogens is on the carbon providing the
third dimension. The sections through the three-dimensional
spectrum centered on chemical shifts for ¹³carbon of 17.6 and 22.2 ppm,
respectively, contain cross-peaks arising from the absorptions of the
carbons in the δ and γ positions of leucines, the γ positions of
valines, and the β position of an alanine with chemical shifts in each
of these ranges. The cross-peak produced by the self-coupling of the
¹hydrogen on each of these selected ¹³carbons lies on the diagonal.
Every cross-peak along each horizontal line is coupled, respectively,
to this ¹³carbon and ¹hydrogen, each labeled with its eventual
assignment. Each cross-peak is labeled by the position of the other coupled
hydrogen in the side chain, and each cross-peak assigns the
chemical shift of that other hydrogen. The sections through the
three-dimensional spectrum centered on chemical shifts for ¹³carbon of
39.9 and 40.9 ppm, respectively, contain cross-peaks arising from the
absorptions of the carbons in the β positions of isoleucines with
chemical shifts in each of these ranges. The cross-peaks produced by
the self-coupling of the ¹hydrogens on each of these selected
β carbons lie on the diagonal. Every cross-peak along each horizontal line
is coupled, respectively, to this β ¹³carbon and β ¹hydrogen, each
labeled with its eventual assignment. Each cross-peak is labeled by
the position of the other coupled hydrogen in the side chain, and
each cross-peak assigns the chemical shift of that other hydrogen.
The sections through the three-dimensional spectrum centered on
chemical shifts for carbon of 55.0, 56.7, 58.0, and 58.6 ppm,
respectively, contain cross-peaks arising from the absorptions of the
carbons in the α positions of amino acids with chemical shifts in each of
these ranges. The cross-peaks produced by the self-coupling of the
α ¹hydrogens on each of these selected α carbons lie on the
diagonal. Every cross-peak along each horizontal line is coupled,
respectively, to this α carbon and α ¹hydrogen, each labeled with its
eventual assignment. Each cross-peak is labeled by the position of
the other coupled hydrogen in the side chain, and each cross-peak
assigns the chemical shift of that other hydrogen. Reprinted from
ref 259. Copyright 1990 American Chemical Society.


error to maximize the amount of transfer to immediately
adjacent nuclei while minimizing the spread to more
distant locations (Figure 12-24). Because the nuclear
Overhauser effect results from a dynamic,
inhomogeneous process, no reliable absolute measurements of
particular distances between nuclei can be made. An
intuition of relative distances between the nuclei can be
gained, however, by following the changes in the
intensity of the nuclear Overhauser effects as a function of the
time interval tm. If a nuclear Overhauser effect is one that
develops early in the progress of spin diffusion, the two
nuclei connected by that nuclear Overhauser effect are
presumed to be close to each other (<0.5 nm) in the
folded polypeptide.
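
The qualitative use of the time course can be illustrated with a short Python sketch that orders nuclear Overhauser effects by how early they appear as tm is lengthened. Everything specific here is an assumption made for illustration: the function name, the normalization of intensities to the diagonal peak, the threshold of 0.1, and the labels and intensities in the usage note, which are invented and are not read from Figure 12-24.

```python
def rank_by_buildup(buildup, threshold=0.1):
    """Order cross-peaks by the mixing time at which they first appear.

    buildup:   {label: [(tm in ms, normalized intensity), ...]}
    threshold: assumed intensity above which a cross-peak counts as present.
    Cross-peaks that never reach the threshold are placed last.
    """
    def first_time(series):
        for tm, intensity in sorted(series):
            if intensity >= threshold:
                return tm
        return float("inf")

    # Earlier appearance suggests, but does not measure, a shorter distance.
    return sorted(buildup, key=lambda label: first_time(buildup[label]))
```

With invented data such as `{"near": [(50, 0.12), (100, 0.20)], "relayed": [(50, 0.02), (200, 0.15)]}`, the function returns `["near", "relayed"]`; the ordering is only a ranking of relative proximity, in keeping with the caution above that absolute distances cannot be obtained.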

In the full two-dimensional nuclear Overhauser
spectrum of a protein (Figure 12-23A), the
one-dimensional spectrum of the individual absorptions of the
¹hydrogens lies along the diagonal. The nuclear


Figure 12-23: Two-dimensional (¹H-¹H) nuclear Overhauser
enhanced spectra. (A) Full spectrum of a solution of ribosomal
protein S17 from Bacillus stearothermophilus (naa = 86) dissolved in a
90:10 mixture of [¹H]H₂O and [²H]H₂O, respectively, at pH 6.5 and
25 °C.²⁶⁹ In this spectrum, a cross-peak appears whenever the
absorptions of two ¹hydrogens are connected by a nuclear
Overhauser effect. The two chemical shifts of each cross-peak are
those of the two respective ¹hydrogens. Because most ¹hydrogens
on adjacently bonded atoms are close enough to be connected by
a nuclear Overhauser effect as well as spin-spin coupled through
the bonds, most of the cross-peaks in a correlated spectrum of the
protein are also present here. There are, however, more
cross-peaks. The additional ones connect hydrogens on atoms adjacent
in space but not connected by covalent bonds. Several regions are
highlighted within boxes on the full spectrum: dNN, connections
between ¹hydrogens on different amido ¹⁵nitrogens; dαN,
connections between α ¹hydrogens and amido ¹hydrogens; d(β,γ,δ)N,
connections between hydrogens on β, γ, or δ carbons and amido
¹hydrogens; dαα, connections between ¹hydrogens on two different
α carbons; and d(β,γ,δ)(β,γ,δ), connections between two different β, γ,
or δ ¹hydrogens. Reprinted from ref 269. Copyright 1996 American
Chemical Society. (B) Expansion of the dαN region of spectrum A.
Each cross-peak is a connection produced by a nuclear Overhauser
effect between the α ¹hydrogen on one amino acid and the amido
¹hydrogen on another. The identity of the amino acids on which the
paired ¹hydrogens are located is determined by the chemical shifts
at which the cross-peak is situated. Those that connect consecutive
amino acids in the sequence of the protein are labeled to illustrate
the fact that most but not all of the nuclear Overhauser effects in
this region are local and uninformative. Reprinted from ref 269.
Copyright 1996 American Chemical Society. (C) Spectrum of a
2 mM solution of the γ-carboxyglutamate-rich domain of human
factor IX (naa = 47) dissolved in a 90:10 mixture of [¹H]H₂O and
[²H]H₂O, respectively, at pH 5.3 and 35 °C.²⁶⁸ The region of the
spectrum displayed contains cross-peaks between ¹hydrogens on
aromatic rings (horizontal axis) and ¹hydrogens on methyl groups
(vertical axis). Because aromatic amino acids have no methyl
groups, most of the cross-peaks in this region, unlike those in panel
B, connect amino acids distant from each other in the sequence of
the protein. The cross-peaks are identified in the respective
dimensions by the number of and position in the amino acids containing
the two ¹hydrogens. Reprinted with permission from ref 268.
Copyright 1995 American Chemical Society.


Overhauser effect draws connected pairs of these
absorptions out of the diagonal as individual cross-peaks.
Various areas of the spectrum contain cross-peaks
between particular classes of ¹hydrogens (the dashed
boxes in Figure 12-23A), such as those connecting
α ¹hydrogens and amido ¹hydrogens in the polypeptide
backbone (Figure 12-23B) or those connecting aromatic
¹hydrogens and aliphatic ¹hydrogens (Figure 12-23C).

Proteins with more than 100 amino acids have so
many pairs of nearby ¹hydrogens that two-dimensional
nuclear Overhauser enhanced spectra become too
crowded. Many individual peaks overlap and are
impossible to resolve and identify. In such cases,
three-dimensional (Figure 12-25) or even four-dimensional
nuclear Overhauser enhanced spectra are used to
increase the resolution. In sections from such spectra,
only those pairs of ¹hydrogens that are connected by a
nuclear Overhauser effect and in which one of the pair is

Figure 12-24: Diffusion of saturation, labeled with its Larmor
frequency, from a nucleus of ¹hydrogen into surrounding nuclei of
¹hydrogen as a function of the length of the fixed delay tm.²⁷⁰ Each
trace is a cross section through a two-dimensional (¹H-¹H) nuclear
Overhauser enhanced spectrum of a 16 mM solution of acrosin
inhibitor IIA from bovine seminal plasma (naa = 57) in [¹H]H₂O at
pH 5.3 and 47 °C. Each cross section cuts through one of the
two-dimensional spectra at a chemical shift δ₁ of 8.45 ppm, which is the
chemical shift of the amido hydrogen on Alanine 37. Other
¹hydrogens connected to this hydrogen by nuclear Overhauser effects are
represented by cross-peaks in the dimension of ¹hydrogen
chemical shift δ₂ (horizontal axis). They are labeled by the ¹hydrogen to
which they have been assigned by the individual values of their
chemical shifts. The large peak labeled A37NH is the
self-connection of the amido ¹hydrogen of Alanine 37. Each cross section is
from a two-dimensional spectrum gathered with a different fixed
delay tm, noted to its left in milliseconds. Reprinted with
permission from ref 270. Copyright 1985 Academic Press.


spin-spin-coupled to a ¹³carbon or a ¹⁵nitrogen the
chemical shift of which falls within a narrow range of
values are registered. For example, only the ¹hydrogens
connected by nuclear Overhauser effects to hydrogens
on carbons with chemical shifts of 13.3 ppm are registered
in the left strip in Figure 12-25. The nuclear
Overhauser effects in such spectra are assigned to particular
pairs of hydrogens on the basis of the two chemical
shifts of each cross-peak (Figure 12-23B,C) or the two
chemical shifts of the cross-peak and the chemical shift
of a ¹³carbon or ¹⁵nitrogen to which one or the other of
the ¹hydrogens is spin-spin-coupled. These chemical
shifts were determined during the initial assignments of
chemical shifts to all of the nuclei in the protein.

As many pairs of hydrogens coupled by nuclear
Overhauser effects as possible are identified and catalogued.
For example, 531 pairs of ¹hydrogens were
identified as being connected by nuclear Overhauser
effects in spectra of the major cold-shock protein
(naa = 70) from E. coli; 1281 pairs, in spectra of glutaredoxin 2
(naa = 215) from E. coli; and 3125 pairs, in
spectra of phosphoglycerate mutase (naa = 205) from
Schizosaccharomyces pombe. As is usually the case,
these connections were spread unevenly over the amino
Figure 12-25: Three-dimensional (¹H-¹³C-¹H) NOESY-HMQC
spectra of a 2 mM solution of human interleukin-4 (naa = 129) that
had been expressed in E. coli grown on [¹⁵N](NH₄)₂SO₄ and
[¹³C₃]glycerol as sole sources of nitrogen and carbon and that was
dissolved in [²H]H₂O at pH 4.5 and 20 °C.²⁷³ The three strips are
from two-dimensional sections through the three-dimensional
spectra. The three two-dimensional sections, cut in the carbon
dimension, contain cross-peaks from carbons with chemical
shifts of 13.3, 16.6, and 6.7 ppm, respectively, which are the chemical
shifts of the carbons in the δ methyl group of Isoleucine 10,
one of the γ methyl groups of Valine 29, and the δ methyl group of
Isoleucine 80. Each strip from each of these two-dimensional sections
is centered on the chemical shift (horizontal axis) of the
¹hydrogens of the respective methyl group. The most intense cross-peak
on each strip is the self-connection of those hydrogens and is
not labeled. The vertical axis defines the chemical shifts of the other
hydrogens connected to the respective methyl hydrogens by
nuclear Overhauser effects. Each of these cross-peaks, identified by
its chemical shift, is labeled with the amino acid and the position
in that amino acid at which the ¹hydrogen producing the nuclear
Overhauser effect is located. Reprinted from ref 273. Copyright
1994 Elsevier B.V.


acid sequences of these proteins with as few as 5-10
involving ¹hydrogens on one particular amino acid to as
many as 160 involving ¹hydrogens on another. The
latter values are extraordinarily impressive because no
one hydrogen in a protein can have more than about
20-25 hydrogens within 0.5 nm of it, and the hydrogens
on methyl groups are indistinguishable from each other
in chemical shift and do not register as separate ¹hydrogens.
It is from this catalogue of pairs of connected
¹hydrogens that a molecular model is built.

It is the nuclear Overhauser effects between ¹hydrogens
that are on amino acids two or more positions away
from each other in the amino acid sequence of a protein
that provide the information on which the molecular
model is based. Nuclear Overhauser effects between
¹hydrogens within the same amino acid or on immediately
adjacent amino acids are usually uninformative
because the covalent structure requires that they occur.


Usually, only about 50% of the assigned nuclear
Overhauser effects arise from ¹hydrogens that are on
amino acids two or more positions away from each other
in the amino acid sequence of a protein. Certain
regions of a two-dimensional nuclear Overhauser
enhanced spectrum, such as the one containing connections
between α ¹hydrogens and amido ¹hydrogens, are
dominated by pairs of ¹hydrogens that are on amino
acids at adjacent positions in the amino acid sequence
(labeled cross-peaks in Figure 12-23B), while other
regions, such as the one containing connections between
aromatic ¹hydrogens and aliphatic ¹hydrogens (labeled
cross-peaks in Figure 12-23C), are dominated by pairs of
¹hydrogens distant from each other in the primary structure
but adjacent to each other in the tertiary structure of
the protein. It is these latter types of nuclear Overhauser
effects that draw together two distant hydrogens as the
covalent structure of the protein is folded into the molecular
model.
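The selection of informative nuclear Overhauser effects described above can be sketched in a few lines. This is only an illustration: the record format (a pair of sequence positions for each connection), the function name, and the example positions are assumptions made here and are not taken from any of the cited studies.

```python
from typing import List, Tuple

# Each connection is recorded only by the sequence positions of the two
# amino acids carrying the connected hydrogens (hypothetical format).
Connection = Tuple[int, int]


def informative_connections(
    connections: List[Connection], min_separation: int = 2
) -> List[Connection]:
    """Keep connections between amino acids at least min_separation apart."""
    return [(i, j) for i, j in connections if abs(i - j) >= min_separation]


# Invented positions, for illustration only.
catalogue = [(10, 10), (10, 11), (10, 13), (17, 20), (5, 48)]
kept = informative_connections(catalogue)
fraction = len(kept) / len(catalogue)
print(kept, fraction)
```

Connections within one amino acid or between immediate neighbors are discarded because the covalent structure already requires them.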

A nuclear magnetic resonance molecular model is 
a molecular model of the covalent structure of the protein 
folded by the builder of the model into a tertiary structure 
in which the maximum number of ¹hydrogens observed
to be connected by short-range nuclear Overhauser 
effects end up close (<0.5 nm) to each other. It is possible 
to start with a molecular model of the extended polypep- 
tide in a computer and use molecular dynamics and sim- 
ulated annealing, modified so that the potential function 
includes the constraints of the nuclear Overhauser 
effects, to produce a preliminary molecular model, similar
to the preliminary crystallographic molecular model
that results from inserting the molecular model of the
polypeptide into the map of electron density.

The validity of this initial molecular model can be 
assessed by examining its secondary structure. Just as 
α helices and β structure can be recognized in a map of
electron density, segments of the polypeptide that are
α helices or β structure in the actual molecule of protein
can be recognized by patterns in the observed nuclear 
Overhauser effects (Figure 12-26). 

The most dominant pattern is that of the ¹hydrogens
in an α helix. In an α helix, nuclear Overhauser
effects systematically connect hydrogens in each amino
acid to those in amino acids three positions and four
positions (Figure 6-6) away from it in the amino acid
sequence. An α helix holds the consecutive nitrogen-hydrogen
bonds of the amides of the backbone close
to each other, and these short distances promote transfer
of saturation. For example, in the two-dimensional
nuclear Overhauser enhanced spectrum of the anaphylatoxin
from human complement factor 3a, 36 of the 44
nuclear Overhauser effects observed between amido
¹hydrogens on spatially adjacent amino acids were those
for amino acids in α helices, while only 46 of the 77
amino acids in the protein are in α helices.

In parallel β structure, ¹hydrogens in a string of
successive amino acids in the sequence are connected in 

Figure 12-26: Diagonal plot of the nuclear Overhauser effects 
observed between different amino acids in the amino acid 
sequence of acrosin inhibitor IIA from bovine seminal plasma.²⁷⁰
Each nuclear Overhauser effect was established by the existence of 
a cross-peak in a two-dimensional nuclear Overhauser enhanced 
spectrum of the protein that had two chemical shifts identical to 
the respective chemical shifts of particular hydrogens on the two 
different amino acids. The two axes are the numbering of the 
amino acids in the sequence of the protein, and a square represents 
a connection between the two positions in the sequence. Solid 
squares represent nuclear Overhauser effects between ¹hydrogens
on α carbons or amido ¹hydrogens from the respective amino
acids; hatched squares, between a ¹hydrogen on an α carbon or
amido ¹hydrogen on one amino acid and a ¹hydrogen on the side
chain of the other; squares with x, between ¹hydrogens on the side
chains of both amino acids. Patterns of connections can be recognized
in the plot that define three turns of α helix from positions 34
to 45 and one turn of α helix from positions 8 to 11, and three segments
from positions 52 to 55, 27 to 23, and 29 to 33 define three
strands of antiparallel β structure in a pleated sheet. Reprinted with
permission from ref 270. Copyright 1985 Academic Press. 


pairs to ¹hydrogens in another string of successive amino
acids in the order in which those spectrally connected
pairs of amino acids occur in the sequence. In antiparallel
β structure, the pairs of amino acids are connected
in the reverse order to the order in which they
occur in the sequence. For example, connections
between amino acids at positions 20 and 17, 21 and 16,
22 and 15, 23 and 14, 24 and 13, and 25 and 12 defined an
antiparallel β hairpin in α-amylase inhibitor HOE-467A
from Streptomyces tendae.
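The distinction between the two patterns can be illustrated with a short sketch that inspects the order of the connected positions. The function name and the simple all-increasing or all-decreasing test are assumptions chosen for illustration; actual assignments of β structure rely on much more of the spectral evidence than this.

```python
from typing import List, Tuple


def strand_orientation(pairs: List[Tuple[int, int]]) -> str:
    """Classify connected strands from the order of their paired positions.

    As the position in the first strand increases, the partner position
    increases for parallel pairing and decreases for antiparallel pairing.
    """
    if len(pairs) < 2:
        return "undetermined"
    ordered = sorted(pairs)  # sort by position in the first strand
    steps = [b[1] - a[1] for a, b in zip(ordered, ordered[1:])]
    if all(step > 0 for step in steps):
        return "parallel"
    if all(step < 0 for step in steps):
        return "antiparallel"
    return "undetermined"


# The connections quoted in the text for the hairpin of the
# alpha-amylase inhibitor.
hairpin = [(20, 17), (21, 16), (22, 15), (23, 14), (24, 13), (25, 12)]
print(strand_orientation(hairpin))
```

For the hairpin, each step of one position along the first strand moves the partner back by one position, which is the reverse order that marks antiparallel pairing.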

Other patterns indicating the organization of the 
tertiary structure of the actual molecule of protein can 
also be recognized in the spectra. For example, patterns 
similar to those for antiparallel β structure, in which
¹hydrogens in one segment of amino acids are connected
in reverse order to ¹hydrogens in another segment of
amino acids, can identify two adjacent, antiparallel
α helices. In this instance, however, the patterns are discontinuous
because only ¹hydrogens within the interface
between the two α helices are connected by nuclear
Overhauser effects.

Just as the preliminary crystallographic molecular 
model is then submitted to refinement against the data 
set, the preliminary nuclear magnetic resonance molec- 
ular model is submitted to refinement with the nuclear 
Overhauser effects as constraints. In addition to nuclear
Overhauser effects, other constraints can be applied to
the process of refinement. Designated donors and acceptors
of hydrogen bonds in α helices and β structure can
be assigned ideal lengths. Dihedral angles within the
structure can be constrained to particular ranges by
values of observed coupling constants. If the protein
contains a paramagnetic metallic cation, the effect of
that cation on the relaxation rates of hydrogens in the
protein can provide estimates of the distances between
each of those ¹hydrogens and the metallic cation.

In the final refined nuclear magnetic resonance 
molecular model, segments of random meander con- 
necting clearly defined segments of secondary structure 
will often be less well defined even though there is crys- 
tallographic or chemical evidence that they do assume 
specific structures. This problem is emphasized by
the practice of presenting a nuclear magnetic resonance
molecular model as an ensemble of structures, each of
which satisfies the constraints of the nuclear Overhauser
effects (Figure 12-27). In such representations, the
certainty of the structures of α helices and β structure is
set in sharp contrast to the uncertainty of the structure of 
the random meander. Although such a representation 
implies that these poorly defined segments are more 
flexible and less rigidly confined than the central regions 
of regular secondary structure, crystallographic molecu- 
lar models of the same protein often show no evidence of 
such flexibility. It is possible to distinguish whether or
not the poor definition of these regions of random meander
results from dynamic flexibility by examining the
rates of relaxation of the ¹⁵nitrogens in these regions.
These rates of relaxation are sensitive to thermal motion 
and can be used to identify segments of polypeptide that 
are dynamically flexible in the actual molecule of pro- 
tein. In the absence of evidence for flexibility, it must be 
assumed that the poor definition of random meander is 
a consequence of an insufficiency of constraints in the 
data. 

In refined nuclear magnetic resonance molecular 
models, it is those regions of the protein that are within 
or sandwiched between α helices or β structure that are
the most precisely defined. There are, however, regions
of the secondary structure that are poorly defined by
nuclear magnetic resonance. Hydrogens known to be
within short segments of secondary structure, such as
β turns or 3₁₀ helix, participate in so few nuclear
Overhauser effects that those that are observed are often
inadequate to define the structure of these segments.

Molecules of water confined to particular locations 
on the surface of the protein can be incorporated into the 
molecular model on the basis of the nuclear Overhauser 
effects between their ¹hydrogens and ¹hydrogens of the
amino acids by using rotating-frame Overhauser
enhanced spectroscopy (ROESY). Molecules of water,
however, at locations buried within the structure of the 
molecule of protein have residence times long enough to 
be observed directly by their nuclear Overhauser effects. 


These two types of locations for molecules of water, exte- 
rior and interior, are usually found to occupy the same 
positions in the nuclear magnetic resonance molecular 
model that they do in the crystallographic molecular 
model of the same protein. The position of metallic
cations in the molecular model can be established by
substituting the natural cation with a cation of nuclear
spin ½. For example, the Zn²⁺ cations normally bound to
the amino-terminal domain of regulatory protein GAL4
from S. cerevisiae (naa = 62) were replaced with ¹¹³Cd²⁺
cations, and spin-spin couplings between the ¹¹³Cd²⁺
cations and the β ¹hydrogens on the cysteines that covalently
bind them (6-19) produced a two-dimensional
correlated spectrum.

The fundamental problem with building a molecu- 
lar model from nuclear Overhauser effects is that those 
nuclear Overhauser effects do not define a distance 
between two ¹hydrogens because the relative rates of
spin diffusion are too dependent on the character of the 
unique surroundings around each nucleus to obtain reli- 
able estimates of distances. The distances in the final 
refined model between pairs of ¹hydrogens connected by
nuclear Overhauser effects are always quite different
even though almost all of them can be made less than
0.5 nm. When distances between ¹hydrogens connected
by nuclear Overhauser effects are measured in a
crystallographic molecular model of the same protein,
there is usually little correlation between the actual distance
between the hydrogens and the strength of the
nuclear Overhauser effect at the optimal mixing time tm.
Those nuclear Overhauser effects observed only after
extended mixing times do arise from hydrogens that are
farther apart, but the range of those longer distances is
broad, and consequently they are not very useful. If a
nuclear Overhauser effect is observed between two
¹hydrogens after an optimal mixing time tm, it can only be
assumed that they are less than 0.5 nm apart, but
there are usually notable exceptions even to this
limit.²⁹⁶,³⁰⁰⁻³⁰²

In effect, the existence of a nuclear Overhauser 
effect allows the investigator to connect the two hydro- 
gens in a molecular model of the covalent structure of the 
polypeptide with a rubber band that is elastic enough to 
stretch to a distance equivalent to about 0.5 nm. If 
enough of these rubber bands were inserted into a 
molecular model of the polypeptide and the regions of 
the amino acid sequence identified as β structure and
α helix (Figure 12-26) had already been locked into these
secondary structures, the model would snap into a con- 
formation that resembles the native folded conformation 
of the polypeptide. 
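The analogy of the rubber band can be expressed as a penalty that is zero as long as two connected hydrogens are within the limit and that rises once they are pulled farther apart. The sketch below is only an illustration of that idea: the quadratic form of the penalty, the force constant `k`, and the names of the functions and atoms are assumptions made here, not the potential function used in any particular refinement program.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]


def restraint_penalty(
    coords: Dict[str, Point],
    restraints: List[Tuple[str, str]],
    limit_nm: float = 0.5,
    k: float = 1.0,
) -> float:
    """Sum a flat-bottomed penalty over all pairs of connected hydrogens.

    A pair contributes nothing while its distance is within limit_nm;
    beyond that, the penalty grows as the square of the excess distance
    (an assumed functional form).
    """
    total = 0.0
    for a, b in restraints:
        distance = math.dist(coords[a], coords[b])
        excess = max(0.0, distance - limit_nm)
        total += k * excess ** 2
    return total


# Invented coordinates in nanometers, for illustration only.
coords = {"HA": (0.0, 0.0, 0.0), "HB": (0.3, 0.0, 0.0), "HC": (1.0, 0.0, 0.0)}
restraints = [("HA", "HB"), ("HA", "HC")]
print(restraint_penalty(coords, restraints))
```

A term of this kind, added to the potential function used for molecular dynamics, favors folded conformations in which the connected pairs end up within the limit without specifying any particular distance below it.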

It has been possible in many instances to compare 
nuclear magnetic resonance molecular models with 
crystallographic molecular models. It is usually observed 
that the two molecular models resemble each other, 
occasionally quite closely. The resemblance is
strongest in the assignment of α helices and β structure.
The arrangement of these regular secondary
structures in the tertiary structure, however, often differs
significantly, but not dramatically, between the two
molecular models. Some of these differences are
real and informative. They often result from the fact
that the protein in question is small and flexible or has
flexible domains and the fact that contacts between the
molecules of protein in the crystal are able to shift secondary
structures relative to each other. If a protein is
constructed so that in solution it assumes two or more
conformations because shifts among these conformations
are required for its function, nuclear magnetic resonance
can be used to determine which crystallographic
molecular models of these various conformations represent
the species actually present in solution.

In superpositions of nuclear magnetic resonance 
and crystallographic molecular models of the same 
protein, the root mean square deviations between heavy 
atoms (oxygen, nitrogen, and carbon) are usually the 
least within the polypeptide backbone of the regular sec- 
ondary structure in the core, greater for side chains 
buried between these secondary structures in the core, 
and greatest for random meander at the periphery.
In the comparison of the two molecular
models for α-amylase inhibitor HOE-467A (naa = 74), the
value of the root mean square deviation for the heavy
atoms of the polypeptide backbone was 0.105 nm; that
for the heavy atoms of the side chains buried in the core
was 0.125 nm; and that for all heavy atoms was
0.184 nm. This protein, however, is almost entirely
β structure with little random meander. In the comparison
of the two molecular models of human granulocyte
colony-stimulating factor (naa = 174), a much larger
protein with significant amounts of random meander,
the root mean square deviation for the heavy atoms of
the polypeptide backbone in its four α helices was
0.286 nm; that for all of the heavy atoms in these
α helices was 0.333 nm; that for all of the heavy atoms in
the entire polypeptide backbone was 0.315 nm; and that
for all heavy atoms was 0.370 nm.
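A root mean square deviation of the kind quoted above is computed over matched atoms after the two molecular models have been superposed. The following minimal sketch assumes that the superposition has already been performed and that the two lists contain the same heavy atoms in the same order; the function name and the coordinates are invented for illustration.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def root_mean_square_deviation(
    model_a: Sequence[Point], model_b: Sequence[Point]
) -> float:
    """Root mean square deviation between matched atoms of two models.

    Assumes the models are already superposed and the atoms are listed
    in the same order in both.
    """
    if len(model_a) != len(model_b) or len(model_a) == 0:
        raise ValueError("models must contain the same nonzero number of atoms")
    sum_of_squares = sum(
        math.dist(p, q) ** 2 for p, q in zip(model_a, model_b)
    )
    return math.sqrt(sum_of_squares / len(model_a))


# Invented coordinates in nanometers, for illustration only.
nmr_model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
crystal_model = [(0.0, 0.0, 0.1), (1.0, 0.1, 0.0)]
print(root_mean_square_deviation(nmr_model, crystal_model))
```

Restricting the two lists to the backbone, to buried side chains, or to all heavy atoms gives the separate values that are compared in the text.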

It is in the details of the atomic structure that 
nuclear magnetic resonance and crystallographic molec- 
ular models differ most significantly. For example, the 
dispositions of the aromatic rings of Tyrosine 3, Tyrosine 
45, and Phenylalanine 52 were different in two nuclear 
magnetic resonance molecular models of the 
immunoglobulin-binding domain of immunoglobulin G 
binding protein G from Streptococcus (naa = 56), and
those dispositions in turn were both different from the
dispositions in the crystallographic molecular model.
In nuclear magnetic resonance molecular models it is the
conformations of the side chains that are always more
uncertain than those of the polypeptide backbone;
but, unfortunately, it is the conformations of the side
chains that usually accomplish the function of the protein.
There are, however, some instances in which the
conformation of a particular side chain in a nuclear magnetic
resonance molecular model is more compatible
with its known chemical properties than its conformation
in the crystallographic molecular model of the same
protein and other instances in which the atomic
details of the crystallographic molecular model could be
adjusted by nuclear magnetic resonance.

A nuclear magnetic resonance molecular model is a 
significantly less accurate representation of the actual
structure of a molecule of protein than is a crystallo- 
graphic molecular model for the following reasons. First, 
a crystallographic data set contains significantly more 
information than even the most extensive list of nuclear 
Overhauser effects and coupling constants because
the number of observable nuclear Overhauser effects is 
always less than the number of observable reflections 
and because each reflection comes with an amplitude. 
Second, the heavy reliance on energy minimization in 
building the nuclear magnetic resonance molecular 
model assures that unusual local conformations that are 
excluded by the procedure used for the minimization 
either intentionally or unintentionally will be missed. 
Third, it has been demonstrated by calculation that even 
with an average of 19 nuclear Overhauser effects for each 
amino acid, a value in excess of the number usually avail- 
able, the root mean square deviation of the atoms in a 
nuclear magnetic resonance molecular model from their 
actual positions in the structure of the protein has to be 
at least 0.1 nm. Even this is an overestimate of the
accuracy because the paucity of nuclear Overhauser 
effects from random meander was not considered when 
the pairs of hydrogens incorporated into the calculations 
were chosen. Fourth, because the crystallographic data 
constrain the crystallographic molecular model much
more than the nuclear magnetic resonance data constrain
the nuclear magnetic resonance molecular
model, upon refinement with a combination of both the 
crystallographic data set and the observed nuclear 
Overhauser effects, the crystallographic molecular 
model of a protein quickly converges with only small 
changes in its structure to accommodate both sets of 
data while the nuclear magnetic resonance molecular 
model undergoes far more extensive changes to reach 
accommodation. Fifth, the fact that the number of
nuclear Overhauser effects observed between ¹hydrogens
in the same amino acid that are inescapably greater
than 0.5 nm apart in the final nuclear magnetic resonance
molecular model is much greater than the number
of nuclear Overhauser effects observed between ¹hydrogens
in different amino acids that are greater than 0.5 nm
apart in the final nuclear magnetic resonance molecular
model indicates that in bringing as many ¹hydrogens
connected by nuclear Overhauser effects as possible to
within 0.5 nm of each other, the construction of the
molecular model has produced a structure significantly
different from the actual structure of the molecule of protein.
Sixth, it is usually observed that increasing the
number of constraints in the construction of the nuclear
magnetic resonance molecular model causes it to become closer to
the crystallographic molecular model of the same protein
rather than to assume its own distinct structure.

There are, however, informative exceptions to this 
rule and in these instances, nuclear magnetic resonance 
does reveal differences in the structure of a protein 
when it is in solution and when it is in a crystal.
As with solution scattering of X-rays, these differences 
permit the crystallographic molecular model to be 
adjusted, often minutely, to a conformation representing
the molecule of protein in solution, which is the goal
of all structural studies. 

Perhaps the greatest drawback of nuclear magnetic 
resonance spectroscopy is that it is confined to small pro- 
teins. This confinement is due not only to the problem of 
overlapping cross-peaks on two-dimensional and three- 
dimensional spectra. Because the rate at which a mole- 
cule rotationally diffuses in the solution affects both the 
ratio of signal to noise in a nuclear magnetic resonance 
spectrum and the effectiveness of the pulse sequences 
used for multidimensional and multinuclear spectra, the 
size of a molecule of the protein determines whether or 
not it will even yield a spectrum. Although methods have 
been reported that can increase the rate at which a molecule
of protein rotationally diffuses in a solution, this
problem has yet to be solved satisfactorily. From 1986 to 
2001, the size of the largest asymmetric units and the size 
of the largest symmetric dimers for which nuclear
magnetic resonance provided a molecular model 
increased from 120 to 220 aa and from 200 to 450 aa, 
respectively.* The average size of the proteins for which 
molecular models were reported, however, increased 
only modestly during the same period, from about 90 to 
about 125 aa. Unfortunately, most proteins are oligomers 
with asymmetric units larger than 300 aa, sizes that pres- 
ent no difficulty to crystallography. 

Because of their poor definition of the structure of 
random meander and of the conformation of side chains, 
because of their inaccuracy, because of their indistin- 
guishability from crystallographic molecular models, 
and because of their confinement to small proteins, 
nuclear magnetic resonance molecular models have pro- 
vided far less structural information than have crystallo- 
graphic molecular models. Although there are situations 
in which nuclear magnetic resonance can be applied 
when crystallography cannot, for example in defining the 
details of the conformational change that occurs upon 
the binding of porcine phospholipase A₂ to micelles of
dodecyl phosphocholine, and situations in which
nuclear magnetic resonance establishes clear and signif- 
icant differences between a crystallographic molecular 
model and the structure of a protein in solution, for 
example in showing that the central unsupported α helix


* A preliminary nuclear magnetic resonance molecular model of malate synthase
from E. coli, which is a monomer of 723 amino acids, has
been reported.


in the crystallographic molecular model of calmodulin 
does not form in solution, the dramatic advantages
that nuclear magnetic resonance has over crystallogra- 
phy are that it readily observes hydrogens and it observes 
proteins while they are in solution. Both of these advan- 
tages are exploited when nuclear magnetic resonance is 
used to monitor acid-base titrations of individual side 
chains and when it is used to monitor the exchange of 
specific amido protons or deuterons in the polypeptide 
backbone with deuterons or protons, respectively, of the 
water in which the protein has been dissolved. 

The first successful application of nuclear magnetic 
resonance spectroscopy in the study of proteins was the 
determination of the acid dissociation constants of their 
histidines. The π electrons of an aromatic ring (2-24) are
induced to circulate in a ring current by an applied mag- 
netic field. This ring current creates a toroidal magnetic 
field opposite in direction to the applied field in the 
center of the ring but reinforcing the applied field at the 
periphery of the ring. This additional local magnetic field 
at the periphery causes all of the ¹hydrogens around aromatic
rings to absorb at higher frequencies and hence
greater chemical shift (Equations 12-53 and 12-54). The 
nitrogens on either side of carbon 2 of the imidazole of a
histidine

[structural drawing of the imidazole of histidine]

are electronegative elements that withdraw electrons
from carbon 2, decreasing the shielding provided by the
σ electrons in the carbon-hydrogen bond and shifting
the absorption of a ¹hydrogen on carbon 2 further downfield
and away from the absorptions of the other aromatic
¹hydrogens on phenylalanines, tyrosines, and
tryptophans. The absorption of the ¹hydrogen on carbon
2 of the imidazole of a histidine is not divided by
spin-spin coupling when the adjacent protons on the
two nitrogens have been exchanged with deuterons from
the [²H]H₂O in which the protein is dissolved. For all of
these reasons, the absorption from this ¹hydrogen on
each histidine in a protein dissolved in [²H]H₂O appears
as a sharp individual peak in the nuclear magnetic resonance
spectrum. One of the first nuclear magnetic resonance
spectra displaying these absorptions in a native protein was
the spectrum for ribonuclease (Figure 12-28).³²¹
Improvements have been made since these early experiments
that sharpen the peaks of absorption and eliminate
peaks in this region of the spectrum from
unexchanged amido ¹hydrogens.

As the imidazole of histidine gains a proton during 
its acid-base reaction, the nitrogens of the conjugate 
acid become even more electron-withdrawing than 
those of the conjugate base, and the absorption of the 
Figure 12-28: Low-resolution (100 MHz) nuclear magnetic resonance
spectra covering the region of the unresolved absorptions of
the ¹hydrogens of the six tyrosines and three phenylalanines (aromatic)
and the four resolved absorptions of the ¹hydrogens on the
carbons 2 of the four histidines (numbered 1-4) of ribonuclease A.³²¹
Absorption is presented as a function of chemical shift (in
hertz from the peak of absorption of an internal standard). As the
excitation frequency is 100 MHz, 100 Hz of chemical shift is 1 ppm.
The sample was a 12 mM solution of ribonuclease A (naa = 124) in
deuterioacetate buffers in [²H]H₂O at various values of pH (noted
on the drawing). Peak 5 is a proton on carbon 4 of one of the histidines.
Reprinted with permission from ref 321. Copyright 1967
retained by authors.


¹hydrogen on carbon 2 assumes an even larger chemical
shift (notice the movement of the peaks in Figure 12-28
to higher chemical shift as the pH decreases). Because a
specific fraction of the imidazoles in a population of
identical histidines is the cationic conjugate acid at a
particular pH and because the transfer of protons among
the individuals in that population is much faster than the
time resolution of nuclear magnetic resonance spectroscopy,
the absorption of the population of ¹hydrogens
on carbon 2 of a given population of histidines in a protein
assumes a chemical shift, $\delta_{\mathrm{H,obs}}$, that is the weighted
mean between that of the neutral conjugate base, $\delta_{\mathrm{H,A}}$, and
that of the cationic conjugate acid, $\delta_{\mathrm{H,HA}}$:

$$\delta_{\mathrm{H,obs}} = f_{\mathrm{A}}\,\delta_{\mathrm{H,A}} + f_{\mathrm{HA}}\,\delta_{\mathrm{H,HA}} \qquad (12\text{-}57)$$

where $f_{\mathrm{A}}$ and $f_{\mathrm{HA}}$ are the fractions of conjugate base and
conjugate acid, respectively, at a particular pH.
Therefore, the chemical shift as a function of pH traces 
the titration curve of a particular histidine in a molecule 
of protein. The first application of this method was the 
measurement of the titration curves for the four histidines
of ribonuclease.³²¹
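The way in which the weighted mean of Equation 12-57 traces a titration curve can be illustrated numerically. The sketch below assumes that the fractions of conjugate acid and conjugate base follow the usual relation between pH and pKa for a single, independent acid-base; the function names and the limiting chemical shifts (7.7 and 8.7 ppm) are illustrative assumptions, not values taken from the spectra of ribonuclease.

```python
import math


def observed_shift(pH: float, pKa: float, delta_A: float, delta_HA: float) -> float:
    """Chemical shift as the weighted mean of the base and acid forms."""
    f_HA = 1.0 / (1.0 + 10.0 ** (pH - pKa))  # fraction of conjugate acid
    f_A = 1.0 - f_HA                         # fraction of conjugate base
    return f_A * delta_A + f_HA * delta_HA


def pKa_from_shift(pH: float, delta_obs: float, delta_A: float, delta_HA: float) -> float:
    """Invert the weighted mean to estimate pKa from one observed shift.

    Only valid when delta_obs lies strictly between the two limiting shifts.
    """
    f_HA = (delta_obs - delta_A) / (delta_HA - delta_A)
    return pH + math.log10(f_HA / (1.0 - f_HA))


# Illustrative limiting shifts in ppm (assumed values).
DELTA_A, DELTA_HA = 7.7, 8.7
for pH in (5.0, 6.0, 6.5, 7.0, 8.0):
    print(pH, round(observed_shift(pH, 6.5, DELTA_A, DELTA_HA), 3))
```

At the pH equal to the pKa the two fractions are equal, and the observed chemical shift lies midway between the two limiting values, which is how the pKa is read from the midpoint of the curve.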

In a number of proteins, such as ribonuclease,³²³,³²⁴
myoglobin, subtilisin, and carbonate dehydratase,
the individual peaks of absorption from
¹hydrogens on the carbons 2, and hence the titrations of
their imidazoles, could be assigned to specific histidines
in the sequence of each protein. These assignments are
now usually made by mutating each of the histidines
consecutively to another amino acid and observing
which of the absorptions disappears in each
mutant. The acid dissociation constants for particular
histidines in native proteins have been used to test
computational methods for assessing the effect of
electric field and relative permittivity on the acid dissociation
constant of a particular histidine in a crystallographic
molecular model.

The absorptions of the ¹hydrogens on the carbons 2
of the histidines in a protein can be observed in its
one-dimensional nuclear magnetic resonance spectrum.
Although the absorptions of the individual ¹hydrogens
on the indole nitrogens of tryptophan and the individual
¹hydrogens on the methyl groups of threonines can
also often be observed in a one-dimensional spectrum,
these ¹hydrogens do not register acid dissociations.

The absorptions from nuclei in the side chains of a
particular type of amino acid in a protein, however, can
also be isolated in a one-dimensional spectrum by
expressing that protein in bacteria that are grown on
¹³carbon- or ¹⁵nitrogen-enriched versions of that particular
type of amino acid. When this is done, a bacterium auxotrophic
for that amino acid is chosen for the expression. For example,
endo-1,4-β-xylanase from Bacillus circulans could be
expressed in a strain of E. coli that was missing the three
transaminases normally capable of producing glutamate.
In such cells, only the [δ-¹³C]glutamate added to
the growth medium is incorporated into the expressed
protein. When the xylanase was purified from this
expression system, a peak of absorption from the carboxyl
carbon of each of its two glutamates could be readily
observed in a one-dimensional ¹³C nuclear magnetic resonance
spectrum, and the values of the chemical shifts for
these two peaks as a function of the pH of the solution
traced the acid-base titrations of their side chains (Figure
12-29). Using enrichments such as the one in this example
permits acid-base titrations to be performed on fairly
large proteins (the xylanase is a monomer of 185 aa) without
the necessity of making a full set of assignments.

If the cross-peaks in two- and three-dimensional
spectra and the chemical shifts of all of the nuclei in a
protein have been assigned, it is possible to follow in
those two- or three-dimensional spectra the chemical
shift of a particular nucleus in a particular side chain as
it changes with pH to produce a titration curve for an
acid-base in that side chain. For example, the chemical
shifts of the β ¹hydrogens on the aspartates and the
γ ¹hydrogens on the glutamates in murine epidermal
growth factor, which could be monitored with two-dimensional
(¹H-¹H) TOCSY spectra (Figure 12-19),
decrease in magnitude as the respective adjacent carboxyl
groups lose their protons as the pH is increased.
The chemical shifts of the nuclei in ribonuclease H from
E. coli (naa = 155) that had been enriched in ¹³carbon and
Figure 12-29: Acid-base titration of Glutamate 78 and Glutamate
172 in endo-1,4-β-xylanase (naa = 185) from B. circulans. A strain
of E. coli, DL39, which is deficient in amino acid transaminases, was
used to express the xylanase. It was grown on a medium containing
[δ-¹³C]glutamate as the sole source of this amino acid so that all of
the glutamates in the resulting protein were enriched in ¹³carbon at
the acyl position in their side chains. A series of one-dimensional
¹³carbon nuclear magnetic resonance spectra were gathered at various
values of pH. Every other value of the pH is noted on the vertical
axis. The values for the pH of unlabeled traces are intermediate
between those for the labeled traces. The region of each of the
resulting spectra containing the peaks of absorption from the acyl
carbons of glutamate and glutamine is presented, one above the other.
The peaks assigned by site-directed mutation to Glutamate 78 and
Glutamate 172, the only two glutamates in the protein, are highlighted.
Peaks arising from the acyl carbons of several glutamines
are labeled individually. Their chemical shifts respond to acid-base
titrations of side chains in their vicinity. Reprinted with permission
from ref 334. Copyright 2000 Elsevier B.V.


¹⁵nitrogen were assigned, and the acyl carbons of individual
aspartates and glutamates could be distinguished
in a two-dimensional (¹³C-¹H) HSQC/HSQC correlated
spectrum (Figure 12-30A). When the chemical shifts of
the acyl ¹³carbons of the aspartyl side chains were plotted
as a function of pH, titration curves for each aspartate
were obtained (Figure 12-30B). Titration curves for the
same aspartates could also be obtained by following the
chemical shifts of the two ¹hydrogens on the β carbon
of each aspartate (chemical shifts on the abscissa of
Figure 12-30A). Aspartates 94, 108, and 134 had values
of pKa expected for carboxylates exposed on the surface
of a protein (3.2, 3.2, and 4.1, respectively). Aspartates
102 and 194 had values of pKa less than 2, which suggests
that in the native structure of the protein they are in
electropositive surroundings.

Aspartate 10 and Aspartate 70 are immediately
adjacent to each other in the crystallographic molecular
model of ribonuclease H, and their acid-base titrations
are coupled tautomerically to each other (Figure
12-30B). The two coupled titration curves should be
complex functions of the four microscopic acid dissociation
constants (Equation 2-24) and of the four values of the
chemical shift of Aspartate 10 and the four values of the
chemical shift of Aspartate 70 (the ones when Aspartate
10 and Aspartate 70 are both protonated, the ones when
Aspartate 10 is protonated and Aspartate 70 is not, the
ones when Aspartate 10 is not protonated and Aspartate
70 is, and the ones when both are unprotonated). To
obtain exact values for these 12 parameters, additional 
experiments in which each of the aspartates in turn is 
mutated to an asparagine would have to be performed. 
If, however, it is assumed that neither the chemical shift 
of protonated Aspartate 10 nor the chemical shift of 
unprotonated Aspartate 10 is perturbed by the ionization 
of Aspartate 70 and that neither the chemical shift of pro- 
tonated Aspartate 70 nor the chemical shift of unproto- 
nated Aspartate 70 is perturbed by the ionization of 
Aspartate 10, then the two respective titration curves reg- 
ister only the fraction of each aspartate that is ionized at 
a particular pH. If this is the case, as it seems to be, then 
the equilibrium constant between the concentrations of 
the two tautomers, the one in which Aspartate 10 is pro- 
tonated and Aspartate 70 is unprotonated and the one in 
which Aspartate 10 is unprotonated and Aspartate 70 is 
protonated, is 3, and the two values for the macroscopic
pKa (Figure 2-7) are pKa1 = 2.9 and pKa2 = 6.4. It follows
that the microscopic pKa for Aspartate 10 when Aspartate
70 is protonated is 3.5, that for Aspartate 70 when 
Aspartate 10 is protonated is 3.0, that for Aspartate 10 
when Aspartate 70 is unprotonated is 6.3, and that for 
Aspartate 70 when Aspartate 10 is unprotonated is 5.8. 
The titration curves of the two glutamates of the 
xylanase from B. circulans (Figure 12-29), which are also 
immediately adjacent to each other in the crystallo- 
graphic molecular model, cannot be directly register- 
ing the macroscopic acid dissociations and the ratio of 


Figure 12-30: Acid-base titration of aspartates in isoenzyme I of
ribonuclease H from E. coli. (A) Two-dimensional (¹³C-¹H)
HSQC/HSQC nuclear magnetic resonance spectrum of a 0.15 mM
solution of the protein in [²H]H₂O at pH 5.5. The region of the
two-dimensional spectrum displayed contains cross-peaks from the acyl
carbons of the side chains of glutamate, glutamine, aspartate, and
asparagine and the ¹hydrogens on the adjacent methylene carbon.
Each of the side chains produces a pair of cross-peaks because there
are two ¹hydrogens on each adjacent methylene carbon, and each of
these ¹hydrogens is positioned by the tertiary structure of the protein
in a chemically different environment. The pairs of cross-peaks from
each side chain are labeled with the position in the sequence of the
amino acid producing them. (B) Titrations of individual aspartates in
the protein. A series of two-dimensional spectra were gathered at
different values of pH. The chemical shift of the acyl ¹³carbon of each
side chain (ordinate of the pairs of cross-peaks in panel A) is plotted
as a function of pH. Reprinted with permission from ref 336.
Copyright 1994 American Chemical Society.
the two tautomers, because in this example, the peak of
absorption from the acyl ¹³carbon of Glutamate 78 drifts
to a smaller value of the chemical shift rather than to a
larger one upon the titration of Glutamate 172, and the
peak of absorption from Glutamate 172 also drifts to a
smaller value of the chemical shift rather than to a larger
one upon the titration of Glutamate 78.
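
The relation between the microscopic and macroscopic constants quoted above for Aspartate 10 and Aspartate 70 of ribonuclease H can be checked with a short calculation. The Python sketch below assumes only the standard scheme for two coupled acid-bases, in which the microscopic constants add for the first dissociation and their reciprocals add for the second. The function and argument names are hypothetical and are introduced only for this illustration.

```python
import math


def macroscopic_pkas(pk_a_b_prot, pk_b_a_prot, pk_a_b_unprot, pk_b_a_unprot):
    """Combine the four microscopic pKa values of two coupled sites A and B.

    pk_a_b_prot   : pKa of site A while site B is protonated
    pk_b_a_prot   : pKa of site B while site A is protonated
    pk_a_b_unprot : pKa of site A while site B is unprotonated
    pk_b_a_unprot : pKa of site B while site A is unprotonated
    Returns (macroscopic pKa1, macroscopic pKa2, tautomer ratio), where the
    ratio is [A protonated, B unprotonated] / [A unprotonated, B protonated].
    """
    k_a1 = 10.0 ** -pk_a_b_prot
    k_b1 = 10.0 ** -pk_b_a_prot
    k_a2 = 10.0 ** -pk_a_b_unprot
    k_b2 = 10.0 ** -pk_b_a_unprot
    # First dissociation: the doubly protonated pair can lose a proton
    # from either site, so the microscopic constants add.
    big_k1 = k_a1 + k_b1
    # Second dissociation: both singly protonated tautomers count as the
    # acid, so the reciprocals of the microscopic constants add.
    big_k2 = 1.0 / (1.0 / k_a2 + 1.0 / k_b2)
    # Losing the proton from B leaves the tautomer with A protonated.
    tautomer_ratio = k_b1 / k_a1
    return -math.log10(big_k1), -math.log10(big_k2), tautomer_ratio


# Microscopic values quoted in the text: A = Aspartate 10, B = Aspartate 70.
pk1, pk2, ratio = macroscopic_pkas(3.5, 3.0, 6.3, 5.8)
print(f"pKa1 = {pk1:.1f}, pKa2 = {pk2:.1f}, tautomer ratio = {ratio:.1f}")
```

With the four microscopic values from the text, the calculation returns macroscopic values of 2.9 and 6.4 and a ratio of about 3 between the two tautomers, in agreement with the numbers quoted.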

The carboxylates on the side chains of glutamates and 
aspartates sometimes form hydrogen bonds with amido 
nitrogen-hydrogens from the polypeptide backbone. 
These can be identified in nuclear magnetic resonance 
spectra because the chemical shift of the amido ¹hydrogen
participating in the hydrogen bond will increase in
magnitude in concert with the decrease in the chemical
shift of the ¹hydrogens adjacent to the carboxyl group as it
loses its proton, becomes a carboxylate, and forms the
hydrogen bond with the amido nitrogen-hydrogen.
Problem 12-13: This figure is a two-dimensional
nuclear magnetic resonance spectrum of the
cytochrome c-551 from Pseudomonas aeruginosa.
Reprinted with permission from ref 340. Copyright 1990
American Chemical Society.

In the spectrum, the off-diagonal peaks arise from
nuclear Overhauser enhancements (τm = 150 ms). The
symmetrically displayed peaks on the two sides across 
the diagonal result from the same nuclear Overhauser 
effects. You should convince yourself that the patterns 
are symmetric across the diagonal. 


(A) What hydrogens in a protein produce off-diagonal 
peaks in this region of the spectrum? 


(B) The peaks connected by the horizontal and verti- 
cal lines have been identified with a particular 
subset of these hydrogens. Each of the peaks con- 
nected by these lines is produced by a nuclear 
Overhauser enhancement between two hydro- 
gens. Draw a polypeptide in the extended confor- 
mation as in 2-15. On your drawing show with 
double arrows only the connections that give rise 
to those peaks that are highlighted by the hori- 
zontal and vertical lines. 


(C) What do the horizontal and vertical lines indicate 
about these peaks? Why are the peaks labeled 
with pairs of numbers that increase consecu- 
tively? 

(D) What other information was used to assign the 
numbers to the particular peaks? 


Problem 12-14: The figure is a portion of the two-dimensional
nuclear Overhauser enhanced spectrum of the
lipoyl domain from the pyruvate dehydrogenase complex
of B. stearothermophilus. Reprinted with permission from
ref 341. Copyright 1991 Blackwell Publishing.

(A) What are the two functional groups in the
polypeptide on which the hydrogens are located 
that produce the peaks in the spectrum? 


(B) Draw the polypeptide from Cysteine 36 to 
Alanine 43, and draw double-headed arrows to 
indicate the connections between the pairs of 
hydrogens represented by the horizontal and 
vertical lines between the different peaks in the 
spectrum. 


(C) The squares in the spectrum represent the posi- 
tions of peaks in another type of two-dimensional 
spectrum of the protein. What is this other type of 
spectrum, and what type of connections does it 
record? 


(D) Some of the squares coincide with peaks in the 
nuclear Overhauser enhanced spectrum and 
some do not. Why are there peaks in these posi- 
tions in the other spectrum but not in the nuclear 
Overhauser enhanced spectrum? 


Problem 12-15: Immunity protein Im9 is a folded
polypeptide of 86 amino acids. It is responsible for
inhibiting the action of colicin E9, an extracellular
antibiotic protein produced by E. coli. Immunity protein
Im9 uniformly labeled with ¹⁵nitrogen was obtained
by growing E. coli JM105 cells expressing high levels of
the protein on minimal medium made with [¹⁵N]NH₄Cl.
The protein was purified, and the chemical shifts of
most of the ¹hydrogens and the ¹⁵nitrogens in the protein
were assigned by the usual procedures. After the
chemical shifts had been assigned, a three-dimensional
spectrum was taken in which the three dimensions
were the chemical shifts of ¹hydrogen, ¹hydrogen, and
¹⁵nitrogen. The two hydrogen dimensions display
nuclear Overhauser connections between pairs of
hydrogens. Because the chemical shifts of each of the
nitrogens in the polypeptide had been assigned, it was
possible to select sections through the three-dimensional
spectrum, each fixed at the chemical shift of a
particular backbone nitrogen in the dimension of the
chemical shifts of the nitrogens. Narrow strips from
the resulting successive two-dimensional ¹H-¹H nuclear
Overhauser enhanced spectra are displayed in the
figure. Each strip is centered on the horizontal axis at
the chemical shift of the hydrogen on the amido nitrogen
the chemical shift of which is fixed. The value of the
chemical shift of each amido ¹⁵nitrogen at which the
section was fixed and the identity of that amido nitrogen
are indicated on each strip. Reprinted with permission
from ref 342. Copyright 1994 American Chemical
Society.

(A) Peaks in the figure are labeled with pairs of numbers.
Draw the polypeptide between Glutamate
30 and Valine 37 as the polypeptide is drawn in 
2-15. Draw, however, the actual side chains along 
the polypeptide to identify each amino acid. Draw 
double-headed arrows labeled with the same 
pairs of numbers as the peaks in the strips are 
labeled and connecting every pair of hydrogens in 
your drawings that produce a labeled peak in the 
strips for Glutamate 30 to Valine 37. 


(B) What are the peaks in the spectra that are labeled 
with only a single number? 


(C) What are the peaks in the spectra that are not in 
the center of their strip? 


Human interleukin-4 is a member of the family of
hematopoietic cytokines that modulate cell proliferation
and differentiation within the immune system. It is a
folded polypeptide of 130 amino acids. A gene encoding
human interleukin-4 was inserted into a pTR550 plasmid
so that the protein could be expressed in E. coli. The
human interleukin-4 expressed at high levels by these
cells was uniformly labeled with ¹³carbon by growing
them on minimal medium made with [¹³C]glycerol (99
atom%). The protein was purified and the chemical shifts
of most of the ¹hydrogens and the ¹³carbons in the protein
were assigned by the usual procedures. After the
chemical shifts had been assigned, a three-dimensional
spectrum was taken in which the three dimensions were
the chemical shifts of ¹hydrogen, ¹hydrogen, and
¹³carbon. The two hydrogen dimensions display
nuclear Overhauser connections between pairs of hydrogens.
Because the chemical shifts of each of the carbons
in the polypeptide had been assigned, it was possible to
choose sections through the three-dimensional spectrum,
each fixed at the chemical shift of a particular
α carbon in the protein in the dimension of the chemical
shifts of the carbons. Strips from the resulting successive
two-dimensional ¹H-¹H nuclear Overhauser
enhanced spectra are displayed in the figure. Reprinted
with permission from ref 273. Copyright 1994 Elsevier
B.V.

Each strip is centered on the horizontal axis at the chemical
shift of the hydrogen on the α carbon the chemical
shift of which is fixed. The value of the chemical shift of
each α ¹³carbon at which the section was fixed and the
identity of that α carbon are indicated on each strip.


(D) Peaks in the figure are labeled with the position of 
a hydrogen in an amino acid. The sequence of 
human interleukin-4 between Lysine 42 and 
Valine 51 is KETFCRAATV. Draw the polypeptide 
in this segment as the one is drawn in 2-15. Draw, 
however, the actual side chains along the 
polypeptide to identify each amino acid. Draw 
double-headed arrows labeled with the same 
numbers and Greek letters as the four peaks in the 
strips are labeled and connecting the pairs of 
hydrogens that produce the four labeled peaks in 
the strips for Phenylalanine 45, Alanine 48, and 
Valine 51. Peaks in adjacent strips are connected 
with horizontal lines. Draw double-headed 
arrows indicating the connections that are repre- 
sented by each of the four horizontal lines con- 
necting pairs of peaks from two different strips. 


(E) Into what type of secondary structure is the 
polypeptide between Isoleucine 5 and Isoleucine 
11 and between Lysine 42 and Valine 51 folded in 
human interleukin-4? 


Exchange of Protons 


An acidic proton at any position in the covalent structure 
of a protein is subject to exchange with protons in the 
solution. Protons on the side chains of exposed polar 
amino acids such as asparagines, glutamines, aspartic 
acids, glutamic acids, serines, threonines, cysteines, 
arginines, lysines, and histidines usually exchange with 
protons in the solution so rapidly that the rates of their 
individual exchanges cannot be measured. The rate of 
exchange of a proton on the indole nitrogen of a trypto- 
phan, when that proton is sterically hindered from 
exchanging because the side chain is buried, is, however, 
often slow enough to be measured. For example,
when the constant fragment of human Bence-Jones
protein Nag is dissolved in [²H]H₂O, the proton on the indole
nitrogen of Tryptophan 150, which is buried in its inte- 
rior (Figure 6-39), exchanges with deuterons in the sol- 
vent at 1.2x 10° s’ at p?H 7.1 and 25 °C,** which is equal 
to the rate constant for the global unfolding of the pro- 
tein at this pH and temperature. It was concluded that 
only upon the complete, transient unfolding of the pro- 
tein does the proton on the indole nitrogen become suf- 
ficiently exposed to the solvent to exchange. A similar 
study of the rates of exchange of indole protons on the 
tryptophans in lysozyme from G. gallus, however, 
demonstrated that they exchanged more rapidly than the 
protein unfolds, presumably during local fluctuations in 
its structure that cause them to become exposed in turn 
to the aqueous phase.

Almost all of the protons that are so sterically hin- 
dered from exchange by the structure of a molecule of 
protein that their rates of exchange are slow enough to be 
measured conveniently are amido protons in its peptide 
bonds. The rates of exchange of these amido protons can
be monitored by following the rate at which protons are
replaced by deuterons when the unmodified protein is
transferred to a solution prepared with [²H]H₂O or the
rate at which deuterons or tritons are replaced by protons
when the protein that has been equilibrated in solutions
made with [²H]H₂O or [³H]H₂O, respectively, is
transferred to a solution prepared with [¹H]H₂O. The
exchange of a proton is registered when an acidic proton,
deuteron, or triton dissociates from a lone pair of electrons
on the protein and a deuteron, proton, or proton,
respectively, then associates with that same lone pair of
electrons. Because the concentrations of water, protons,
and hydroxide ions are all constant at any particular pH, 
the exchange of a proton is a pseudo-first-order process 
with a pseudo-first-order rate constant. 

The exchange of an amido proton for a deuteron
when an undeuterated peptide is dissolved in [²H]H₂O
can be followed by observing the decrease in the nuclear
magnetic resonance absorptions of its amido protons as a
function of time. The observed pseudo-first-order rate constants,
$k_{\mathrm{obs}}$, display both specific acid and specific base
catalysis (Figure 12-31), as described by Equation 12-58.



Figure 12-31: Magnitude of the observed first-order rate constants
for the exchange of the amido proton on the amino-terminal
side (solid symbols) or carboxy-terminal side (open symbols) in the
Nα-acetyl-N-methyl amides of alanine, serine, and
asparagine. The structure of each compound is drawn and
the protons that exchange are in boldface type. Solutions of each
model compound were prepared in [²H]H₂O at the noted p²H and
immediately introduced into a nuclear magnetic resonance spectrometer.
The two amido protons in each compound produced the
usual splitting into a doublet of the absorption of the spin-spin-coupled
¹hydrogen or ¹hydrogens on the respective, immediately
adjacent carbons. As the proton on each nitrogen exchanged for a
deuteron, the respective doublet was converted into a singlet. The
areas of doublet and singlet were measured as a function of time,
and it was observed that the doublet was converted to the singlet in
a first-order process. The rate of this process was converted to an
observed first-order rate constant, $k_{\mathrm{obs}}$ (minute⁻¹), and its value is
plotted logarithmically as a function of p²H. The curves are fits of
Equation 12-58 to the data. Reprinted with permission from ref
345. Copyright 1972 American Chemical Society.

$$k_{\mathrm{obs}} = k_{\mathrm{D^+}}[\mathrm{D^+}] + k_{\mathrm{OD^-}}[\mathrm{OD^-}] \qquad (12\text{-}58)$$

where D stands for deuterium. At the minimum rate 
(Figure 12-31), acid catalysis and base catalysis are of 
equal magnitude and the domination of one over the 


other inverts. The catalytic mechanisms are known to be

respectively. In both cases, the removal of the
proton is the rate-limiting step in the reaction. The
second-order rate constants, $k_{\mathrm{D^+}}$ and $k_{\mathrm{OD^-}}$, have been
tabulated for the amido protons on either side of specific
side chains of amino acids in model compounds and
small peptides. The second-order rate constants for
acid catalysis, $k_{\mathrm{D^+}}$, vary between 6 and 5000 M⁻¹ min⁻¹
and those for base catalysis, kop-, vary between 2 x 10° 
and 1x 10'' M7 min”. 
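
The position of the minimum in a profile such as that in Figure 12-31 follows from Equation 12-58, because the sum of an acid term proportional to [D⁺] and a base term proportional to 1/[D⁺] is smallest where the two terms are equal. The Python sketch below illustrates this under stated assumptions: the two second-order rate constants are hypothetical values, and the ion product of [²H]H₂O is an approximate, assumed value rather than one taken from the text.

```python
import math

# All numerical values below are illustrative assumptions.
K_D = 50.0       # acid catalysis, M^-1 min^-1 (hypothetical)
K_OD = 1.0e10    # base catalysis, M^-1 min^-1 (hypothetical)
PKW_D2O = 14.9   # approximate -log10 of the ion product of 2H2O (assumed)


def k_obs(p2h, k_d=K_D, k_od=K_OD, pkw=PKW_D2O):
    """Observed pseudo-first-order rate constant (min^-1) at a given p2H."""
    d_plus = 10.0 ** -p2h
    od_minus = 10.0 ** -(pkw - p2h)
    return k_d * d_plus + k_od * od_minus


def p2h_of_minimum(k_d=K_D, k_od=K_OD, pkw=PKW_D2O):
    """p2H at which the acid and base terms are equal and k_obs is smallest."""
    return 0.5 * (pkw + math.log10(k_d) - math.log10(k_od))


p_min = p2h_of_minimum()
print(f"minimum at p2H {p_min:.2f}, k_obs = {k_obs(p_min):.3g} per minute")
```

On either side of the minimum the logarithm of the rate constant changes with unit slope, which is why the profile appears as a V on a logarithmic plot.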

When a protein such as myoglobin is incubated in 
tritiated water for an extended period of time at moderate
temperature (37 °C), most of its amido protons reach
equilibrium with the protons and tritons in the water. 
The tritiated water can then be replaced with untritiated 
water by molecular exclusion chromatography. During 
the chromatography all of the tritons on exposed polar 
side chains exchange with protons. When the tritium 
remaining on the protein is measured by liquid scintilla- 
tion counting as a function of time at low temperature 
(0 °C), a population of tritiated amides that lose their tri- 
tons very slowly can be distinguished (Figure 12-32). 
The rates at which these amides exchange are far slower 
than the rates observed for small peptides in solution.

The amido hydrons on peptide bonds in a protein 
that exchange slowly with other hydron isotopes in the 
solvent are the hydrons that participate in stable hydrogen
bonds in the folded polypeptide. That the number
of slowly exchanging tritons in myoglobin (Figure 12-32) 
is about equal to the number of amido protons partici- 
pating in buried hydrogen bonds in the crystallographic 
molecular model (approximately 120) is consistent with 
this conclusion. In the case of lysozyme, the exchange

Figure 12-32: Exchange of tritons from myoglobin equilibrated
with [³H]H₂O and then transferred to [¹H]H₂O at pH 5 and 0 °C.
Myoglobin from P. catodon (naa = 153) was incubated at 37 °C and
pH 9 with [³H]H₂O until equilibrium was reached (20 h) at all of its
amides. The solution was cooled to 0 °C, and the protein was rapidly
transferred by molecular exclusion chromatography to
[¹H]H₂O. Over these intervals of preequilibrium, only protons on
amides and more acidic acid-bases on the protein should
exchange with tritons, and only the amides should retain any tritium
through the chromatography. The amount of tritium associated
with the protein [moles of tritium (mole of protein)⁻¹] was
followed as a function of time (hours). Myoglobin contains 162
amido protons, and this number agrees favorably with the 155 tritons
found on the protein at the earliest time after the chromatography.
Reprinted with permission from ref 348. Copyright 1969
Academic Press.


of protons for deuterons at the amides of the polypeptide 
could be followed directly by observing the decrease in 
the absorbance of the amide II vibration in the infrared 
spectrum relative to the absorbance of the amide I vibra- 
tion. The agreement between the number of slowly 
exchanging amido protons (44 moles for every mole of 
lysozyme) and the number of buried hydrogen bonds 
involving the amido protons of the polypeptide in the 
crystallographic molecular model is also quite close.

It is also possible to follow such global exchange of
the protons in a protein by mass spectrometry. The protein
is diluted into [²H]H₂O, and samples are removed at
successive times and submitted to electrospray mass
spectrometry to determine the number of deuterons that
have been incorporated by exchange during each interval.
The samples are usually submitted to liquid
chromatography in [¹H]H₂O immediately prior to mass
spectrometry to remove the [²H]H₂O. This chromatography
is performed at pH 3 and 0 °C to prevent exchange of
the amido deuterons with protons but to permit 
exchange at carboxylic acids, amines, hydroxyls, thiols, 
and imidazoles, because the intention of such measure- 
ments is to monitor the global exchange of amido pro- 
tons in peptide bonds. In some instances different 
populations of protons that exchange at different rates 
can be distinguished. In the case of fructose-bisphos- 
phate aldolase from rabbit, these populations were 


thought to represent protons from its different structural
domains.

A limitation of such global measurements of the
exchange of amido protons is that the identity of those
amides in the polypeptide that display slow exchange is not
established. One way to increase the resolution is to digest
quickly the samples of the protein removed at different intervals
over the time during which exchange is permitted to
occur, separate the resulting peptides in each sample
chromatographically, and use a mass spectrometer to assess
the extent of incorporation of deuterium into each of the
peptides. The exchange of protons is quenched at the
end of each interval in [²H]H₂O by dropping the pH to 3
and the temperature to 0 °C to minimize further exchange.
The protein, unfolded by the low pH, is digested with
pepsin A, an endopeptidase that functions best at low pH,
and the resulting peptides are separated by chromatography
in [¹H]H₂O at low pH and low temperature. Each peptic
peptide is then identified by its mass and its pattern of
fragmentation (Figure 3-8). In this way the amount of
incorporation of deuterium into the amides of the peptide bonds
within a particular segment of the folded polypeptide in
the native protein, namely, that segment ending up in the
peptic peptide, can be monitored. Still, the protons at
the individual amides within that segment cannot be
distinguished one from the other.

The advantage of such endopeptidolytic analyses
is that they can be applied to large proteins such as
rabbit fructose-bisphosphate aldolase (naa = 4 x 363), the
α-catalytic subunit of cyclic AMP-dependent protein
kinase (naa = 350), human dual specificity mitogen-activated
protein kinase kinase 1 (naa = 393), and dihydrodipicolinate
reductase from E. coli (naa = 4 x 273).
Such large proteins cannot be analyzed by nuclear mag- 
netic resonance. If, however, the protein is small enough, 
nuclear magnetic resonance spectroscopy can provide 
rates of exchange of most of the resolved amido protons 
along the polypeptide backbone in the native structure. 

The cross-peaks in the fingerprint region of a two-dimensional
(¹H-¹H) nuclear magnetic resonance correlated
spectrum (Figure 12-33) or a two-dimensional
(¹⁵N-¹H) HSQC correlation spectrum arise from
spin-spin coupling between an amido proton and its
adjacent α ¹hydrogen or amido ¹⁵nitrogen, respectively.
When the protein is transferred from [¹H]H₂O to [²H]H₂O,
each of these cross-peaks decreases in intensity as the
amido proton exchanges for a deuteron (Figure
12-33). The rate of exchange of each proton is equal
to the rate at which its cross-peak decreases in intensity.
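
Because the exchange is a pseudo-first-order process, the intensity of a cross-peak should decay exponentially, and the rate constant can be estimated from the slope of the logarithm of the intensity against time. The Python sketch below shows one simple way of doing this by unweighted linear least squares. The times and intensities are synthetic values generated for the illustration, not measurements from any of the cited studies, and the function name is hypothetical.

```python
import math


def fit_first_order(times, intensities):
    """Return (k, I0) from a least-squares line through ln(I) against t."""
    logs = [math.log(value) for value in intensities]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    s_tt = sum((t - mean_t) ** 2 for t in times)
    s_ty = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    slope = s_ty / s_tt
    intercept = mean_y - slope * mean_t
    # The intensity falls as exp(-k t), so k is the negative of the slope.
    return -slope, math.exp(intercept)


# Synthetic example: a cross-peak decaying with k = 1.0e-4 per minute.
K_TRUE = 1.0e-4
times_min = [0.0, 1000.0, 5000.0, 10000.0, 20000.0, 40000.0]
peak_intensity = [math.exp(-K_TRUE * t) for t in times_min]

k_fit, i0_fit = fit_first_order(times_min, peak_intensity)
print(f"k = {k_fit:.3g} per minute, initial intensity = {i0_fit:.3g}")
```

With real spectra the later, weaker points carry more relative noise, so a weighted or nonlinear fit of the exponential itself may be preferable to this unweighted logarithmic fit.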

It is generally assumed that a proton on a particular
amide in the polypeptide backbone of a protein can
exchange with a deuteron in the [²H]H₂O surrounding
the protein only when that proton is exposed to the solution.
An unexposed position in the polypeptide backbone
becomes exposed as the result of a conformational
change in the protein. Although the details of such
conformational changes are unknown, they must differ 
Figure 12-33: Exchange of amido protons in the polypeptide backbone of
bovine basic trypsin inhibitor for deuterons in the solution. The protein was
dissolved at a concentration of 0.02 M in [²H]H₂O and a pH of 3.5 and brought
to 36 °C. After the noted times, samples were removed, the temperature was
lowered to 25 °C, and a two-dimensional (¹H-¹H) correlated spectrum was
recorded. The fingerprint region containing cross-peaks resulting from
spin-spin coupling between amido protons and α ¹hydrogens (Figure 12-16) is
presented. As amido protons in a given population exchange for deuterons, its
cross-peak decreases in intensity. Each cross-peak is labeled in the last
spectrum in which it can be observed with the position of its α ¹hydrogen in the
sequence of the protein. The cross-peaks from those amido protons remaining
after 39,840 min are labeled with their positions in the sequence. Twenty of the
original 32 cross-peaks disappear as a result of exchange of amido protons for
deuterons in the solvent, and most of the cross-peaks remaining display
sufficient decreases in intensity over this interval to give rate constants for the
exchange of their amido protons. Reprinted with permission from ref 359.
Copyright 1982 Academic Press.


from one location in the protein to another because protons
exchange with a wide range of rate constants.
Conformational changes that expose protected protons
to the solution may be relatively rapid movements of
loops on the surface, motions opening crevices in the
surface of the protein more widely, fraying motions in
α helices that unwind them from their ends, unzipping
motions in β structure, or the complete, transient
unfolding of individual elements of secondary
structure, domains, or even the entire molecule of
the protein. Almost all of the amido protons in
the polypeptide backbone that exchange slowly enough
that their rates can be measured are involved in hydrogen
bonds in the native structure of the protein. Those
hydrogen bonds must be broken during the conformational
change to permit the unretarded exchange of
the protons to occur.

Associated with whatever conformational change is
responsible for the exchange of a particular proton in a
protein is a rate constant of opening, $k_{\mathrm{op}}$, and a rate
constant of closing, $k_{\mathrm{cl}}$. The kinetic mechanism for the process is

$$\text{unexchangeable} \; \underset{k_{\mathrm{cl}}}{\overset{k_{\mathrm{op}}}{\rightleftharpoons}} \; \text{exchangeable} \; \xrightarrow{\;k_{\mathrm{ex}}\;} \; \text{exchanged} \qquad (12\text{-}61)$$

where unexchangeable is the proton in a location where
exchange is sterically prohibited, exchangeable is the
proton in a location where it is exposed to the solution
and exchange is unobstructed, exchanged is the site at
which exchange has occurred and a deuteron is occupying
the location that was occupied by the original proton,
and $k_{\mathrm{ex}}$ is the pseudo-first-order rate constant for the
exchange of the original proton by the deuteron.

There are two limits to the rate equation for this
mechanism. If k_ex >> k_cl, then

$$k_{\mathrm{obs}} = k_{\mathrm{op}} \tag{12-62}$$

where k_obs is the observed rate constant of exchange. This
condition is the EX₁ limit. If k_cl >> k_ex, then

$$k_{\mathrm{obs}} = K_{\mathrm{conf}}\,k_{\mathrm{ex}} \tag{12-63}$$

where K_conf is the equilibrium constant for the conformational
change producing the exchangeable conformation,
[exchangeable] / [unexchangeable]. This condition
is the EX₂ limit.
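
The two limits can be checked numerically. The sketch below assumes the usual steady-state treatment of this mechanism, with the open conformation sparsely populated (opening much slower than closing), for which the observed rate constant is k_op k_ex / (k_cl + k_ex). That expression, the function name, and every numerical value are illustrative assumptions rather than quantities taken from the text.

```python
def k_obs(k_op, k_cl, k_ex):
    """Observed rate constant of exchange under the steady-state assumption.

    Assumes k_op << k_cl, so that the exchangeable conformation is rare.
    """
    return k_op * k_ex / (k_cl + k_ex)


# Hypothetical rate constants (s^-1) for one amido proton.
K_OP = 0.01
K_CL = 100.0

# Exchange much faster than closing: k_obs approaches k_op (EX1 limit).
ex1 = k_obs(K_OP, K_CL, k_ex=1.0e6)

# Closing much faster than exchange: k_obs approaches (k_op / k_cl) * k_ex,
# the equilibrium constant for opening times k_ex (EX2 limit).
ex2 = k_obs(K_OP, K_CL, k_ex=1.0e-3)

print(f"EX1-like: k_obs = {ex1:.4g} s^-1, k_op = {K_OP:.4g} s^-1")
print(f"EX2-like: k_obs = {ex2:.4g} s^-1, K*k_ex = {K_OP / K_CL * 1.0e-3:.4g} s^-1")
```

Between the two extremes the same expression interpolates smoothly, which is why a given proton can move from one limit to the other as conditions change.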

It is customary to present the rate of exchange of a
particular amido proton i in the polypeptide backbone
of a protein in terms of its protection factor, k_ex,i / k_obs,i.
The rate constant k_ex,i is the reference rate constant for
the exchange of that proton in the fully exposed,
unfolded polypeptide, and k_obs,i is the observed rate
constant under a given set of conditions. The rate constant
for exchange in the fully exposed conformation,
k_ex,i, is assumed to be equal to the measured rate constant
for exchange of an equivalent amido proton in a
derivative of the amino acid (Figure 12-31) or short
peptide as a function of pH. It has been shown by
measuring the rates of exchange of particular amido
protons in an unfolded polypeptide that such estimates
are reliable.

A protection factor for a particular proton is meaningful
only when the exchange of the proton occurs at the
EX₂ limit, where the protection factor is equal to the
inverse of the equilibrium constant for the conformational
change, K_conf (Equation 12-63). When exchange
occurs at the EX₁ limit, the observed rate constant is
equal to k_op,i, the rate constant for the opening of the
structure that protects proton i (Equation 12-62).
Dividing the observed rate constant by k_ex,i would then be
meaningless. Consequently, it is necessary to demonstrate
that the exchange process is at the EX₂ limit for the
protection factor to be meaningful. Presentations of
protection factors in the absence of such a demonstration
are equivocal.

It is possible to make a distinction between an EX₁
limit and an EX₂ limit for exchange by evaluating
whether the exchange at adjacent amido positions is
concerted or not. When exchange of a proton at a particular
location occurs at the EX₁ limit, the open conformation
persists long enough (k_ex >> k_cl) that all of the
protons at adjacent positions are exchanged before the
protein snaps shut. Consequently, only two populations
of protons at that location exist, one in which the proton
and its neighbors have not yet exchanged and one in
which both the proton and its neighbors have
exchanged. When exchange of a proton at a particular
location occurs at the EX₂ limit, the open conformation
has such a short lifetime (k_ex << k_cl) that at most only one
proton in a neighborhood can exchange at each opening.
Consequently, in addition to the two populations already
described, there are populations in which the proton has
exchanged but its neighbors have not and populations in
which the proton has not exchanged but its neighbors
have. All of these populations can be distinguished by the
ratios of the intensities of cross-peaks arising from
nuclear Overhauser effects. In the cases of both bovine
basic pancreatic trypsin inhibitor (n_aa = 58) and α-amylase
inhibitor HOE-467A from S. tendae (n_aa = 74), it was
observed that, below 55 °C, the exchanges of all of the
amido protons that could be monitored in the polypeptide
backbone of each of these native proteins were
unconcerted, consistent with exchange at the EX₂
limit.

Whether the exchange of a particular proton is at the
EX₁ limit or the EX₂ limit can also be assessed by following
the effect of pH on the observed rate of exchange.
Above pH 5, k_ex,i for an amido proton in a peptide bond is
directly proportional to the concentration of OH⁻
(Equation 12-58). If the exchange is at the EX₂ limit
(Equation 12-63), it must display the expected increase in
rate as the pH of the solution is increased. The observed
rates of proton exchange at several of the peptide bonds
in bovine basic pancreatic trypsin inhibitor at 68 °C
increased by a factor of 10 for each increase of one unit in
pH from pH 5 to 7, as expected of exchange at the
EX₂ limit. Above pH 7, however, the rates no longer
increased as the pH was increased, perhaps because k_ex
for these protons at this high temperature had become so
rapid (Figure 12-31) that k_ex became greater than k_cl. These
results suggest that the balance between exchange governed
by the EX₁ limit and that governed by the EX₂ limit
under certain conditions may lie in the range of physiological
pH. In such instances, by following the exchange
of a particular proton as a function of pH, the range of pH
in which the EX₂ limit governs the exchange and the range
of pH in which the EX₁ limit governs the exchange can be
distinguished. In the former range, the rate of exchange increases
by a factor of 10 for every increase of one unit in pH; in the
latter range, the rate of exchange is invariant with pH. From such
observations, values of k_cl, k_op, and K_conf for each member
of a set of specific peptide bonds can be measured
(Equations 12-62 and 12-63).
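
The crossover between the two regimes as the pH is raised can be illustrated with a short calculation. This is only a sketch: it assumes the steady-state expression k_obs = k_op k_ex / (k_cl + k_ex), assumes that the rate constant for exchange of the exposed proton increases exactly tenfold per unit of pH, and uses hypothetical rate constants chosen so that the crossover falls near neutral pH.

```python
import math

# Hypothetical rate constants (s^-1); none of these values come from the text.
K_OP = 1.0e-3        # opening
K_CL = 10.0          # closing
K_EX_AT_PH5 = 0.01   # exchange of the fully exposed proton at pH 5


def k_ex(ph):
    """Exchange of the exposed proton, proportional to [OH-] above pH 5."""
    return K_EX_AT_PH5 * 10.0 ** (ph - 5.0)


def k_obs(ph):
    """Steady-state observed rate constant (assumes k_op << k_cl)."""
    return K_OP * k_ex(ph) / (K_CL + k_ex(ph))


def slope(ph_low, ph_high):
    """Change in log10(k_obs) per unit of pH between two values of pH."""
    return (math.log10(k_obs(ph_high)) - math.log10(k_obs(ph_low))) / (ph_high - ph_low)


slope_acidic = slope(5.0, 6.0)     # close to 1: tenfold per unit of pH
slope_basic = slope(10.0, 11.0)    # close to 0: invariant with pH

for ph in range(5, 12):
    print(f"pH {ph:2d}  k_obs = {k_obs(ph):.3e} s^-1")
```

In the plateau the observed rate constant approaches the rate constant of opening, and the pH at which the slope falls off marks where the rate constant of exchange overtakes that of closing.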

It is difficult to make measurements of the
exchange of protons at several values of pH. Often an
analysis is made of rates of exchange at only two values
of pH. In this instance, the logarithms of the observed
rate constants of exchange of the amido protons in the
polypeptide backbone at one pH are plotted as a function
of their logarithms at a different pH. If they
fall on a line of slope 1, then the individual conformational
changes producing each exchange are not significantly
dependent on pH, and if the line has an
intercept at the ordinate axis equal to the difference in
pH between the two solutions, then the process of
exchange for each proton has the pH dependence
expected of k_ex,i and is judged to be at the EX₂ limit. If,
however, the intercept of the line is not equal to the
difference in pH of the two solutions, the exchanges are
not at the EX₂ limit. If the intercept of the line is zero,
the exchange is probably at the EX₁ limit and the
observed rate constants of exchange are equal to the
rate constants k_op for the various conformational
changes permitting exchange.
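
The two-pH analysis amounts to fitting a straight line and inspecting its slope and intercept. The sketch below is one way to do that, assuming the rate constants at the higher pH are placed on the ordinate; the function names, the tolerance, and the synthetic rate constants are all hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def classify(k_low_ph, k_high_ph, delta_ph, tol=0.15):
    """Judge the limit of exchange from rate constants at two values of pH.

    k_low_ph and k_high_ph list the observed rate constants of the same
    protons at the lower and the higher pH; delta_ph is the difference in pH.
    """
    xs = [math.log10(k) for k in k_low_ph]
    ys = [math.log10(k) for k in k_high_ph]
    slope, intercept = fit_line(xs, ys)
    if abs(slope - 1.0) > tol:
        return "conformational changes depend on pH"
    if abs(intercept - delta_ph) <= tol:
        return "EX2"
    if abs(intercept) <= tol:
        return "EX1 (probable)"
    return "not EX2"


# Synthetic rate constants (s^-1) for three protons at pH 6 and pH 7.
at_ph6 = [1.0e-6, 1.0e-5, 1.0e-4]
tenfold_faster = [10.0 * k for k in at_ph6]

print(classify(at_ph6, tenfold_faster, delta_ph=1.0))
print(classify(at_ph6, at_ph6, delta_ph=1.0))
```

With real data the tolerance would have to reflect the experimental uncertainty in each rate constant, which this sketch does not attempt to estimate.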

In lysozyme from G. gallus, there is a group of 
amido protons in the core of the protein that all 
exchange their protons 10° times more slowly than the 
same amides in a small peptide. This group of amides
in the center of the protein may be exposed to the sol- 
vent only during large cooperative unfoldings of a con- 
siderable fraction of the protein that expose many amido 
protons simultaneously for a short time before the 
polypeptide snaps shut again. If this is the case, these 
deeply buried regions of lysozyme spend 10° of their life 
in an unfolded state. The individual amido protons in 
the polypeptide backbone of bovine basic pancreatic 
trypsin inhibitor, however, have a rather heterogeneous 
set of rate constants of exchange (Figure 12-33). This
suggests that local vibrational modes producing local 
unfoldings are responsible, at least in this protein, for 
performing the disconnections of a particular structure 
required to expose the amide to the solvent so that its 
proton can exchange. Whether the motions responsible 
for the exchange of a particular proton are local, 
regional, or global, the recurring observation that all or 
almost all of the amido protons, at least of small pro- 
teins, eventually exchange means that a molecule of pro- 
tein is continuously breathing or unfolding throughout 
its lifetime. 

It is also possible to observe the exchange of protons
by neutron diffraction. Crystals of protein are routinely
prepared for neutron diffraction by soaking them
in [²H]H₂O to replace as many protons as possible with
deuterons, which scatter neutrons more strongly.
Molecules of trypsin and ribonuclease that had been
soaking within crystals in [²H]H₂O for periods of 1 year
retained protons on 54 and 28 of their amides, respectively.
All of the sites that remained unexchanged were in
the interior of the folded polypeptide. These were mainly
on the central strands of β sheets and at the centers of
α helices. Most of the sites retaining protons after 1 year
were not even partially exchanged with deuterons, while
those sites that had deuterons were almost fully
exchanged. The location of these sites and their occurrence
in regions with very little thermal motion suggest
that, in a crystal, only local motions are responsible for
what exchange takes place. Because large regional
unfolding or the complete unfolding of the polypeptide
cannot occur in a crystal, these observations provide further
evidence that the exchange of protons at deep locations
in a protein observed in free solution results from
such types of extensive unfolding.

Observations of the exchange of protons can be 
used to examine heterologous associations. For exam- 
ple, the rates of exchange of 11 amido protons in the 
polypeptide backbone of equine cytochrome c, which 
are at unrelated positions in its amino acid sequence 
but form a continuous region on the surface of the tertiary
structure of the protein, decrease significantly when
it is bound by a monoclonal immunoglobulin. It was
concluded that this region formed the epitope on the 
surface of the cytochrome c. When a synthetic peptide 
with the amino acid sequence from Alanine 1730 to 
Leucine 1747 in smooth muscle [myosin-light-chain] 
kinase from G. gallus, which was known to be the site to 
which calmodulin binds in the intact protein, associates 
with calmodulin, the rates of exchange of 12 of its amido 
protons decrease by factors between 10° and 10° as it 
forms the α helix embraced by the calmodulin in the
complex.

Results of such experiments, however, should be 
evaluated with caution. Effects on the exchange of amido 
protons can be felt at locations distant from the site of 
interaction because the association of a protein with a 
ligand always increases its global stability and conse- 
quently decreases the time its entire structure spends in 
open conformations. For example, the binding of thymidine
3′,5′-diphosphate to micrococcal nuclease causes
the rate of exchange of at least 34 of its amido protons to
decrease dramatically even though the site at which
the ligand binds encompasses far fewer amino acids. In
this instance, the binding of the ligand stabilizes the protein
globally and decreases its conformational fluctuations.
Likewise, when NADH is bound to dihydrodipicolinate
reductase, the rates of exchange of amido
protons in widely different locations show significant
decreases.


Suggested Reading


Electron Paramagnetic Resonance


If an atomic orbital or a molecular orbital contains a 
pair of electrons, the magnetic moments of their spins 
cancel because of the Pauli principle, and that orbital is 
diamagnetic. If an orbital contains a single unpaired 
electron, that electron has an uncancelled magnetic 
moment and that orbital is paramagnetic. There are 
several ways in which a molecule of protein can contain 
an orbital with an unpaired electron. A paramagnetic 
ion of a transition metal such as Mn²⁺, Fe³⁺, Co²⁺, Ni²⁺, or
Cu²⁺ can be bound to the protein either on its own or
within a coenzyme like the heme in ferrimyoglobin 
(Figure 4-18). A stable organic radical like the glycyl 
radical in formate C-acetyltransferase or the tyrosyl rad- 
ical in ribonucleoside-diphosphate reductase can be 
formed by posttranslational modification of the protein 
(Table 3-1). A coenzyme bound to the protein can contain
an organic radical. The protein can be modified
with a reagent in which there is a stable organic radical, 
such as the one in a 1-oxyl-2,2,5,5-tetramethylpyrrolin- 
3-yl group (12-11).

Such a functional group can be coupled covalently either
by incorporating an unnatural amino acid containing it
into the protein in a cell-free translation or by modifying
the protein with an electrophilic reagent containing
it.

An unpaired electron can have a spin quantum
number of +½ or −½, and these two quantum numbers
dictate two respective spin states with two respective
angular velocities of the same magnitude but opposite
polarity. When an unpaired electron is placed in an
external homogeneous magnetic field, the axis of its
spin tends to align with the direction of the applied field.
The two degenerate energy levels of the spinning electron
are split into two distinct energy levels, one for the
spin aligned in the direction of the magnetic field and
one for the spin aligned in the direction opposed to the
magnetic field. The difference in energy between these
two spin states for a given population i of identical
unpaired electrons, ΔE_i, is directly proportional to the
magnetic flux density B_i (tesla) at the location of
unpaired electron i. The frequency ν_i (hertz) of electromagnetic
energy that is absorbed by the population of
electrons i during its transition between these spin
states is given by

$$\Delta E_i = g_e \mu_B B_i = h\nu_i \tag{12-64}$$

where g_e is the g factor of a free electron (2.0023) and μ_B
is the Bohr magneton (9.27 × 10⁻²⁴ J T⁻¹). At magnetic flux
densities normally used for electron paramagnetic resonance
(<2 T) the difference in energy is less than 20 J
mol⁻¹, the energy contained in a photon of frequency less
than 40 GHz, which is the microwave range of electromagnetic
energy.
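
As a rough numerical check of Equation 12-64, the sketch below converts a magnetic flux density into the splitting between the two spin states and the corresponding resonance frequency for a free electron. The function names are hypothetical, Avogadro's number is supplied here rather than taken from the text, and the two flux densities are chosen only because they fall near the carrier frequencies of common spectrometers.

```python
G_E = 2.0023       # g factor of a free electron
MU_B = 9.274e-24   # Bohr magneton, J T^-1
H = 6.626e-34      # Planck's constant, J s
N_A = 6.022e23     # Avogadro's number, mol^-1 (not given in the text)


def splitting_joules(b_tesla):
    """Difference in energy between the two spin states of one electron."""
    return G_E * MU_B * b_tesla


def resonance_frequency_hz(b_tesla):
    """Frequency of the radiation absorbed at this flux density."""
    return splitting_joules(b_tesla) / H


def splitting_joules_per_mole(b_tesla):
    """The same difference in energy expressed for a mole of electrons."""
    return splitting_joules(b_tesla) * N_A


for b in (0.33, 1.25):
    print(
        f"B = {b:.2f} T: "
        f"{resonance_frequency_hz(b) / 1e9:.2f} GHz, "
        f"{splitting_joules_per_mole(b):.2f} J mol^-1"
    )
```

Flux densities near 0.33 T and 1.25 T correspond to frequencies near 9 and 35 GHz, respectively, for an electron with a g factor close to that of a free electron.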

As with continuous-wave nuclear magnetic reso- 
nance spectrometers, an electron paramagnetic reso- 
nance spectrometer has a microwave generator of a 
fixed frequency, for example, 35 or 9 GHz, and the mag- 
netic flux density is varied while the absorption of energy 
is monitored (Figure 12-34). Peaks of absorption are 
observed when Equation 12-64 is satisfied for a given 
population of identical electrons of unpaired spin. As 
with nuclear magnetic resonance, saturation of the 
absorption occurs readily in electron paramagnetic reso- 
nance owing to the small difference in the occupancy of 
the two spin states (1 < Ksp < 1.008) and the slow rate of 
relaxation between them. There are, however, several
features of an electron paramagnetic resonance spectrum
that distinguish it from a nuclear magnetic resonance
spectrum.

Figure 12-34: Electron paramagnetic resonance spectra of a
0.5 mM solution of CDP-6-deoxy-L-threo-D-glycero-4-hexulose-
3-dehydrase in the presence of 10 mM CDP-6-deoxy-L-glycero-
4-hexulose and 1 mM NADH frozen at 77 K. The bottom trace is
the absorption as a function of the flux density of the magnetic field
(tesla) at a carrier frequency of 9.05 GHz. The spectrum was
obtained by varying the magnetic field while the microwave
frequency remained fixed. The top trace is the first derivative (change
in absorption/change in magnetic flux density) of the bottom trace;
the dotted line sits on a value of zero for the first derivative. The
[2Fe-2S] cluster in the protein contains an unpaired electron that
absorbs at g factors of 2.012, 1.950, and 1.932, producing the three
peaks of absorption in the two spectra at 0.321, 0.331, and 0.334 T,
respectively. Reprinted with permission from ref 384. Copyright
1996 American Chemical Society.

For unpaired electrons on ions of transition metals, 
the intensity of the electron paramagnetic absorption 
increases and the width of the peak of absorption 
decreases dramatically as the temperature of the sample 
is lowered. For unpaired electrons on carbon, nitrogen, 
or oxygen, the intensity of the absorption also increases 
as the temperature is lowered. Consequently, electron 
paramagnetic resonance is often monitored while a 
sample of the protein containing the unpaired electron is 
in the frozen state at low temperature. For example, the 
electron paramagnetic resonance spectrum of aminocy- 
clopropane carboxylate oxidase, the iron in whose heme 
had been complexed with nitrous oxide, was observed at 
8 K, and that of the molybdenum-iron protein of nitrogenase,
the metal cluster of which had been complexed
with ethene, was observed at 2 K. At such low temperatures,
molecular motions and chemical reactions are
severely limited. The most convenient low temperature 
is 77 K, the boiling point of liquid nitrogen, but the spec- 
tra of many types of organic radicals can be observed at 
room temperature even though the amplitudes of the 
peaks of absorption are less than they would be at lower 
temperatures. 


The absorption of microwave energy is usually 
monitored as its first derivative with respect to the flux 
density of the magnetic field. Consequently, peaks of 
absorbance appear as pairs of positive and negative 
deflections representing the positive slope of the rising 
phase and the negative slope of the declining phase of 
the absorption itself, and the differential passes through 
zero at the maximum of each absorption (Figures 12-34 
and 12-35).

The absorption from a particular population of 
identical unpaired electrons is often split into two or 
more separate peaks by spin-spin coupling, either 
between the electron and magnetic nuclei connected to 
it by covalent bonds or between the electron and a mag- 
netic nucleus on which it happens to reside. As in nuclear 
magnetic resonance, this spin-spin coupling results 
from local perturbations to the applied magnetic field. 
These perturbations are caused by differences in the ori- 
entations of the spins of the coupled nuclei, the magnetic 
fields of which are transmitted through the diamagnetic 
electrons in the covalent bonds surrounding the 
unpaired electron. As a result, the magnetic flux density 
sensed by a given electron i, B_i, is the sum of the flux density
of the applied magnetic field, B_app, and the flux densities
of any local magnetic fields, B_loc, created by these
coupled magnetic nuclei. The spin-spin splitting in electron
paramagnetic resonance is referred to as hyperfine
splitting; the resulting pattern of peaks, as hyperfine 
structure; and the spin-spin coupling, as hyperfine cou- 
pling. 

Figure 12-35: Electron paramagnetic spectrum of 3-carbamoyl-
1-oxyl-2,2,5,5-tetramethylpyrroline (see 12-11) in water at room
temperature. The scale bar (0.5 mT) indicates the dimension of the
horizontal axis in units of magnetic flux density (tesla). The carrier
frequency of the spectrophotometer was 9.5 GHz. The first derivative
of the absorption (change in absorption/change in magnetic flux
density) is presented. Reprinted with permission from ref 383.
Copyright 1965 held by authors.

The 1-oxyl-2,2,5,5-tetramethylpyrrolin-3-yl group
(12-11) serves as a simple example of hyperfine splitting
(Figure 12-35). In this stable free radical, the nucleus that
dominates the local magnetic field is that of the ¹⁴nitrogen,
which has a spin quantum number of 1. The ¹⁴nitrogen
nucleus is quadrupolar and can assume spins of +1, 0,
and −1 with equal probability, because the distribution
among its energy levels is insignificantly affected by the
applied magnetic field. As a result, the local magnetic flux
density, B_loc, created by the nitrogen nucleus assumes
three values, one of which is zero. The spectrum consists 
of a central absorption, arising from unpaired electrons 
coupled to nitrogen nuclei of spin quantum number 0 
and from unpaired electrons not coupled to any mag- 
netic nucleus, and two peaks of hyperfine absorption on 
either side of the central peak, arising from unpaired 
electrons coupled to nitrogen nuclei of spin quantum 
number +1 and -1. The hyperfine absorptions are of vari- 
able magnitude and are split different distances from the 
central absorption depending on the quality of the cou- 
pling between the nitrogen and the electron, but the cen- 
tral absorption will always be fixed because it is at the 
position where the contribution of the ¹⁴nitrogen to the
local magnetic flux density is zero. Information about 
environment, rotational diffusion, and anisotropy is con- 
tained in the hyperfine absorptions. 
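
The positions of the three absorptions follow from adding the local flux density for each spin state of the nucleus to the applied flux density. The sketch below is illustrative only: the central field and the coupling (expressed in millitesla) are assumed values of a plausible magnitude for a nitroxyl radical in water, not measurements from Figure 12-35, and the sign convention for the local field is arbitrary.

```python
def hyperfine_fields(center_mT, coupling_mT, two_I):
    """Applied fields (mT) at which one coupled nucleus places the absorptions.

    two_I is twice the spin quantum number of the nucleus, so that
    half-integral spins can be passed as integers (1 for spin 1/2, 2 for spin 1).
    """
    # Allowed projections of the nuclear spin: -I, -I + 1, ..., +I.
    projections = [m / 2 for m in range(-two_I, two_I + 1, 2)]
    # Each projection shifts the resonance by one multiple of the coupling.
    return [center_mT - m * coupling_mT for m in projections]


# Assumed values: central absorption at 338.0 mT, coupling of 1.6 mT.
nitroxyl_lines = hyperfine_fields(center_mT=338.0, coupling_mT=1.6, two_I=2)
proton_lines = hyperfine_fields(center_mT=338.0, coupling_mT=1.5, two_I=1)

print("spin 1 nucleus:  ", nitroxyl_lines)
print("spin 1/2 nucleus:", proton_lines)
```

A nucleus of spin 1 gives three equally spaced absorptions with the central one unshifted, whereas a nucleus of spin ½ gives two absorptions with none at the center.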

The full coupling of the electron to the ¹⁴nitrogen is
expressed in aqueous solution (Figure 12-35) because 
the electronic structure in which the radical occupies a 
p orbital over nitrogen, a distribution which requires a 
separation of charge, can be readily solvated by the 
water. In nonpolar environments such as within a mole- 
cule of protein, the nitroxyl radical can shift its hybridiza- 
tion to form an ethenyl molecular orbital system 


[Diagram: bonding orbital (solid line) and antibonding orbital (dashed line) of the nitroxyl group]

composed from one of the lone pairs on oxygen and the 
radical. The unpaired electron occupies the antibonding 
molecular orbital and spends more of its time over 
oxygen, which is diamagnetic. This delocalization 
decreases the effect of the quadrupolar nucleus of the
¹⁴nitrogen, and the two hyperfine components decrease
accordingly in intensity. 

Hyperfine coupling can be used to draw conclu- 
sions about the properties and location of the unpaired 
electron. The unpaired electron on the glycyl radical in 
formate C-acetyltransferase is unaffected by the diamag- 
netic carbon on which it resides but is split into two 
peaks by the single ¹hydrogen on that carbon (Figure
12-36). When formate C-acetyltransferase is transferred
to [²H]H₂O, the ¹hydrogen exchanges with ²hydrogen
in the solution in a reaction catalyzed by a nearby
cysteine, and the spin-spin splitting disappears.
Tryptophan tryptophylquinone is present in amine 
dehydrogenase as a posttranslational modification 
(Table 3-1, Figure 3-18). A model compound for tryptophan
tryptophylquinone in which the two
⁺H₃N(COO⁻)CH– groups of the bis(amino acid) are
replaced with hydrogens can be reduced with one electron
to produce the semiquinone with an unpaired electron.
The absorption from the unpaired electron in the
semiquinone was split into at least 22 peaks. This splitting
was thought to result from four nonequivalent
¹hydrogens, one set of methyl hydrogens, and two
nonequivalent ¹⁴nitrogens. It follows that, in the semiquinone,
the electron must be delocalized over both
indole rings. There is an unpaired electron in formate
C-acetyltransferase that has been inactivated by covalent
modification with fluoropyruvate. It could be shown to
be located on carbon 3 of a defluorinated pyruvyl group
because its absorption was split into three peaks when
[¹H]fluoropyruvate was used for the modification but
was unsplit when [²H]fluoropyruvate was used.

Figure 12-36: Electron paramagnetic resonance spectra of the
glycyl radical (Table 3-1) in formate C-acetyltransferase. The
spectra were gathered from solutions of the protein at pH 7.6. (A) A
solution (20 mg mL⁻¹) of the protein in [¹H]H₂O was frozen in liquid
nitrogen, and the spectrum was recorded at 77 K. (B) The solution
was then thawed, mixed with three volumes of [²H]H₂O, allowed to
sit for 2 min, and refrozen, and another spectrum was recorded.
The carrier frequency of the spectrometer was 9.23 GHz. The vertical
axis is the first derivative of the absorption (change in
absorption/change in magnetic flux density). The dimension of the
magnetic flux density (tesla) along the horizontal axis is indicated
by the scale, and the g factor of the absorption (2.0037) is indicated
in panel A. Reprinted with permission from ref 387. Copyright 1995
American Chemical Society.
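
The numbers of peaks quoted in these examples can be compared with the usual counting rule for first-order hyperfine splitting, in which a set of n equivalent nuclei of spin I splits each absorption into 2nI + 1 peaks and the contributions of nonequivalent sets multiply. The rule, the function name, and the encoding of each set as a pair of integers are assumptions of this sketch; it gives only the maximum number of peaks, because overlapping or unresolved peaks reduce the number actually observed.

```python
from math import prod


def max_hyperfine_peaks(groups):
    """Maximum number of peaks from several sets of equivalent nuclei.

    Each entry of groups is (n, two_I): n equivalent nuclei whose spin
    quantum number is two_I / 2. A set contributes 2nI + 1 = n * two_I + 1.
    """
    return prod(n * two_I + 1 for n, two_I in groups)


# One hydrogen-1 (spin 1/2) on the carbon bearing the glycyl radical.
glycyl = max_hyperfine_peaks([(1, 1)])

# Two equivalent hydrogen-1 nuclei on carbon 3 of the pyruvyl radical.
pyruvyl = max_hyperfine_peaks([(2, 1)])

# Four nonequivalent hydrogen-1 nuclei, one methyl group (three equivalent
# hydrogen-1 nuclei), and two nonequivalent nitrogen-14 nuclei (spin 1).
semiquinone = max_hyperfine_peaks([(1, 1)] * 4 + [(3, 1)] + [(1, 2)] * 2)

print(glycyl, pyruvyl, semiquinone)
```

The upper bound for the semiquinone is far larger than the 22 peaks resolved, as would be expected when many of the splittings overlap. Deuterons couple much more weakly than protons, which is consistent with the splitting becoming unresolved after substitution with ²hydrogen.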

Just as in nuclear magnetic resonance, the set of 
peaks arising from the hyperfine splitting of the absorp- 
tion from a particular population of identical unpaired 
electrons is distributed more or less symmetrically about 
a central point, which is the point in the spectrum at 
which that electron would have absorbed were its 
absorption not split by hyperfine coupling. This central 
point gives the g factor for that electron. The absorption
from a population of identical unpaired electrons
that is unsplit by hyperfine coupling has its lone
peak at its g factor (Figure 12-37). The flux density of
the applied magnetic field, B_app, which is varied to produce
the spectrum, can be converted into units of
g factor (Figure 12-37) with the relationship

$$g = \frac{h\nu_0}{B_{\mathrm{app}}\,\mu_B} \tag{12-65}$$

where h is Planck’s constant (6.626 × 10⁻³⁴ J s), ν₀ is the
carrier frequency of the spectrometer, and μ_B is the Bohr
magneton (9.274 × 10⁻²⁴ J T⁻¹). When the unpaired electrons
in a particular population are located on a carbon,
a nitrogen, or an oxygen, the g factor of the absorption
lies close to 2.0023, which is that for a free electron. For
example, the g factor for the absorption of the unpaired
electron in a pheophytin radical in photosystem II from
spinach is 2.0034; that for the glycyl radical in formate
C-acetyltransferase from E. coli is 2.0037 (Figure
12-36); and that for the unpaired electron localized on
¹⁴nitrogen in 12-11 when it is covalently attached to
Cysteine 76 of T4 lysozyme is 2.0066.
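
Equation 12-65 can be applied directly to the values quoted in the legend of Figure 12-34. The short sketch below does so; the function name is hypothetical, and because the flux densities in the legend are rounded to three figures, the computed g factors should be expected to agree with the quoted ones only to within a few parts per thousand.

```python
H = 6.626e-34      # Planck's constant, J s
MU_B = 9.274e-24   # Bohr magneton, J T^-1


def g_factor(carrier_hz, b_app_tesla):
    """g factor of an absorption observed at a given applied flux density."""
    return H * carrier_hz / (b_app_tesla * MU_B)


CARRIER_HZ = 9.05e9                 # carrier frequency in Figure 12-34
FIELDS_T = (0.321, 0.331, 0.334)    # positions of the three absorptions
QUOTED_G = (2.012, 1.950, 1.932)    # g factors stated in the legend

g_values = [g_factor(CARRIER_HZ, b) for b in FIELDS_T]

for b, g, quoted in zip(FIELDS_T, g_values, QUOTED_G):
    print(f"{b:.3f} T -> g = {g:.3f} (legend: {quoted:.3f})")
```

Because g is inversely proportional to the applied flux density, an axis calibrated in g factor runs in the opposite direction from one calibrated in tesla.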

If the unpaired spins of a particular population of 
identical electrons are on a paramagnetic transition 
metal ion such as Mn²⁺, Fe³⁺, Co²⁺, Ni²⁺, or Cu²⁺, which
affects the local magnetic field significantly, the g factor 
can vary dramatically depending on the orientation of 
the orbital it occupies relative to the direction of the 
applied magnetic field. For example, the orbital in which 
the unpaired electron resides on the ferric ion in a mole- 
cule of ferrimyoglobin (Figure 4-18) in a crystal of this 
protein is held in a specific orientation by the ligand field 
of the heme, which in turn is held in position by the crys- 
tal. In the unit cell of a crystal, there are two molecules of 
myoglobin, each with a different orientation. In the elec- 
tron paramagnetic resonance spectrum of a crystal of 
myoglobin, there are two peaks of absorption, one for 
each of these orientations (Figure 12-37). When the crys- 
tal is rotated in the magnetic field, the g factors of the 
populations of electrons producing these peaks vary 
sinusoidally between a maximum of 6.0 and a minimum 
of 2.4. When the molecules of myoglobin are oriented
randomly in a frozen solution, the absorption observed is 
the sum of all of the absorptions from the individual 
random orientations. As with the g factor of the absorp- 
tion from an unpaired electron on a paramagnetic ion, 
the coupling constant for a particular hyperfine splitting 
from a magnetic nucleus can vary dramatically as the ori- 
entation of the orbital of the unpaired electron relative to 
the applied magnetic field is varied. Such variations in
g factor or coupling constant can be used to draw con- 
clusions about the orientation in a particular sample of 
molecules in which the unpaired electron resides. 

Figure 12-37: Electron paramagnetic spectra of a crystal of
myoglobin in the ferric oxidation state. A single crystal of
ferrimyoglobin from P. catodon (1 x 10° mol) was oriented in the applied
magnetic field so that its ab plane was parallel to the direction of
the field and its a axis was at an angle of about 40° to the direction
of the field. After the spectrum (A) of the crystal in this orientation
was recorded, it was dissolved in 10 μL of distilled water, the
solution was frozen, and a second spectrum (B) was recorded. Both
spectra were recorded at 77 K. The vertical axis, dA/dB, is the first
derivative of the absorption with respect to magnetic flux density.
The horizontal axis is calibrated in g factor, which is inversely
proportional to magnetic flux density (Equation 12-65), hence the
inverse calibration. Reprinted with permission from ref 390.
Copyright 1967 American Society for Biochemistry and Molecular
Biology.

Just as energy transfer between two chromophores
can be used to estimate the distance between them
(Equation 12-49), so can the magnitude of the magnetic
dipolar interactions between two unpaired electrons. 
These dipolar interactions can be between an unpaired 
electron on a transition metal cation and an unpaired 
electron on a 1-oxyl-2,2,5,5-tetramethylpyrrolin-3-yl
group or between two 1-oxyl-2,2,5,5-tetramethylpyrrolin-3-yl
groups. The magnetic dipolar
interactions decrease the intensity of the absorptions
from the two unpaired electrons, and under appropriate
circumstances an estimate of the distance between them
can be made. As with transfer of fluorescent energy,
however, there are significant and usually uncontrolled
orientation factors affecting the calculation. Changes in
the magnitude of magnetic dipolar interactions between
two 1-oxyl-2,2,5,5-tetramethylpyrrolin-3-yl groups
inserted at specific positions in the amino acid sequence
of T4 lysozyme have been used to monitor a change in its
conformation upon binding of a ligand and estimate the
change in distance between those positions during that
conformational change.

In simple situations it is often possible to infer the
identities of the magnetic nuclei producing a particular
set of hyperfine splittings from a knowledge of the chemical
structure of the radical bearing the unpaired electron.
The identities of these inferred magnetic nuclei are
usually confirmed by replacing them with one of
their isotopes and observing the expected change in
the hyperfine splitting. For example, the identity of the
¹hydrogen producing the hyperfine splitting of the
absorption from the unpaired electron in the glycyl radical
in formate C-acetyltransferase (Figure 12-36) and the
identities of the hydrogens around the phenyl ring producing
the hyperfine splitting of the absorption from the
unpaired electron in the tyrosyl radical of ribonucleoside-diphosphate
reductase from E. coli were confirmed
by replacing them with ²hydrogens.

If the pattern of hyperfine splitting cannot be 
explained because several unknown magnetic nuclei 
contribute to it, further information about their identity 
can be obtained from an electron nuclear double resonance
(ENDOR) spectrum. The magnetic field of
the spectrometer is adjusted so that the population of 
unpaired electrons is absorbing at the center of one of its 
hyperfine peaks. That absorption is then saturated by 
increasing the power of the microwave transmitter. The 
sample is then irradiated with a radiofrequency trans- 
mitter, the frequency of which is varied progressively. 
When the transmitter reaches the Larmor frequency of a 
population of magnetic nuclei participating in the par- 
ticular hyperfine coupling that produced the peak on 
which the spectrometer is poised, the rate of relaxation
of the electron increases and its absorption at saturation
increases. The output of the spectrometer
shows peaks of increases in absorption at radio frequencies
equal to the Larmor frequencies of the population of
nuclei producing the hyperfine splitting. There is a pair
of Larmor frequencies for each population of coupled 
nuclei because the unpaired electron splits their absorp- 
tion by the same coupling constant with which they split 
the absorption of the electron. These two peaks are cen- 
tered on the Larmor frequency the population of coupled 
magnetic nuclei would have in the absence of the popu- 
lation of unpaired electrons. 
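
The positions expected for such a pair of peaks can be estimated from the applied field and the identity of the nucleus. The sketch below uses the conditions given in the legend of Figure 12-38 (a carrier frequency of 34.7 GHz and a g factor of 4.03) to obtain the applied flux density from Equation 12-65 and then the Larmor frequency of ¹⁵nitrogen. The magnetogyric ratio of ¹⁵nitrogen (about 4.316 MHz T⁻¹) is a standard value supplied here rather than taken from the text, the coupling constant of 0.4 MHz is purely hypothetical, and the function names are illustrative.

```python
H = 6.626e-34                # Planck's constant, J s
MU_B = 9.274e-24             # Bohr magneton, J T^-1
GAMMA_15N_MHZ_PER_T = 4.316  # magnitude for nitrogen-15 (assumed standard value)


def field_at_g(carrier_hz, g):
    """Applied flux density (T) at which an absorption of g factor g appears."""
    return H * carrier_hz / (g * MU_B)


def endor_pair(larmor_mhz, coupling_mhz):
    """Two peaks centered on the Larmor frequency, split by the coupling."""
    half = coupling_mhz / 2
    return larmor_mhz - half, larmor_mhz + half


b_app = field_at_g(34.7e9, 4.03)
larmor = GAMMA_15N_MHZ_PER_T * b_app
low, high = endor_pair(larmor, coupling_mhz=0.4)   # hypothetical coupling

print(f"applied field:    {b_app:.4f} T")
print(f"Larmor frequency: {larmor:.3f} MHz")
print(f"expected peaks:   {low:.3f} and {high:.3f} MHz")
```

The Larmor frequency computed in this way is close to the value of 2.65 MHz stated in the legend, which is what identifies the coupled nucleus as nitrogen.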

Because the resolution of an electron nuclear 
double resonance spectrum is low and the splitting is 
large, usually all that can be learned is the element of the 
population of coupled nuclei and the coupling constant 
between that population of nuclei and the population of 
unpaired electrons. When [8-¹⁵N]N*-hydroxyarginine is
bound by rat neuronal nitric-oxide synthase, the hyper-
fine splitting of the absorption from the Fe(III) in the heme
of the enzyme at a g factor of 4.03 could be shown to arise 
from the nitrogen of the ligand because in the electron 
nuclear double resonance spectrum (Figure 12-38) there 
were a pair of peaks with the proper coupling constant 
centered on the Larmor frequency of the nucleus of a 
nitrogen.³⁹⁷ Likewise, it could be shown that the hyper-
fine peak at a g factor of 1.96 in the electron paramag-
netic spectrum of the complex between [¹³C]ethene and
the molybdenum-iron protein from nitrogenase was
caused by the spin-spin coupling between the unpaired
electron in the metal cluster and a ¹³carbon in the bound
ethene and that the hyperfine peak at a g factor of 4.23
in the complex between [¹⁵N]alanine, nitrous oxide, and
aminocyclopropane carboxylate oxidase was caused in
part by the spin-spin coupling between the nitrogen of
the alanine and the unpaired electron in the nitroxyl
heme. In situations where electron nuclear double res-
onance spectra have several peaks because the peak of
absorption on which the spectrometer is poised
has several components, isotopic substitution, for exam-
ple ²hydrogen for ¹hydrogen, can sort out the origin of
particular peaks. In complicated electron paramag-
netic spectra in which it is unclear on which g factor a
particular pattern of hyperfine peaks is centered, the
coupling constants obtained from the electron nuclear
double resonance spectra can often resolve the problem.

[Figure 12-38 plot: absorption as a function of radio frequency (MHz), 2.3 to 3.0 MHz.]

Figure 12-38: Electron nuclear double resonance (ENDOR) spec-
trum of a solution of 0.5 mM neuronal nitric-oxide synthase from
Rattus norvegicus and 5 mM [8-¹⁵N]N*-hydroxyarginine.³⁹⁷ The
electron paramagnetic resonance spectrum of the frozen solution
at 2 K had three peaks of absorption from the unpaired electron in
the heme of the protein with g factors of 7.65, 4.03, and 1.8. The
spectrometer was poised on the peak of absorption at the g factor
of 4.03 at a microwave carrier frequency of 34.7 GHz, the absorp-
tion of microwave energy was saturated by increasing the output of
the transmitter, and the absorption of microwave energy was
recorded as a function of the applied radiofrequency energy. The
absorption is presented as a function of radio frequency (mega-
hertz) centered on the Larmor frequency (2.65 MHz) for nitrogen
at the applied magnetic field. The width of each peak exceeds by
many fold the span of the full range of chemical shifts observed for
¹⁵nitrogen in a protein (200 ppm). Reprinted with permission from
ref 397. Copyright 1999 American Chemical Society.
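
The Larmor frequency of 2.65 MHz quoted for Figure 12-38 can be checked from the g factor (4.03) and the microwave frequency (34.7 GHz) on which the spectrometer was poised. The sketch below is only an illustration: the physical constants and the gyromagnetic ratio of ¹⁵nitrogen are standard values that do not come from this text, the function names are hypothetical, and the coupling constant of 0.4 MHz in the usage lines is invented purely to show how a pair of peaks is placed around the Larmor frequency.

```python
# Physical constants in SI units (standard values, not taken from the text).
PLANCK = 6.62607015e-34            # J s
BOHR_MAGNETON = 9.2740100783e-24   # J T^-1
# Magnitude of the gyromagnetic ratio of 15N divided by 2*pi (MHz per tesla).
GAMMA_15N_MHZ_PER_T = 4.316


def resonant_field(g_factor, microwave_hz):
    """Magnetic field (T) at which an unpaired electron with this g factor
    absorbs microwaves of the given frequency: h*nu = g*mu_B*B."""
    return PLANCK * microwave_hz / (g_factor * BOHR_MAGNETON)


def larmor_mhz(field_tesla, gamma_mhz_per_t=GAMMA_15N_MHZ_PER_T):
    """Larmor frequency (MHz) of a nucleus at the given field."""
    return gamma_mhz_per_t * field_tesla


def endor_pair(larmor, coupling_mhz):
    """Positions (MHz) of the two peaks split by the coupling constant
    and centered on the Larmor frequency of the uncoupled nucleus."""
    return (larmor - coupling_mhz / 2.0, larmor + coupling_mhz / 2.0)


field = resonant_field(4.03, 34.7e9)   # about 0.615 T
nu = larmor_mhz(field)                 # about 2.65 MHz, as in Figure 12-38
low, high = endor_pair(nu, 0.4)        # 0.4 MHz is a hypothetical coupling
print(f"field = {field:.4f} T, Larmor = {nu:.2f} MHz, "
      f"peaks at {low:.2f} and {high:.2f} MHz")
```

The agreement between the calculated frequency and the center of the spectrum is what identifies the coupled nucleus as ¹⁵nitrogen; a different element would place the pair of peaks at a different radio frequency at the same field.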

The similarity between electron nuclear double res- 
onance spectra or electron paramagnetic spectra from 
two different molecules can be used as evidence that the 
radicals present in each are chemically identical. For 
example, the similarity between the electron nuclear 
double resonance spectrum of the radical in bovine cata-
lase to that of the tyrosyl radical in ribonucleoside-
diphosphate reductase from E. coli confirmed that the 
former also was a tyrosyl radical. Likewise, the coinci- 
dence of the g factor for synthetic bacteriochlorophyll 
radical cation with the g factor of the initial photooxi- 
dized donor in the photosynthetic reaction center of 
photosynthetic bacteria and the fact that the absorptions 
from both of these radicals decreased by exactly the same 
factor when the bacteriochlorophyll and the bacteria, 
respectively, were perdeuterated identified the donor in 
the reaction center as a bacteriochlorophyll. The fact that 
under all circumstances the width of the absorption in 
the electron paramagnetic spectrum of the radical 
formed from the donor in the reaction center was 1/√2 that
of a monomeric bacteriochlorophyll radical cation
demonstrated that the radical cation in the reaction
center was actually an electronically coupled dimer of
bacteriochlorophyll. These conclusions were validated
by the crystallographic molecular model of the pro-
tein.

Chapter 13
Folding and Assembly 


Each polypeptide begins its existence by emerging, amino 
terminus foremost, from a ribosome. Its initial amino acid
sequence is the complete translation of the sequence in 
which the codons are arranged between the start codon 
and the stop codon on the messenger RNA. At some point 
in its early history, the polypeptide folds to assume its 
native state. The native state of a polypeptide is the lim- 
ited set of equilibrating conformations in which it will 
spend the remainder of its lifetime and in which it is capa-
ble of performing its role within or on behalf of the living
organism in which it was synthesized. The native state is 
the set of conformations of the polypeptide represented 
by the crystallographic molecular models of the protein. 
It is also referred to as the folded state. On the basis of the 
easily verified existence and identity of the native state, a 
denatured state of a polypeptide can be defined as its 
antonym. A denatured state of a polypeptide is any set of 
equilibrating conformations of that polypeptide that is 
not or does not contain the set of conformations of the 
native state. As it emerges from the ribosome, the nascent 
polypeptide is in a denatured state. 

The initial folded state of the polypeptide can 
undergo posttranslational modification, it can combine 
with several other identically folded polypeptides of the 
same sequence or several other folded polypeptides of a 
different sequence and structure, or it can enter a helical 
polymeric protein as one of the protomers. The product 
of these steps is the mature native state of the protein 
encountered in the living tissue. The order in which these 
later processes occur cannot be predicted, but all of
them usually follow the folding of the unadorned 
polypeptide because it is usually only the folded polypep- 
tide that contains the information controlling them. 

Accordingly, the steps in the maturation of a protein
can be divided into folding, posttranslational modifica- 
tion, and assembly. Folding is any process by which the 
polypeptide initially in a denatured state, for example, its 
set of conformations as it emerges from the ribosome, 
assumes the folded native state. Assembly is the process 
by which individual folded polypeptides associate to form 
their ultimate oligomeric or polymeric protein. 


Thermodynamics of Folding 


A polypeptide is a polymer of amino acids (2-15). It is 
known from studies of polymers in general that their
conformational behavior depends critically on the sol-
vent in which they are dissolved. If the functional groups
of its repeating units are miscible with the solvent, the 
polymer is free to expand and expose all of those func- 
tional groups to that solvent without penalty. Such a sol- 
vent is a good solvent. When a polypeptide is dissolved 
in a good solvent, rotation about each bond between 
amide nitrogen and α carbon and between α carbon and
acyl carbon is permitted, within the confines of the 
clashes represented in the Ramachandran plot (Figure 
6-4) and within the requirement that no two atoms any- 
where in the polypeptide can occupy the same space at 
the same time. As with any other unconfined organic 
molecule in solution, the conformation of a polypeptide 
in a good solvent is continuously changing as these rota- 
tions occur at random. Such a protean polymer in a good 
solvent is a random coil. This term incorporates 
unavoidably the uncontrolled and continuous motion of 
this process. A random coil is a special type of unfolded 
state. The unfolded state of a polypeptide is a state in 
which the polypeptide is significantly expanded relative 
to the native state so that most of its structure is exposed to
the solution even though it may not be a fully random 
coil. 

Unfortunately, there are few good solvents for nat- 
urally occurring polypeptides. This is due to the fact that 
almost all natural polypeptides are created to fold. To
fold, they must be composed of a mixture of hydropho- 
bic and hydrophilic amino acids, placed in a particular 
sequence. There is almost no solvent in which the result- 
ing mixture of side chains is miscible. In particular, 
water, although it is a good solvent for the hydrophilic 
side chains, is a bad solvent for the hydrophobic side 
chains. 

A bad solvent is a solvent in which functional groups
of the repeating units of a polymer are only sparingly sol- 
uble. In a bad solvent, a polymer contracts to decrease the 
exposure of those sparingly soluble functional groups to 
the solution. The hydrophobic effect is a force that seeks 
to minimize the exposure of a hydrophobic solute to 
water. Because of the hydrophobic effect that is exerted 
on the hydrophobic side chains in a natural polypeptide, 
water is a bad solvent for such a polypeptide at neutral pH 
and at an ionic strength of 0.2 M, which are the conditions
under which most proteins are found. If water were not a 
bad solvent, natural polypeptides would not fold. Natural 
polypeptides, because they have evolved in water, fold in
water. Within a cell, each polypeptide begins its life
emerging from a ribosome into the cytoplasm. Even 
though water is a bad solvent for the emerging polypep- 
tide, it probably remains in an unfolded state until it 
reaches a length at which there are a large enough 
number of hydrophobic side chains to accomplish its 
contraction. Consequently, beyond a certain length, the 
polypeptide has the potential to contract to form a state 
other than a random coil, yet there are experimental 
observations suggesting that at least some incomplete 
polypeptides adopt expanded unfolded states when they 
are dissolved in aqueous solution. 

Only glycerol, another cohesive, hydroxylic solvent, 
is also able to promote folding. Other pure solvents,
because they dissolve hydrophobic functional groups
rather than exclude them (Figure 5-22), cannot promote
the folding of a polypeptide, but they are nevertheless 
bad solvents because they cannot solvate the polar side 
chains adequately. Consequently, if a polypeptide is not 
in its native state in a particular solvent, it will usually 
also not be a random coil. When an organic solvent that 
is miscible with water, such as ethanol, is added to a solu- 
tion of protein, it causes the protein to denature because 
it solvates its nonpolar groups, but it also diminishes the 
solvation of the polar groups, preventing the formation 
of a random coil. Denaturants are solutes that, when 
added to an aqueous solution of a protein, promote the 
formation of a denatured state of that protein. Most 
denaturants do not turn water into a good solvent. 

If one is studying its folding, both the native state 
and the denatured state of a polypeptide must be well- 
defined. The only denatured state of a polypeptide that 
can be defined with sufficient accuracy is a random coil. 
Therefore, the folding of a polypeptide is most informa- 
tively studied if the process that is monitored is the iso- 
merization between the random coil and the native 
state, even though this may not be what occurs in a cell. 
For this study to be accomplished, a good solvent is 
required. One of the few ways to create a good solvent is 
to add either guanidinium chloride or urea to an aqueous 
solution of the protein. Both of these solutes are denatu- 
rants, but they are denaturants that create a good sol- 
vent. Regardless of whether or not a polypeptide in a cell 
is a random coil at its birth, experimentally an examina-
tion of the thermodynamics of protein folding usually
begins with the polypeptide as a random coil in a con- 
centrated solution of guanidinium chloride or urea. 

When almost all natural proteins, the cystines of 
which have been reduced to cysteines, are dissolved in 
solutions of guanidinium chloride (13-1) or urea (13-2) 


[Structures 13-1 (guanidinium chloride) and 13-2 (urea)]


at concentrations of 6 or 8 M, respectively, they become 
completely unfolded, and their constituent polypeptides 
become random coils. 

There are several ways to demonstrate this fact.
The molar masses of the proteins determined from the 
colligative properties of these solutions are those of 
the constitutive polypeptides rather than the oligomers. 
The intrinsic viscosities of proteins dissolved in these 
solutions range from 15 to 100 cm³ g⁻¹ even though the
intrinsic viscosities of the native proteins are between 3
and 5 cm³ g⁻¹. Furthermore, within a set of proteins, the
intrinsic viscosities of their polypeptides dissolved in 
those solutions are correlated to the length of the con- 
stituent polypeptides by a relationship that agrees with 
theoretical expectation for the behavior of random coils. 
The optical rotatory dispersion spectra and circular 
dichroic spectra of proteins in such solutions are those 
theoretically expected from a polypeptide lacking any 
regular secondary structure, even if the spectra of the 
native proteins indicate that they are predominantly 
α helix and β structure. The acid-base titration curves of
proteins dissolved in these solutions lose the normally
observed shifts in intrinsic pKₐ brought about by the
electrostatic features of the native state and become 
simple sums of the constituent intrinsic acid-base titra- 
tions of the constituent amino acids (Table 2-2). All of 
the tyrosines in the protein display ultraviolet spec- 
trophotometric acid-base titrations with expected 
intrinsic values of pKₐ. The rates of amido proton
exchange become very rapid when proteins are dis- 
solved in these solutions, and no evidence for a class of 
slowly exchanging amido protons is usually found. The 
ultraviolet spectra between 270 and 300 nm of proteins 
dissolved in these solutions are simple summations of 
the spectra of phenylalanine, tyrosine, and tryptophan 
and display none of the spectral shifts characteristic of 
the native states. 
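
The correlation between intrinsic viscosity and the length of the polypeptide mentioned above is usually written as a power law, [η] = K·nᵃ, where n is the number of amino acids. The sketch below is an illustration only. The coefficient 0.716 and the exponent 0.66 are assumptions introduced here (values of this kind have been reported for reduced proteins in 6 M guanidinium chloride) and do not come from this text, and the function name is hypothetical.

```python
def intrinsic_viscosity_random_coil(n_residues, coefficient=0.716, exponent=0.66):
    """Intrinsic viscosity (cm^3 g^-1) of a random coil of n_residues
    amino acids, from an assumed power law [eta] = coefficient * n**exponent.
    The default coefficient and exponent are illustrative assumptions."""
    return coefficient * n_residues ** exponent


# Polypeptides of roughly 100 to 1700 amino acids span the range of
# 15 to 100 cm^3 g^-1 quoted for proteins in these solutions.
for n in (100, 400, 1700):
    print(n, round(intrinsic_viscosity_random_coil(n), 1))
```

An exponent well above zero is the signature of an expanded coil; a compact globular protein has an intrinsic viscosity that is nearly independent of its length, which is why native proteins cluster between 3 and 5 cm³ g⁻¹.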

Solutions of either guanidinium chloride or urea 
promote the unfolding of a polypeptide by increasing 
the stability of the random coil. This increase in stabil- 
ity is due to favorable changes in the solvation both of the 
side chains of the amino acids and of the polypeptide 
backbone brought about by these solutes. From meas- 
urements of the solubility of various amino acids, as well 
as diglycine and triglycine, in solutions of either urea⁷ or
guanidinium chloride,⁸ the standard free energies of
transfer of both the side chains of the amino acids and 
the peptide bond between water and solutions of urea or 
guanidinium chloride have been estimated (Table 13-1). 
These standard free energies of transfer were derived 
from the differences between the solubilities of each of 
the amino acids and peptides and the solubilities of 
glycine in water, 7 M urea, or 5 M guanidinium chloride. 
To arrive at these estimates, it was assumed that the dif- 
ferences between the standard free energies of solution 
of glycine and each of the other amino acids in the vari-
ous solutions of denaturants would give the standard
free energies of transfer of each of the side chains or the
peptide bond, respectively. 
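
The arithmetic behind such estimates can be sketched as follows. The sketch assumes that the standard free energy of transfer of a solute is −RT ln(S_denaturant/S_water), where S is its solubility in each solvent expressed in the same units, and that activity coefficients can be ignored; both are simplifications introduced here. The function names and the solubilities in the usage lines are hypothetical and are not measurements from the cited studies.

```python
import math

GAS_CONSTANT = 8.314462618e-3  # kJ mol^-1 K^-1


def transfer_free_energy(sol_water, sol_denaturant, temperature=298.15):
    """Standard free energy of transfer (kJ/mol) of a solute from water to
    the denaturant solution, from its solubilities in the two solvents
    (same units for both; activity coefficients ignored)."""
    return -GAS_CONSTANT * temperature * math.log(sol_denaturant / sol_water)


def side_chain_transfer(aa_water, aa_denaturant, gly_water, gly_denaturant,
                        temperature=298.15):
    """Side-chain contribution: transfer free energy of the amino acid
    minus that of glycine, which stands in for the common backbone."""
    return (transfer_free_energy(aa_water, aa_denaturant, temperature)
            - transfer_free_energy(gly_water, gly_denaturant, temperature))


# Hypothetical solubilities: the amino acid is 1.5 times more soluble in the
# denaturant solution, while the solubility of glycine is unchanged.
delta_g = side_chain_transfer(0.16, 0.24, 3.0, 3.0)
print(f"side-chain transfer free energy = {delta_g:.2f} kJ/mol")
```

A solute that is more soluble in the solution of denaturant than in water has a negative free energy of transfer, which is the sign of every side-chain entry in the amino acid columns of Table 13-1.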

The values obtained for leucine, phenylalanine, 
and tryptophan agree quite closely with direct measure- 
ments of the standard free energies of transfer for isobu- 
tane, toluene, and skatole as models of the respective 
side chains (Table 13-1).⁹ The N-acetyl ethyl esters of
leucine and phenylalanine, however, were found to have
standard free energies of transfer, relative to ethyl
N-acetylglycinate, that were much less negative (Table
13-1).¹⁰ It may be premature to attach any significance to
the absolute numerical values of these various estimates. 

It has been uniformly observed that the free ener- 
gies of transfer of both hydrophobic solutes and neutral 
hydrophilic solutes such as peptides between water and 
either 7 M urea or 5 M guanidinium chloride have nega- 
tive values. Unlike the hydrophobic effect, which is 
imposed only on hydrogen-carbon bonds, the increase 
in solvation performed by urea and guanidinium ion is 
linearly related to the accessible surface area of the side 
chain, regardless of its polarity, so that both hydropho-
bic and hydrophilic functional groups that are exposed
to the solution upon formation of the random coil are
more favorably solvated in solutions of urea or guani-
dinium chloride than they would be in water. This stabi-
lization of the random coil increases monotonically, but
not linearly, with the concentration of denaturant.
At some concentration of the denaturant, which differs 
for each protein, the unfolded polypeptide becomes 
more stable than the folded polypeptide. This point is 
reached not only because of the increase in favorable sol- 
vation but also because the unfolded polypeptide is more 
disordered. 

The favorable solvation of both polar and nonpolar
functional groups in a polypeptide by urea or guani-
dinium ion quantified in the free energies of transfer in
Table 13-1 has been explained as the result of preferen-
tial binding of the denaturant to the random coil.
There is no evidence, however, for the existence of par-
ticular binding sites for either of these denaturants,
which if they existed would have to be distributed rather
uniformly over the dramatically heterogeneous surface
of the random coil. A more realistic explanation would be
that these denaturants partition favorably into the pecu-
liar layer of water solvating the random coil, relative to
the water in the bulk of the solution. Regardless of the
molecular explanation, the experimental observation
that accounts for the increase in the stability of the
random coil relative to that of the native state is that both
urea and guanidinium ion preferentially solvate those
portions of a polypeptide exposed to the solution,
and a random coil simply exposes more of the polypep-
tide than does the native state. The preferential solva-
tion (Equation 1-57) of bovine serum albumin in its
native state by urea is +0.10 g mL⁻¹, and by guanidinium
ion, +0.14 g mL⁻¹. The increase in preferential solvation
that occurs during the unfolding of lysozyme from Gallus
gallus by urea is +0.25 g mL⁻¹. These positive experi-
mentally measured preferential solvations demonstrate
that both of these denaturants are significant salting-in
solutes. They do not exert their effects by decreasing the
cohesion of the water and producing in turn a decrease
in the hydrophobic effect because both urea and guani-
dinium ion increase the surface tension of an aqueous
solution.

The favorable solvation of the hydrophobic side
chains by urea and guanidinium chloride does make a
major contribution to the stabilization of the random coil
(Table 13-1). This observation suggests that urea and
guanidinium chloride cause the solution to become
more like a usual organic solvent in its properties (Figure
5-22). This effect may result from the stable introduction
of the nonpolar π clouds of the denaturants (2-26) into
the solution. N-Alkyl-, N,N′-dialkyl-, and N,N,N′,N′-
tetraalkylureas are even more effective at increasing the
solubility of naphthalene, indole, and ethyl N-acetyltryp-
tophanate in water than is urea itself. This observa-
tion also suggests that it is nonpolar noncovalent
interactions between urea and the hydrophobic amino
acids that explain its ability to solvate them favorably.
The fact that methylurea, dimethylurea, and tetramethyl-
urea are each in turn increasingly better denaturants of
proteins than urea itself and the fact that some alkyl-
ureas are better denaturants than even guanidinium
chloride suggest that the favorable solvation by urea of
the hydrophobic functionalities revealed during the for-
mation of the random coil, in addition to its ability as a
donor or acceptor of hydrogen bonds, is the major fea-
ture of its ability to promote unfolding.

Table 13-1: Estimates of the Standard Free Energy of Transfer of Various Side Chains of the Amino Acids between Water
and Solutions of Urea or Guanidinium Chloride

                 ΔG° transfer, H₂O → 7 M urea         ΔG° transfer, H₂O → 5 M GdmCl
                 (kJ mol⁻¹)                           (kJ mol⁻¹)
amino acid       amino    alkane   N-acetyl           amino    alkane   N-acetyl
side chain       acidᵃ    modelᵇ   ethyl esterᶜ       acidᵃ    modelᵇ   ethyl esterᶜ

leucine          -1.1     -1.0     +0.1               -1.8     -1.3     -0.1
phenylalanine    -2.2     -1.9     -0.7               -2.8     -2.5     -1.3
tryptophan       -3.2     -3.2                        -4.6     -4.0
methionine       -1.5                                 -2.0
threonine        -0.4                                 -0.5
tyrosine         -2.8                                 -2.9
histidine        -1.0                                 -1.7
asparagine       -1.6                                 -2.4
glutamine        -0.8                                 -1.4
peptide bond     -0.8              -0.5               -1.3              -0.8

ᵃCalculated from the difference between solubility of glycine and the appropriate amino acid.⁷,⁸
ᵇFree energy of transfer of isobutane, toluene, or skatole.⁹
ᶜDifference in free energy of transfer of ethyl N-acetylglycinate and N-acetyl ethyl ester of the amino acid.¹⁰
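
The entries of Table 13-1 can be combined to give a rough feeling for how strongly a denaturant favors an exposed segment of polypeptide. The sketch below simply adds the values from the amino acid columns of the table for each side chain and each peptide bond. Additivity, and the assumption that every listed group is fully exposed in the random coil and fully buried in the native state, are simplifications introduced here, not claims of the text; the function name and the composition of the segment in the usage lines are hypothetical.

```python
# Values (kJ/mol) from the "amino acid" columns of Table 13-1.
TRANSFER_KJ_PER_MOL = {
    "7 M urea": {
        "leucine": -1.1, "phenylalanine": -2.2, "tryptophan": -3.2,
        "methionine": -1.5, "threonine": -0.4, "tyrosine": -2.8,
        "histidine": -1.0, "asparagine": -1.6, "glutamine": -0.8,
        "peptide bond": -0.8,
    },
    "5 M GdmCl": {
        "leucine": -1.8, "phenylalanine": -2.8, "tryptophan": -4.6,
        "methionine": -2.0, "threonine": -0.5, "tyrosine": -2.9,
        "histidine": -1.7, "asparagine": -2.4, "glutamine": -1.4,
        "peptide bond": -1.3,
    },
}


def coil_stabilization(composition, n_peptide_bonds, solvent):
    """Sum of transfer free energies (kJ/mol) for the listed side chains and
    peptide bonds, assuming additivity and complete exposure.
    Only side chains tabulated in Table 13-1 are accepted."""
    table = TRANSFER_KJ_PER_MOL[solvent]
    total = n_peptide_bonds * table["peptide bond"]
    for side_chain, count in composition.items():
        total += count * table[side_chain]
    return total


# Hypothetical exposed segment: two leucines, one phenylalanine, two peptide bonds.
segment = {"leucine": 2, "phenylalanine": 1}
for solvent in TRANSFER_KJ_PER_MOL:
    print(solvent, round(coil_stabilization(segment, 2, solvent), 1))
```

Even for so short a segment, the peptide bonds contribute a substantial share of the total, which illustrates why the favorable solvation of the backbone cannot be neglected.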

A polypeptide will fold only if the free energy of the 
native state is less than the free energy of all accessible 
denatured states. Because of this requirement, for exam- 
ple, a nascent polypeptide cannot fold until it is long 
enough for the native state to contain a large enough col- 
lection of noncovalent interactions to overcome the sig- 
nificant unfavorable loss of standard entropy that must 
always accompany folding. It is also the case that a 
polypeptide which has undergone extensive covalent 
posttranslational modification after it originally folded 
may not be able to fold again after it has been returned to 
a denatured state. For example, proinsulin can be 
unfolded and its cystines reduced to cysteine. The pro- 
tein will then refold spontaneously to its native state, and 
the proper cystines will reform under oxidizing condi-
tions. Insulin, however, which is a posttranslationally
modified fragment of proinsulin, missing 25 amino acids 
from the middle of the polypeptide, does not refold 
spontaneously after it has been unfolded and its cys- 
teines reduced, and it can be refolded only with sub- 
terfuge. The only fact that seems to be inescapable is 
that, at some point in its lifetime, a polypeptide has a 
covalent structure capable of folding to produce either 
the mature native state directly or an initial native state, 
which is modified subsequently but retains its basic 
folded state. 

A polypeptide that has not been modified so exten- 
sively as to cause the mature native state to be higher in 
free energy than the random coil or higher in free energy 
than any other accessible denatured state will, under the 
proper circumstances, spontaneously refold to its 
mature native state after it has been purposely turned 
into a random coil by dissolving it in 6 M guanidinium 
chloride or 8 M urea. Most of our understanding of the 
folding of polypeptides has been derived from the study 
of such conformational isomerizations. Their existence 
states that all of the information necessary to achieve the 
proper native state resides in the amino acid sequence of 
the polypeptide. 

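The conformational isomerization that encom-
passes the process of protein folding can be presented as
the equilibrium

$$\mathrm{U} \; \underset{k_{\mathrm{U}}}{\overset{k_{\mathrm{F}}}{\rightleftharpoons}} \; \mathrm{F} \qquad (13\text{-}1)$$

where F is the polypeptide folded in its native state and
U is the unfolded state. The rate constants $k_{\mathrm{F}}$ and $k_{\mathrm{U}}$ are
composite rate constants including any kinetic steps
between the two extremes, and the equilibrium con-
stant for folding, $K_{\mathrm{Fd}}$, is defined by

$$K_{\mathrm{Fd}} = \frac{[\mathrm{F}]}{[\mathrm{U}]} \qquad (13\text{-}2)$$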


Because polypeptides folded in their native state 
are by design reasonably stable in aqueous solution at 
physiological temperatures and ranges of pH, the molar 
concentration of the unfolded state under normal cir- 
cumstances is immeasurably low, and in such a situa- 
tion, neither the equilibrium constant nor the 
thermodynamic changes associated with folding can be 
measured directly by following the concentrations of the 
two forms of the protein. One solution to this problem is 
to shift the equilibrium by introducing an unnatural per- 
turbation. Because the unfolded states produced by 
adding guanidinium ion or urea are random coils, the 
least controversial perturbation that can be used to shift 
the equilibrium is to add increasing concentrations of 
one or the other of these solutes to a series of solutions of 
the protein. As the concentrations of these perturbants 
are increased, the unfolded form of the protein becomes 
more and more stable until the equilibrium constant for 
Equation 13-1 is small enough for measurable amounts 
of the unfolded form of the protein to exist. 
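
The reason the unfolded state cannot be measured under normal circumstances follows directly from the definition of the equilibrium constant as [F]/[U]. The sketch below assumes only the two states of Equation 13-1 and the usual relation ΔG° = −RT ln K; the function names and the values of the equilibrium constant in the usage lines are hypothetical.

```python
import math

GAS_CONSTANT = 8.314462618e-3  # kJ mol^-1 K^-1


def fraction_unfolded(k_fold):
    """Fraction of the polypeptide in the unfolded state when only U and F
    are populated and k_fold = [F]/[U]."""
    return 1.0 / (1.0 + k_fold)


def standard_free_energy_of_folding(k_fold, temperature=298.15):
    """Standard free energy change (kJ/mol) for U -> F."""
    return -GAS_CONSTANT * temperature * math.log(k_fold)


# Hypothetical equilibrium constants: strongly folded, marginal, and midpoint.
for k in (1.0e6, 10.0, 1.0):
    print(f"K = {k:g}: fraction unfolded = {fraction_unfolded(k):.2e}, "
          f"dG = {standard_free_energy_of_folding(k):.1f} kJ/mol")
```

With an equilibrium constant of 10⁶, only about one molecule in a million is unfolded, far below what most physical methods can detect; as the denaturant lowers the constant toward 1, both forms become measurable.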

Raising the temperature or lowering the pH of the 
solution or a combination of these perturbations also 
stabilizes unfolded states of the protein relative to the 
native state. The decrease in the magnitude of the equi- 
librium constant for Equation 13-1 brought about by 
raising the temperature is presumably due to increases 
in thermal motion that always shift reactions in favor of 
more disordered states. The decrease in the magnitude 
of the equilibrium constant brought about by lowering 
the pH is due to the fact that, because an unfolded state 
is expanded relative to the folded state, it can support a 
greater net charge and thus can take up more protons
than the folded state as the pH is lowered. The rela-
tionship between the equilibrium constant for folding
and the pH of the solution is governed by the differential
equation

$$\frac{\mathrm{d}\ln K_{\mathrm{Fd}}}{\mathrm{d}\ln a_{\mathrm{H}^+}} = \bar{Z}_{\mathrm{H,F}} - \bar{Z}_{\mathrm{H,U}} \qquad (13\text{-}3)$$

where $a_{\mathrm{H}^+}$ is the activity of protons in the solution and
$\bar{Z}_{\mathrm{H,U}}$ and $\bar{Z}_{\mathrm{H,F}}$ are the mean net proton charge numbers of
the unfolded and the folded states of the protein, respec-
tively.

The unfolded state of the protein can support a
greater mean net proton charge number because the
values of pKₐ for its functional groups are higher.
Consequently, it takes up more protons as the pH is low-
ered than does the folded state. This fact is verified by
measuring the acid-base titration curves of the protein
(Figure 1-11) in the absence and the presence of 6 M
guanidinium chloride. The two titration curves are
then related to each other in absolute terms by measur-
ing the moles of protons that must be added to a solution
to maintain a constant pH as the protein is unfolded by
adding guanidinium chloride. In the acid region of the
titration curves, the mean net proton charge number of
the unfolded state is usually greater than that of the
folded state, so the equilibrium constant for folding
decreases as the pH is lowered (Equation 13-3). The
decrease in $K_{\mathrm{Fd}}$ becomes more pronounced the more the
pH is lowered so that a larger and larger fraction of the
carboxylates in the protein become involved in the titra-
tion. At low ionic strength, there also may be a small
increase in the repulsion between the positively charged
side chains in the compact native state, which destabi-
lizes it relative to the denatured state.

Because a side chain that is an acid-base often
titrates anomalously when it becomes incorporated into
the native structure of a protein, its incorporation will
affect the equilibrium constant for folding. If the side
chain is a carboxylic acid, the effect of its incorporation
is most readily understood by considering the
folding-unfolding at a pH low enough that it is fully pro-
tonated in both the unfolded and the folded states. If it is
buried in the folded state so that its pKₐ is elevated, the
carboxylic acid will lose its proton at a higher pH in the
folded state than in the unfolded state. Consequently, as
the pH is raised, $\bar{Z}_{\mathrm{H,F}}$ will be greater than it would have
been if the carboxylic acid had not been buried, and the
equilibrium constant for folding will be smaller than it
would have been. If the carboxylic acid has its pKₐ
lowered by participating as the acceptor in one or more
hydrogen bonds in the folded state, it will lose its
proton at a lower pH in the folded state than in the
unfolded state, and the equilibrium constant for folding
will be larger than it would have been. For example,
Aspartate 76 in ribonuclease T₁ from Aspergillus oryzae
participates in several hydrogen bonds in the native state
that lower its pKₐ to 0.5, coincident with an increase in
the equilibrium constant for its folding of a factor of
400. A buried lysine, the pKₐ of which is lowered
because it remains unprotonated in the folded state of a
protein, also decreases the stability of the folded state
relative to the unfolded state for the same reasons that
a buried carboxylic acid does, but the argument begins
with a pH high enough that the lysine is unprotonated in
the unfolded state and the pH is decreased. Similarly a
histidine with an elevated pKₐ stabilizes the folded
state. All of these shifts can be explained quantitatively
by considering the effects of the respective values of pKₐ
for a particular side chain on the magnitudes of $\bar{Z}_{\mathrm{H,F}}$ and
$\bar{Z}_{\mathrm{H,U}}$ in the integrated form of Equation 13-3.
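
For a single side chain, the integrated form of Equation 13-3 can be sketched explicitly. The sketch below assumes one acid-base with a value of pKₐ in the folded state and another in the unfolded state, with all other groups titrating identically in the two states, so that the observed equilibrium constant for folding is the constant for the fully deprotonated forms multiplied by (1 + 10^(pKₐ,F − pH))/(1 + 10^(pKₐ,U − pH)). The function names are hypothetical. The pKₐ of 0.5 is the value quoted above for Aspartate 76; the pKₐ of 3.1 for the same side chain in the unfolded state is an assumption chosen here for illustration.

```python
def fraction_protonated(ph, pka):
    """Fraction of a single acid-base that carries its proton at this pH."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def folding_constant(ph, k_deprotonated, pka_folded, pka_unfolded):
    """Observed [F]/[U] at this pH for a protein with one side chain whose
    pKa differs between the folded and unfolded states. k_deprotonated is
    the equilibrium constant between the fully deprotonated forms."""
    folded_term = 1.0 + 10.0 ** (pka_folded - ph)
    unfolded_term = 1.0 + 10.0 ** (pka_unfolded - ph)
    return k_deprotonated * folded_term / unfolded_term


PKA_FOLDED = 0.5     # from the text (Aspartate 76 in the native state)
PKA_UNFOLDED = 3.1   # assumed value for the unfolded state

# Ratio of the folding constant with the side chain deprotonated (pH 7)
# to that with it protonated in both states (very low pH).
ratio = (folding_constant(7.0, 1.0, PKA_FOLDED, PKA_UNFOLDED)
         / folding_constant(-3.0, 1.0, PKA_FOLDED, PKA_UNFOLDED))
print(f"stabilization factor = {ratio:.0f}")
```

Under these assumptions the factor approaches 10 raised to the difference between the two values of pKₐ, or about 400 for a difference of 2.6, and the slope of ln K against ln a(H⁺) at any pH equals the difference between the fractions protonated in the folded and unfolded states, as Equation 13-3 requires.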

A solution of guanidinium chloride at 6 M seems to
produce the most complete unfolding to the random
coil. The values of the various physical parameters for
the same protein dissolved in 8 M urea rather than 6 M
guanidinium chloride are slightly but significantly dis-
placed in the direction of a folded state. Proteins dis-
solved in solutions of low pH, the temperatures of which
have been raised until no further change in optical rota-
tion occurs, will still display further changes when guani-
dinium chloride is added, even when no intra-
molecular hydrogen bonds seem to remain in the
denatured protein, and this observation suggests that
reversible thermal denaturation does not produce a
random coil. Lowering the pH of a solution of a protein
without applying heat often leads to its denaturation, but
the denatured state produced by acid alone usually also
retains residual structure. When either the tempera-
ture is increased or the pH is lowered, hydrophobic clus-
ters (Figure 6-21) in otherwise unfolded polypeptides
should remain associated. This would account for the
incomplete unfolding observed in these situations.

In any meaningful measurement of the properties 
of folding, the conditions must be such that the reaction 
remains reversible. When a concentrated solution of 
ovalbumin and lysozyme, otherwise known as the white 
of an egg, is heated, the polypeptides unfold but then 
rapidly coagulate among themselves to form a white, 
intractable, gelatinous precipitate. In the unfolded state 
produced initially by raising the temperature, otherwise 
buried hydrophobic amino acids on these polypeptides 
all become simultaneously exposed to the solution and 
noncovalent intermolecular polymerization takes place. 
There is little doubt that, in this example, a significant 
portion if not the majority of the changes in standard 
enthalpy and standard entropy proceeding during this 
process are those of the coagulation, which is of only 
marginal interest. 

In all studies of protein folding, the first result pre- 
sented should demonstrate the complete reversibility of 
the reaction. Foldings perturbed by the addition of urea 
or guanidinium chloride usually are reversible. With 
larger proteins, the rates of renaturation from a concen- 
trated solution of urea or guanidinium chloride, how- 
ever, can be slow; and if the concentration of denaturant 
is abruptly decreased by dilution, an otherwise reversible 
folding can become irreversible. Denaturation produced
by acid is usually reversible because the denatured
polypeptides are so positively charged that they will not
coagulate.

Although a few foldings perturbed solely by
increases in temperature are reversible at neutral pH,
most proceed irreversibly, usually with coagulation.
When thermal unfolding is performed in a scanning 
calorimeter, however, the solution is heated continu- 
ously while the absorption of excess heat is monitored. It 
is possible that, under these conditions, the transition 
from folded protein to unfolded protein takes place in a 
short enough period that little coagulated protein accu- 
mulates, and the reaction remains reversible during that 
interval and only becomes irreversible upon coagulation 
at the higher temperatures experienced beyond the range 
of temperatures encompassing the transition to the denatured
state. It has been argued that the behavior of many
foldings in a scanning calorimeter (namely, the shapes
of the curves, the effects of ligands, and the molecularities
of the apparent reactions) is that expected of a
simple reversible isomerization. At low pH (pH 2-3),
however, most thermally perturbed foldings, even though 
they would proceed with coagulation at higher pH, often 
become reversible. Presumably, this is due to the fact
that coagulation is prevented by charge repulsion among 
the denatured polypeptides. It is usually observed that a 
polypeptide denatured thermally and reversibly at low 
pH will coagulate visibly and irreversibly as the pH is 
increased, and often the onset of this coagulation is found 
to occur abruptly within a very narrow range of pH.

When a physical property of a protein such as its
intrinsic viscosity, its sedimentation velocity, its optical
rotation, its molar ellipticity, its intrinsic fluorescence, its
absorption of ultraviolet light, its capacity to take up protons
from the solution at constant pH, its absorption of
heat at constant temperature, its elution volume on
chromatography by molecular exclusion, its electrophoretic
mobility, or its nuclear magnetic resonance
absorptions is measured as a function of temperature,
pH, or the concentration of urea or guanidinium
chloride, changes indicative of a shift in the value of the
equilibrium constant $K_F$ for folding (Equation 13-1) are
observed (Figure 13-1A).

Each pair of experimental points (a square and a 
circle) in Figure 13-1A represents a solution of cold 
shock-like protein from Thermotoga maritima at a par- 
ticular temperature, pH, and concentration of guani- 
dinium chloride. Two different initial solutions of protein 
were used. Either a solution of the native protein in the 
absence of denaturant was diluted by mixing into a solution
of guanidinium chloride (open symbols) or a solution
of the unfolded protein in 5.5 M guanidinium
chloride was diluted by mixing into a solution of guani- 
dinium chloride (solid symbols). In each case, the mix- 
ture was formulated to produce the noted final 
concentration of denaturant. One member of each pair 
of points is the initial fluorescence of the solution imme- 
diately after mixing (squares). For each final concentra- 
tion of guanidinium chloride, the solution was allowed to 
reach equilibrium (circles), which was assumed to be the 
state after all changes in fluorescence had ceased. 


Figure 13-1: Shift of the equilibrium constant for the folding of the cold
shock-like protein from Thermotoga maritima ($n_{aa}$ = 66) in solutions of
guanidinium chloride. Solutions of cold shock-like protein and solutions
of guanidinium chloride, both at 25 °C, were mixed in a rapid mixing
chamber that then introduced the mixture immediately into the cuvette
of a fluorometer so that the fluorescence of the solution (expressed as the
voltage from the photomultiplier) could be monitored continuously. The
emission at wavelengths greater than 300 nm upon excitation at 280 nm
(intrinsic fluorescence of tryptophan) was monitored as a function of
time. (A) Initial and equilibrium levels of fluorescence. A solution of protein
(15 μM) was diluted 11-fold by mixing into solutions of guanidinium
chloride prepared so that the final concentration (molar) of guanidinium
chloride (GdmCl) after mixing would be that noted on the horizontal axis.
The initial solution of protein was prepared either in an aqueous buffer at
pH 7.0 (open symbols) or in the same aqueous buffer 5.5 M in guanidinium
chloride (solid symbols). The latter solution unfolded the protein
completely. The initial fluorescence of each sample immediately after
mixing was estimated by extrapolating the traces back to zero time
(squares) to correct for the short interval between mixing and the start of
the monitoring. Each isomerization was allowed to progress until the lack
of further changes in fluorescence indicated that equilibrium had been
reached. The fluorescence of the solution at equilibrium (circles) is plotted
as a function of the final concentration of guanidinium chloride for
protein that was refolded from 5.5 M guanidinium chloride (solid circles)
or that was unfolded (open circles). (B) Rate constants for folding. For each
sample, the fluorescence was monitored continuously as a function of time
at 25 °C after mixing. The fluorescence in each sample either decreased or
increased, respectively, as unfolding or folding progressed with first-order
kinetics. Plots of these changes in fluorescence were fit by nonlinear
least-squares to single-exponential functions to obtain first-order rate
constants, $k_U + k_F$ (second⁻¹), which are plotted as a function of the
final concentration (molar) of guanidinium chloride (GdmCl). Reprinted with
permission from ref 50. Copyright 1998 Nature Publishing Group.
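The 11-fold dilution described in the caption fixes how concentrated the diluting solution of guanidinium chloride must be for a given final concentration. The following is a minimal sketch of that arithmetic, assuming that volumes are additive; the function name and the 3.0 M target are hypothetical and are used only for illustration.

```python
def diluent_denaturant_molar(final_molar, protein_stock_molar, dilution_fold=11.0):
    """Denaturant concentration required in the diluting solution.

    One volume of protein solution (denaturant at protein_stock_molar) is
    mixed with (dilution_fold - 1) volumes of diluting solution, and the
    volumes are assumed to be additive.
    """
    diluent_volumes = dilution_fold - 1.0
    required = (dilution_fold * final_molar - protein_stock_molar) / diluent_volumes
    if required < 0.0:
        raise ValueError("final concentration cannot be reached by dilution alone")
    return required


# Hypothetical target of 3.0 M guanidinium chloride after mixing.
from_native = diluent_denaturant_molar(3.0, protein_stock_molar=0.0)
from_unfolded = diluent_denaturant_molar(3.0, protein_stock_molar=5.5)

# Lowest final concentration reachable when refolding from 5.5 M,
# that is, when the diluting solution contains no denaturant.
lowest_refolding_molar = 5.5 / 11.0
```

Under these assumptions, protein unfolded in 5.5 M guanidinium chloride cannot be brought below 0.5 M by an 11-fold dilution.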


Below a certain concentration of guanidinium chlo- 
ride (about 2 M), there is, at equilibrium, a linear 
increase in the intrinsic fluorescence with decreasing 
concentration of guanidinium chloride. Even in samples 
of native protein that eventually will unfold (open squares), the
immediate magnitude of the fluorescence upon addition 
of guanidinium chloride, before unfolding commences, 
falls upon this baseline. This baseline traces the pertur- 
bation in the intrinsic fluorescence of the fully folded 
state due only to addition of the denaturant in the 
absence of any unfolding or after complete folding has 
been achieved. 

Above a certain concentration of guanidinium 
chloride (about 4 M) there is, at equilibrium, a linear 
increase in intrinsic fluorescence with increasing con- 
centration of guanidinium chloride, presumed to reflect 
the effect of increasing the concentration of denaturant 
on the intrinsic fluorescence of the unfolded polypep- 
tide. When fully unfolded protein is diluted into the 
range of concentrations of guanidinium chloride where 
it will fold (solid squares), the immediate fluorescence of the solution
before folding commences also falls upon this other 
baseline. 

At intermediate concentrations of guanidinium 
chloride, in the region of transition, the observed magni- 
tudes of the intrinsic fluorescence at equilibrium fall 
between the extremes of fluorescence of the fully folded 
protein and the fluorescence of the fully unfolded pro- 
tein. The region of transition is that range of denaturant 
concentration, pH, temperature, or pressure in which 
measurable concentrations of both denatured and native 
states are present in the solution in equilibrium with 
each other. It is flanked on its two sides by the ranges in 
which the polypeptide is almost completely folded and 
almost completely unfolded, respectively. It is in the 
region of transition that the equilibrium constant 
between the native state and the denatured state can be 
measured because sufficient concentrations of both 
states are present in the solution so that both are regis- 
tered by the physical property being monitored. 

Within the region of transition, the same equilibrium
value is reached whether the folded protein (open symbols) or
the unfolded protein (solid symbols) is the initial state, and this
demonstrates that a reversible process is being moni- 
tored. If an equilibrium between folded state and 
unfolded state has not been established over the interval 
of observation, such a coincidence between the curve for 
folding and the curve for unfolding is not observed.
That the changes in the measured parameter within the 
region of transition cease at intermediate values 
between the extremes is also consistent with the conclu- 
sion that equilibrium has been achieved. 

Assume for the moment that at all concentrations 
of guanidinium chloride the solution contains an equi- 
librium mixture of only the fully native protein and its 
random coil. This is the two-state assumption. At high
enough concentrations of guanidinium chloride, the
concentration of random coil becomes sufficiently large
that it contributes significantly to the intrinsic fluores- 
cence. At this point, the isomerization of the folding 
(Equation 13-1) continuously interconverts measurable 
quantities of native state and measurable quantities of 
unfolded state in equilibrium with each other. As the 
concentration of guanidinium chloride is increased fur- 
ther, a greater fraction of the protein is in the unfolded 
state until, finally, immeasurably small amounts of the 
native state are present at equilibrium. 

A similar monotonic transition between the native 
state and a denatured state, the two in equilibrium with 
each other, is observed when a series of solutions of a 
protein are each brought to a different temperature, as 
long as the thermal denaturation is reversible (Figure 
13-2). In the example presented in the figure, the 
increase in the stability of the denatured state relative to 
that of the native state with decreasing pH, as defined by 
Equation 13-3, is apparent in the shifts of the regions of 
transition to lower temperatures as the pH of the solution
is lowered. Similar shifts of the region of transition
caused by either pH or temperature are observed
when the equilibrium between folded and unfolded
states is being shifted with guanidinium chloride or urea.
A monotonic transition reflecting the shift in equilibrium
between the native state and a denatured state can also
be observed by differential scanning calorimetry, provided
the rate of temperature increase is slow enough
that equilibrium is reached at each temperature and the
process remains reversible over the interval in which the
region of transition is traversed.

Figure 13-2: Shift of the equilibrium constant for the folding of
bovine ribonuclease A produced by increases in temperature at different
values of pH: 1.13 (●), 2.10 (0), 2.50 (O), 2.77 (+), and
3.15 (V). The difference in extinction coefficient (Δε; liters gram⁻¹
centimeter⁻¹) is the difference between the extinction coefficient at
287 nm for ribonuclease at pH 7, 25 °C, and the samples at the
noted pH and temperature. The changes in extinction coefficient
over the ranges monitored were fully reversible. The samples were
buffered with 40 mM glycine (pH 2.77 and 3.15) or by the protein
itself (pH < 2.7). For each point on the curves, the solution was
brought to the noted temperature (degrees Celsius), and the
absorbance was tabulated after it no longer changed. The change
in extinction coefficient is plotted as a function of the final temperature.
Reprinted with permission from ref 53. Copyright 1967
American Chemical Society.

If the two-state assumption is made, the fluorescence
of the solution ($F_{\mathrm{obs}}$) observed at equilibrium at
each concentration of guanidinium chloride in Figure
13-1A is

$$F_{\mathrm{obs}} = f_F F_{0,F} + f_U F_{0,U} \tag{13-4}$$

where $F_{0,F}$ is the intrinsic fluorescence that would be
observed if all of the protein were fully folded, $F_{0,U}$ is the
intrinsic fluorescence that would be observed if all of the
protein were fully unfolded, $f_F$ is the fraction of the protein
in the folded state, and $f_U$ is the fraction of the protein
in the unfolded state, all at the particular
concentration of guanidinium chloride. Because $F_{0,U}$ and
$F_{0,F}$ at each concentration of guanidinium chloride are
known from the respective baselines and because $f_F + f_U$
= 1.0 as a result of the two-state assumption, $f_F$ and $f_U$ at
each concentration of guanidinium chloride can be calculated.
From $f_F$ and $f_U$, the equilibrium constant for folding
($K_F = f_F / f_U$) can be determined for that concentration
of guanidinium chloride, temperature, and pH. The
same analysis can be applied to the behavior of any other
physical property that is directly proportional to the
concentration of native protein and denatured protein,
respectively, such as absorbance, optical rotation, circular
dichroism, or specific viscosity.
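The calculation just described can be sketched in a few lines. This is only an illustration of Equation 13-4 under the two-state assumption: the function names, the linear form assumed for the baselines, and all of the numerical values are hypothetical and are not taken from Figure 13-1A.

```python
def linear_baseline(intercept, slope, denaturant_molar):
    """Value of a linearly extrapolated baseline at a denaturant concentration."""
    return intercept + slope * denaturant_molar


def two_state_fractions(f_obs, baseline_folded, baseline_unfolded):
    """Return (f_F, f_U, K_F) from Equation 13-4 with f_F + f_U = 1."""
    span = baseline_folded - baseline_unfolded
    if span == 0.0:
        raise ValueError("baselines coincide; the property cannot distinguish the states")
    f_folded = (f_obs - baseline_unfolded) / span
    f_unfolded = 1.0 - f_folded
    if not 0.0 < f_folded < 1.0:
        raise ValueError("observation lies outside the region of transition")
    return f_folded, f_unfolded, f_folded / f_unfolded


# Hypothetical baselines (arbitrary units of fluorescence) evaluated at 3.0 M.
folded_at_3m = linear_baseline(8.0, -0.5, 3.0)    # fully folded protein
unfolded_at_3m = linear_baseline(2.0, 0.2, 3.0)   # fully unfolded protein

f_folded, f_unfolded, k_folding = two_state_fractions(4.55, folded_at_3m, unfolded_at_3m)
```

With these invented numbers the observation lies halfway between the two baselines, so both fractions are 0.5 and the equilibrium constant for folding is 1. The function refuses observations on or beyond a baseline because there the ratio of fractions is undefined or meaningless.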

If the two-state assumption is correct and signifi- 
cant concentrations of only the native state and the 
random coil are present at each concentration of guani- 
dinium chloride, then the situation is dramatically sim- 
plified. It is, however, reasonable that this should be the 
case. When a polypeptide folds from a random coil to 
form the native state under physiological conditions, it 
must pass through intermediate states between the 
random coil and the native state. If, however, these intermediate
states were as stable as or more stable than the
folded state, there would be significant, measurable con- 
centrations of them at equilibrium, a possibility that has 
rarely been observed and that would be unfortunate for 
the protein in terms of both its function and its ability to 
avoid endopeptidolytic digestion. That these intermedi- 
ate states remain less stable than the native state as 
guanidinium chloride is added to the solution is not sur- 
prising, so long as they are about as compact as the native 
state. There are several observations which suggest that, 
in many cases, the two-state assumption is valid. 

In the region of transition, a point on the curve in 
Figure 13-1A should represent, if the two-state assump- 
tion is valid, an equilibrium mixture of only fully native 
protein and its random coil. The same equilibrium mixture
forms when either the native state in the absence of
guanidinium chloride (open symbols) or the random coil in a
concentrated solution of guanidinium chloride (solid symbols)
is transferred into a solution at that pH and final concentration
of guanidinium chloride. Either of these reactions is a 
special case of a general kinetic category referred to as an 
approach to equilibrium. 

The approach to equilibrium of either the unfolding
or the folding polypeptide, respectively, should be governed
only by the two composite first-order rate constants
$k_F$ and $k_U$ (Equation 13-1) if the two-state
assumption is valid. Either the unfolded state or the
folded state, respectively, should be the exclusive product
formed in the two reactions as the equilibrium is
established. The rate at which the concentration of either
species, U or F, in Equation 13-1 changes is

$$-\frac{d[\mathrm{U}]}{dt} = \frac{d[\mathrm{F}]}{dt} = k_F[\mathrm{U}] - k_U[\mathrm{F}] \tag{13-5}$$


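Because the concentration of total protein, $[\mathrm{protein}]_{\mathrm{TOT}}$,
remains constant, it follows that

$$[\mathrm{U}] + [\mathrm{F}] = [\mathrm{protein}]_{\mathrm{TOT}} = [\mathrm{U}]_{\mathrm{eq}} + [\mathrm{F}]_{\mathrm{eq}} \tag{13-6}$$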


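where $[\mathrm{U}]_{\mathrm{eq}}$ and $[\mathrm{F}]_{\mathrm{eq}}$ are the concentrations of random
coil and native state, respectively, at equilibrium. Combining
Equations 13-5 and 13-6 and focusing on the concentration
that is decreasing, arbitrarily chosen to be the
unfolded form for the following derivation, gives

$$-\frac{d[\mathrm{U}]}{dt} = (k_F + k_U)([\mathrm{U}] - [\mathrm{U}]_{\mathrm{eq}}) - k_U[\mathrm{F}]_{\mathrm{eq}} + k_F[\mathrm{U}]_{\mathrm{eq}} \tag{13-7}$$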


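At equilibrium no further changes occur in the concentrations
of either the native state or the random coil so

$$k_U[\mathrm{F}]_{\mathrm{eq}} = k_F[\mathrm{U}]_{\mathrm{eq}} \tag{13-8}$$

at all times, and, because $[\mathrm{U}]_{\mathrm{eq}}$ is a constant,

$$-\frac{d[\mathrm{U}]}{dt} = -\frac{d([\mathrm{U}] - [\mathrm{U}]_{\mathrm{eq}})}{dt} = (k_F + k_U)([\mathrm{U}] - [\mathrm{U}]_{\mathrm{eq}}) = k_{\mathrm{obs},F}([\mathrm{U}] - [\mathrm{U}]_{\mathrm{eq}}) \tag{13-9}$$

where $k_{\mathrm{obs},F}$ is the observed rate constant for the
approach to equilibrium during net folding.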

Equation 13-9 is a simple first-order differential
equation in the variable $([\mathrm{U}] - [\mathrm{U}]_{\mathrm{eq}})$ and describes a
first-order process in this variable. Upon integration

$$\frac{[\mathrm{U}] - [\mathrm{U}]_{\mathrm{eq}}}{[\mathrm{U}]_0 - [\mathrm{U}]_{\mathrm{eq}}} = \exp(-k_{\mathrm{obs},F}\,t) \tag{13-10}$$

where $[\mathrm{U}]_0$ is the concentration of unfolded form at the
beginning of the approach to equilibrium. Because the
process is symmetric (Equation 13-5), if unfolding were
being monitored rather than folding, the variable would
have been $([\mathrm{F}] - [\mathrm{F}]_{\mathrm{eq}})$ rather than $([\mathrm{U}] - [\mathrm{U}]_{\mathrm{eq}})$ but the
observed rate constant for the approach to equilibrium,
$k_{\mathrm{obs},U}$, would still be $(k_F + k_U)$.
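The symmetry of the approach to equilibrium can be checked numerically. The sketch below integrates Equation 13-5 by simple Euler steps and compares the result with the closed form of Equation 13-10, starting once from fully unfolded and once from fully folded protein. The rate constants, the unit total concentration, and the function names are hypothetical choices made only for this illustration.

```python
import math


def integrate_two_state(u0, f0, k_f, k_u, t_end, steps=20000):
    """Euler integration of Equation 13-5: -d[U]/dt = d[F]/dt = k_F[U] - k_U[F]."""
    dt = t_end / steps
    u, f = u0, f0
    for _ in range(steps):
        flux = (k_f * u - k_u * f) * dt
        u -= flux
        f += flux
    return u, f


def unfolded_closed_form(u0, total, k_f, k_u, t):
    """Equation 13-10 solved for [U] at time t, with [U]_eq from Equation 13-8."""
    u_eq = total * k_u / (k_f + k_u)
    return u_eq + (u0 - u_eq) * math.exp(-(k_f + k_u) * t)


K_F_RATE, K_U_RATE, TOTAL, T_END = 2.0, 0.5, 1.0, 1.0   # hypothetical values
u_eq = TOTAL * K_U_RATE / (K_F_RATE + K_U_RATE)
f_eq = TOTAL - u_eq

# Net folding: start from fully unfolded protein.
u_fold, f_fold = integrate_two_state(TOTAL, 0.0, K_F_RATE, K_U_RATE, T_END)
# Net unfolding: start from fully folded protein.
u_unf, f_unf = integrate_two_state(0.0, TOTAL, K_F_RATE, K_U_RATE, T_END)

# Fraction of the total change still remaining, followed through [U] or [F].
remaining_folding = (u_fold - u_eq) / (TOTAL - u_eq)
remaining_unfolding = (f_unf - f_eq) / (TOTAL - f_eq)
expected_remaining = math.exp(-(K_F_RATE + K_U_RATE) * T_END)
```

In both directions the remaining fraction decays with the same observed rate constant, the sum of the two rate constants, even though the starting states and the amplitudes differ.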

Suppose one were to monitor any physical property
Y, such as intrinsic fluorescence, absorbance, optical
rotation, circular dichroism, or specific viscosity, that is
directly proportional to the concentration of unfolded
state and directly proportional to the concentration of
folded state, respectively. The observed magnitude of
that physical property for any solution containing a mixture
of unfolded state and folded state at any time is

$$Y_{\mathrm{obs}} = c_U[\mathrm{U}] + c_F[\mathrm{F}] \tag{13-11}$$

where $c_U$ and $c_F$ are the constants of proportionality
between the concentration of each species and its
respective contribution to the overall magnitude of the
physical property for the mixture. When Equation 13-11
is combined with Equation 13-6,

$$Y_{\mathrm{obs}} - Y_{\mathrm{obs,eq}} = (c_U - c_F)[\mathrm{U}] - (c_U - c_F)[\mathrm{U}]_{\mathrm{eq}} \tag{13-12}$$

where $Y_{\mathrm{obs,eq}}$ is the magnitude of that physical property at
equilibrium. It follows from Equations 13-10 and 13-12
that

$$\frac{Y_{\mathrm{obs}} - Y_{\mathrm{obs,eq}}}{Y_{\mathrm{obs},0} - Y_{\mathrm{obs,eq}}} = \exp(-k_{\mathrm{obs},F}\,t) \tag{13-13}$$


Equation 13-13 states that, for a particular physical
property monitoring the isomerization between
unfolded state and folded state of a polypeptide, the
fraction of the total change in its magnitude that has yet
to occur should decrease exponentially as a function of
time if during that isomerization only two states are
significantly populated, namely, the unfolded state and
the folded state. Again, because the process is symmetric,
if the derivation just presented had been based on [F]
instead of [U] because unfolding was being monitored
rather than folding, the same outcome would have
occurred, and because $k_{\mathrm{obs},F} = k_{\mathrm{obs},U} = k_U + k_F$,
Equation 13-13 is the same whether net folding is
progressing from an initially unfolded state or net
unfolding is progressing from an initially folded state.

This derivation for the approach to equilibrium as
monitored by a physical property is completely general
and can be applied to any situation where the equilibrium
can be described by two first-order rate constants,
forward and reverse. In the case of cold shock-like protein
(Figure 13-1A), cleanly first-order, exponential
approaches to equilibrium were observed whether
unfolded protein (solid symbols) was folded or folded
protein (open symbols) was unfolded. An uncomplicated
first-order approach to equilibrium is generally accepted
as support for a two-state assumption.
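As an illustration of how an observed rate constant might be extracted from a trace obeying Equation 13-13, the sketch below fits the logarithm of the deviation from the equilibrium value against time by linear least squares. This log-linear fit is a simplification chosen to keep the example self-contained; the caption to Figure 13-1 states that the traces there were fit by nonlinear least squares. The traces are synthetic and noise-free, and all names and numbers are hypothetical.

```python
import math


def observed_rate_constant(times, signal, signal_eq):
    """Negative least-squares slope of ln|Y_obs - Y_obs,eq| against time."""
    points = [(t, math.log(abs(y - signal_eq)))
              for t, y in zip(times, signal) if y != signal_eq]
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_ln = sum(v for _, v in points) / n
    s_tt = sum((t - mean_t) ** 2 for t, _ in points)
    s_tv = sum((t - mean_t) * (v - mean_ln) for t, v in points)
    return -s_tv / s_tt


def synthetic_trace(times, y0, y_eq, k_obs):
    """Single-exponential approach to equilibrium (Equation 13-13)."""
    return [y_eq + (y0 - y_eq) * math.exp(-k_obs * t) for t in times]


times = [0.1 * i for i in range(21)]
# Fluorescence rises during net folding and falls during net unfolding.
folding_trace = synthetic_trace(times, y0=2.0, y_eq=5.0, k_obs=2.5)
unfolding_trace = synthetic_trace(times, y0=8.0, y_eq=5.0, k_obs=2.5)

k_from_folding = observed_rate_constant(times, folding_trace, 5.0)
k_from_unfolding = observed_rate_constant(times, unfolding_trace, 5.0)
```

With real data the equilibrium value is itself uncertain and the noise is amplified by the logarithm at late times, which is one reason a nonlinear fit to the untransformed trace is usually preferred.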

The observed rate constants for these approaches
to equilibrium for cold shock-like protein (Figure
13-1B) have identical values at the same concentrations
of guanidinium chloride whether the equilibrium
was approached in the direction of folding (solid symbols)
or in the direction of unfolding (open symbols), another
observation consistent with the two-state assumption.
Furthermore, the observed rate constants for the approach
to equilibrium decrease smoothly as the concentration of
guanidinium chloride is increased until the region of
transition is reached and increase smoothly with
guanidinium chloride beyond the region of transition, as
expected if only one rate constant, $k_F$, dominates in the
former portion of the plot and another, $k_U$, dominates in
the latter. Such uncomplicated behavior of these observed
rate constants is also presented as evidence for two-state
behavior. In the region of transition, both $k_F$ and $k_U$
contribute to the observed rate constant for the approach
to equilibrium, $k_F + k_U$.

It is the decrease in $k_F$ and the increase in $k_U$
(dashed lines in Figure 13-1B) with increasing concentrations
of guanidinium chloride that together shift the
equilibrium constant into the measurable range. Because
guanidinium chloride has its greatest effect on the stability
of the unfolded state, it is not surprising that $k_F$ is
affected more than $k_U$.
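The shape of a plot such as Figure 13-1B can be reproduced with a small model. The sketch below assumes that the logarithm of each rate constant varies linearly with the concentration of denaturant; this functional form and every numerical parameter are assumptions introduced for illustration only and are not values from the figure.

```python
import math

# Hypothetical parameters: rate constants in the absence of denaturant
# (second^-1) and the sensitivity of each to denaturant (molar^-1).
K_F0, K_U0 = 400.0, 0.05
M_F, M_U = 2.2, 0.8


def rate_constants(denaturant_molar):
    """Folding and unfolding rate constants under the assumed log-linear model."""
    k_f = K_F0 * math.exp(-M_F * denaturant_molar)
    k_u = K_U0 * math.exp(M_U * denaturant_molar)
    return k_f, k_u


def observed_rate(denaturant_molar):
    """Observed rate constant for the approach to equilibrium, k_F + k_U."""
    k_f, k_u = rate_constants(denaturant_molar)
    return k_f + k_u


def folding_equilibrium_constant(denaturant_molar):
    """K_F = k_F / k_U for the two-state isomerization."""
    k_f, k_u = rate_constants(denaturant_molar)
    return k_f / k_u


# Concentration at which k_F = k_U, so that K_F = 1.
midpoint_molar = math.log(K_F0 / K_U0) / (M_F + M_U)

low, middle, high = observed_rate(0.5), observed_rate(3.3), observed_rate(6.0)
```

In this model the observed rate constant is essentially the folding rate constant at low concentrations of denaturant and essentially the unfolding rate constant at high concentrations, and it passes through a minimum near the region of transition. Making the folding rate constant the more sensitive of the two mirrors the statement above that it is affected more than the unfolding rate constant.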

Often conditions are purposely chosen to ensure 
that the approaches to equilibrium of the folding reac- 
tion in the region of the transition between fully native 
and fully denatured protein are simple first-order 
processes. Under other conditions of temperature, pH, 
or concentration of denaturant, either folding or unfold- 
ing or both are not first-order processes and these condi- 
tions are avoided. For example, the equilibrium constant 
for folding of myoglobin at 25 °C is shifted into the meas- 
urable range at pH 4.2. At this pH both the folding and 
unfolding of the polypeptide are first-order processes.
They both proceed with clean isosbestic points in the 
Soret region of the visible spectrum, and this also indi- 
cates that both folding and unfolding at this pH are two- 
state processes. At higher or lower values of pH, however, 
the kinetics of both the folding and unfolding reactions 
become complex. 

If a point in the region of transition in Figure 13-1A 
represents a mixture containing only the native state and 
the random coil, then Equation 13-4, with appropriate
substitutions, should describe the behavior of every physical
parameter measured as long as it is directly proportional
to the concentrations of folded state and unfolded
state. In this equation, the values for the fraction of native
state, $f_F$, and the fraction of random coil, $f_U$, must be the
same at a particular pH and at a given concentration of
guanidinium chloride regardless of the physical property
used to follow folding. A convincing demonstration of
the coincidence of physical measurements was presented
for the thermal denaturation of bovine
ribonuclease A (Figure 13-2). When the fraction of denatured
ribonuclease A, $f_U$, as a function of temperature at
pH 2.1 was monitored by intrinsic viscosity, optical rotation,
and ultraviolet absorption, all of these properties
gave the same values for this quantity (Figure 13-3). If
even one intermediate were present, the values of all three
of these physical properties for this intermediate would 
have to assume the same fractional deviation relative to 
the two extremes of each respective property. It seems 
unlikely that an intermediate could exist, for example, 
that had an intrinsic viscosity a third of the way between 
the intrinsic viscosities of the native protein and the 
random coil and an optical rotation also a third of the way 
between those of the native protein and the random coil 
and an ultraviolet absorption also a third of the way 
between those of the native protein and the random coil. 
In effect, this test is formally equivalent to the observa- 
tion of an isosbestic point in a series of spectra, and this 
observation has always been accepted as evidence for a 
transition between only two states. 
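The argument about coincidence can be made concrete with a small calculation. In the sketch below a hypothetical intermediate is populated alongside the native state and the random coil, and the fraction of unfolded protein that each physical property would appear to report is computed as if only two states were present. The populations and the values assigned to each property are invented for illustration.

```python
def between(y_native, y_unfolded, z):
    """Value lying a fraction z of the way from the native to the unfolded value."""
    return y_native + z * (y_unfolded - y_native)


def apparent_fraction_unfolded(y_native, y_intermediate, y_unfolded, populations):
    """Fraction unfolded inferred by a two-state analysis of a three-state mixture."""
    f_native, f_intermediate, f_unfolded = populations
    y_obs = (f_native * y_native
             + f_intermediate * y_intermediate
             + f_unfolded * y_unfolded)
    return (y_obs - y_native) / (y_unfolded - y_native)


# Hypothetical populations of native state, intermediate, and random coil.
populations = (0.5, 0.3, 0.2)

# The intermediate lies a third of the way between the extremes for two
# properties but nine-tenths of the way for the third (arbitrary units).
by_viscosity = apparent_fraction_unfolded(
    3.0, between(3.0, 15.0, 1.0 / 3.0), 15.0, populations)
by_rotation = apparent_fraction_unfolded(
    -100.0, between(-100.0, -40.0, 1.0 / 3.0), -40.0, populations)
by_absorbance = apparent_fraction_unfolded(
    1.00, between(1.00, 0.80, 0.9), 0.80, populations)
```

The first two properties agree only because the intermediate was placed at the same fractional position for both; the third, for which it was not, reports a different apparent fraction. Coincidence of several unrelated properties therefore makes a populated intermediate improbable, although it cannot strictly exclude one.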

Intermediate states between the unfolded state and 
the native state of a polypeptide must be less stable than 
the native state at physiological temperature and pH and 
may remain so as the equilibrium is shifted towards the 
unfolded state. If, however, the perturbation does 
happen to destabilize these intermediate states less than 
it does the native state, they may become more stable 
than the native state before the denatured state domi- 
nates the equilibrium as the level of the perturbation is 
increased. If the criteria just discussed are routinely used 
to establish that such intermediate states are not present 
at significant concentrations and that the two-state 
assumption is valid, then these same criteria must also 
be able to detect the presence of such stable intermedi- 
ate states at equilibrium when they are revealed by the 
perturbation. It has already been noted that, in the fold- 
ing of myoglobin, the departure of the approach to 
equilibrium from simple first-order behavior indicated 
the presence of intermediate states. In the case of
phosphoribosylanthranilate isomerase from Escherichia coli,
both the kinetics of folding and the kinetics of unfolding
at all concentrations of guanidinium chloride proceeded
in two well-resolved first-order exponential phases
rather than a single phase, an observation indicating that
an intermediate must be present. Furthermore, plots of
the transition between the folded state and the unfolded
state of phosphoribosylanthranilate isomerase differed
significantly in their shape and their dependence on the
concentration of guanidinium chloride depending on
whether the transition was monitored by circular dichroism
at 278 nm, circular dichroism at 222 nm, or intrinsic
fluorescence. It was concluded that at least one intermediate
state of this protein was present in the region of
transition.

Figure 13-3: Change in the fraction of unfolded bovine ribonuclease A,
$f_U$, calculated by Equation 13-4, for a solution of the protein
at pH 2.10 (Figure 13-2) as a function of temperature. The variation
in three physical properties of a solution of ribonuclease A at
pH 2.10 and ionic strength of 20 mM was measured as a function
of temperature. These three properties were the absorbance at
287 nm (A), the intrinsic viscosity (4), and the optical rotation,
[α]₃₆₅, at 365 nm (O). In each case, the direct observations were first
plotted as a function of temperature as in Figure 13-2. The behavior
of the physical property for the native state and the fully denatured
state as a function of temperature was estimated by linear
extrapolation as in Figure 13-1. The fraction of the denatured state
$f_U$ was then calculated from each separate curve by Equation 13-4.
The values of the calculated fraction of unfolded protein determined
by each physical property are presented together as a function
of temperature (degrees Celsius). To demonstrate that the
process being followed was reversible, the absorbance at 287 nm
was followed a second time (A) after a sample was heated to 40.8 °C
for 16 h and then cooled. Reprinted with permission from ref 58.
Copyright 1965 American Chemical Society.

A clear example of the existence of a stable inter- 
mediate state during the transition between the folded 
state and the unfolded state has been observed for 
bovine y-crystallin B (Figure 13-4A).° When the transi- 
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tion was monitored by circular dichroism, sedimentation lack of coincidence of the curves for different physical 
velocity, fluorescence emission at 320 nm, and fluores- properties.’ For example, when the shift in the equi- 
cence emission at 360 nm, the changes observed differed librium between the native state and the unfolded state 
dramatically. Each curve, however, showed an obvious of bovine a-lactalbumin (Figure 13-4B)®° was followed 


plateau at intermediate concentrations of urea, consis- 
tent with an almost completely populated intermediate 
state in this range. The kinetics of both folding and 
unfolding over the entire range of concentrations of urea 
displayed two distinct exponential phases with different 
rate constants during the approach to equilibrium, and 
these two sets of observed rate constants when plotted 
against the concentration of urea gave two separate, 
overlapping curves, each resembling the one in Figure 
13-1B. The plot for the observed rate constants of the iso- 
merization between the unfolded state and the interme- 
diate was displaced to higher concentrations of urea than 
the one for the isomerization between intermediate and 
the native state. Consequently, it was concluded that this 
unfolding promoted by urea is a three-state process with 
a discrete intermediate state that is almost completely 
populated at concentrations of urea between 3 and 4 M. 

In situations such as the one described, in which a 
discrete intermediate is thought to exist, the curves plot- 
ting the transition from native state to fully unfolded 
state as a function of the concentration of denaturant 
often show an obvious plateau or inflection,” and 
these curves can be fit by equations similar to Equation 
13-4 based on a three-state assumption to obtain the 
fraction of native state, intermediate state, and unfolded 
state at each concentration of denaturant. From these 
fractions, equilibrium constants for the isomerizations 
between these three states as a function of the concen- 
tration of denaturant can be calculated. In some situa- 
tions, four states—the native state, the unfolded state, 
and two discrete intermediates—can be detected in the 
inflections within the plots of the magnitude of physical 
properties as a function of the concentration of a denat- 
tant,“ 

Often, however, the curve for a particular physical 
property is a smooth function with no inflections, and 
the only indication that an intermediate is present is the 
Figure 13-4: Evidence for the existence of stable intermediate states appearing during transitions between the native state and the random
coil produced by increasing concentrations of urea or guanidinium chloride. (A) Shift of the equilibrium constant for the folding of bovine
γ-crystallin B.⁶⁰ Solutions of the protein ($n_{\mathrm{aa}}$ = 174) in 0.1 M NaCl at pH 2.0 and 20 °C were diluted into solutions of urea at pH 2.0 and 20 °C
so that the final concentration of urea (molar) would be as noted. Either the intrinsic fluorescence emission of the tryptophans of the protein
at 360 nm (●; excitation at 280 nm, final concentration = 0.04 mg mL⁻¹), the intrinsic fluorescence of its tryptophans at 320 nm (0; excitation
at 280 nm, final concentration = 0.04 mg mL⁻¹), its sedimentation velocity (pw E; final concentration = 0.2 mg mL⁻¹), or its circular
dichroism at 222 nm (V; final concentration of protein = 0.1 mg mL⁻¹) was recorded as a function of the concentration of urea after each of
the solutions was brought to equilibrium (24 h). Reprinted with permission from ref 60. Copyright 1990 held by authors. (B) Shift of the
equilibrium constant for the folding of bovine α-lactalbumin.⁶⁶ Solutions of the protein ($n_{\mathrm{aa}}$ = 123) at pH 6.7 and 25 °C were diluted into solutions
of guanidinium chloride at pH 6.7 and 25 °C so that the final concentration of guanidinium chloride (molar) would be as noted. The molar
ellipticities of these solutions of protein were measured at 296 nm (A), 270 nm (O), and 222 nm (●) after each was brought to equilibrium.
Reprinted with permission from ref 66. Copyright 1976 Academic Press. For the measurements presented in both panels, the direct results
at equilibrium were plotted as in Figure 13-1 for each set of observations. Lines were drawn for the behavior of each physical property for
fully native and fully unfolded protein as a function of the concentration of urea or guanidinium chloride, respectively, and the apparent
fractional change of each measurement from the behavior of that physical property for the fully folded state was estimated from the position
of each data point relative to these lines. These estimated values of the fractional change are plotted in each panel as a function of the
concentration (molar) of urea or guanidinium chloride (GdmCl).
by circular dichroism at 222 nm (●), the transition
observed did not coincide with the one measured by
circular dichroism at 270 and 296 nm (O, A).

The shift in the equilibrium constant for the folding 
of bovine carbonate dehydratase has been followed by 
changes in circular dichroism at 269 nm, ultraviolet 
absorption at 290 nm, and optical rotation at 400 nm as 
a function of the concentration of guanidinium chloride 
at pH 7.0.” The circular dichroism smoothly traced one 
transition and the optical rotation smoothly traced 
another transition proceeding at a higher concentration 
of guanidinium chloride. The change in absorbance 
traced a curve between the other two that displayed an 
inflection, suggesting that it was able to monitor both 
transitions. Furthermore, the kinetics of the refolding of 
the random coil was not a homogeneous, first-order 
process. It was concluded that one or more stable con- 
formers of the polypeptide of carbonate dehydratase, 
other than the fully folded state and the random coil, are 
present in solutions of guanidinium chloride between 2 
and 3 M in concentration. The properties of these other 
states are distinct from those of either the native state or 
the random coil. From observations of different transi- 
tions with different physical properties, the fractions of 
native, intermediate, and unfolded states as a function of 
the concentration of denaturant can also be estimated,” 
even in situations where very little intermediate forms,® 
and equilibrium constants among the states can be cal- 
culated. 

If the protein being unfolded or refolded is an
oligomer of two or more subunits, the respective
dissociation or association of those subunits causes the
transition between folded state and unfolded state to depend
on the concentration of protein in the solution. Suppose
that the native protein is an $\alpha_2$ dimer. The equilibrium
between the unfolded state $\alpha_{\mathrm{U}}$ and the native state
$(\alpha_{\mathrm{F}})_2$ is

$$2\,\alpha_{\mathrm{U}} \rightleftharpoons (\alpha_{\mathrm{F}})_2 \qquad (13\text{-}14)$$

and because

$$[\text{polypeptide}]_{\mathrm{TOT}} = 2[(\alpha_{\mathrm{F}})_2] + [\alpha_{\mathrm{U}}] \qquad (13\text{-}15)$$

if there are only two states present, native dimer and
unfolded monomer, at equilibrium

$$K_{\mathrm{fd}} = \frac{1 - f_{\mathrm{U}}}{2 f_{\mathrm{U}}^{\,2}\,[\text{polypeptide}]_{\mathrm{TOT}}} \qquad (13\text{-}16)$$
As the magnitude of the perturbation is increased,
the equilibrium constant between folded and unfolded
state is shifted in the direction of the unfolded state. At
the midpoint of the transition between fully folded and
fully unfolded states that is being monitored by a particular
physical property, $f_{\mathrm{F}} = f_{\mathrm{U}}$ (Equation 13-4); $f_{\mathrm{U}} = 1/2$; and,
instead of $K_{\mathrm{fd}}$ being equal to 1 at this point, as with a
monomer, $K_{\mathrm{fd}}$ for a dimer is equal to $[\text{polypeptide}]_{\mathrm{TOT}}^{-1}$.
Consequently, as the total concentration of protein is 
increased, the equilibrium constant between unfolded 
and folded states must be shifted more and more before 
the midpoint is reached, which requires a greater and 
greater perturbation. If there are only the two states, as 
the total concentration of protein is increased, the mid- 
points of the curves describing the transition between 
folded dimer and unfolded monomer move systemati- 
cally to higher and higher concentrations of guanidinium 
chloride®'”? or urea” or to higher temperatures.” 
Because it is only the molecularity of the reaction that 
causes these shifts in the curves with the concentration 
of protein, they are no longer observed when the dimer is 
artificially converted to a monomer by joining the car- 
boxy terminus of one of its subunits with the amino ter- 
minus of the other. "777 Even greater shifts with the 
concentration of protein are observed in the curves fol- 
lowing folding of higher oligomers” as a function of the 
perturbation. 
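
The dependence of the midpoint on the concentration of protein can be checked numerically. The following is a minimal sketch, assuming only the two states of Equation 13-16 (native dimer and unfolded monomer); the function names and the example concentrations are hypothetical and are not taken from the text.

```python
import math


def fraction_unfolded_dimer(k_fd: float, p_total: float) -> float:
    """Solve Equation 13-16 for f_U, the fraction of polypeptide that is unfolded.

    k_fd    : equilibrium constant for 2 alpha_U <=> (alpha_F)2, in M^-1
    p_total : total molar concentration of polypeptide, in M
    """
    a = 2.0 * k_fd * p_total
    # Equation 13-16 rearranges to a*f_U**2 + f_U - 1 = 0; keep the positive root.
    return (-1.0 + math.sqrt(1.0 + 4.0 * a)) / (2.0 * a)


def k_fd_at_midpoint(p_total: float) -> float:
    """Value of K_fd at which f_F = f_U = 1/2 for a dimer (1 / [polypeptide]_TOT)."""
    return 1.0 / p_total


if __name__ == "__main__":
    for p_total in (1e-6, 1e-5, 1e-4):  # hypothetical concentrations, M
        k_mid = k_fd_at_midpoint(p_total)
        f_u = fraction_unfolded_dimer(k_mid, p_total)
        print(f"[polypeptide] = {p_total:.0e} M: "
              f"K_fd at midpoint = {k_mid:.0e} M^-1, f_U = {f_u:.2f}")
```

Because the value of the equilibrium constant at the midpoint falls as the total concentration rises, a larger perturbation is needed to reach the midpoint, which is the shift described above.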

Often when the equilibrium between folded 
oligomer and unfolded monomer is shifted by a pertur- 
bation, stable intermediate states are formed. For exam- 
ple, during the increase in the perturbation, a dimer may 
dissociate to monomers before the polypeptides unfold, 
and if the physical property detects only unfolding, the 
curves following the transition between the folded state 
and the unfolded state will not shift as the concentration 
of protein is increased” because the formation of the 
monomeric intermediate goes undetected. Similarly, a 
tetramer can dissociate into dimers before the dimers 
unfold to monomers.” In some cases, the intermediate 
state is detected. In one such instance, the curves 
showed a plateau as in Figure 13-4A, but because both 
the native protein and the intermediate were dimers, it 
was only the portion of the curve monitoring the transi- 
tion from the dimeric intermediate to the unfolded 
monomer that shifted with concentration of protein.” 

When the protein binds a ligand, the addition of the 
ligand also causes the curves following the transition 
between the native state and the unfolded state to shift to 
higher levels of perturbation. Because only the folded
protein can bind the ligand, L, if the ligand is present at
saturation so that only liganded native protein is present
at all concentrations of denaturant

$$\mathrm{U} + \mathrm{L} \;\overset{K_{\mathrm{fd}}^{\mathrm{F\cdot L}}}{\rightleftharpoons}\; \mathrm{F{\cdot}L} \qquad (13\text{-}17)$$

and

$$K_{\mathrm{fd}}^{\mathrm{F\cdot L}} = \frac{[\mathrm{F{\cdot}L}]}{[\mathrm{U}][\mathrm{L}]} = \frac{f_{\mathrm{F}}}{f_{\mathrm{U}}[\mathrm{L}]} \qquad (13\text{-}18)$$

Consequently, as the concentration of ligand is increased
beyond its level of saturation, a greater perturbation is
required to shift the equilibrium to a point at which $f_{\mathrm{F}} = f_{\mathrm{U}}$,
and the curves move toward greater perturbation, for
example to higher concentrations of guanidinium chlo- 
ride, as the concentration of ligand is increased. It is also 
the case that, at the same concentrations, a ligand with a 
smaller dissociation constant will shift the curve a 
greater distance than one with a larger dissociation con- 
stant.” 

The change in standard enthalpy of folding, $\Delta H^{\circ}_{\mathrm{fd}}$,
can be measured directly in a differential scanning
calorimeter**“’ or it can be calculated from the dependence
of the equilibrium constant of folding, $K_{\mathrm{fd}}$, on
temperature. From the van’t Hoff relationship

$$\frac{\mathrm{d}\ln K_{\mathrm{fd}}}{\mathrm{d}\,T^{-1}} = -\frac{\Delta H^{\circ}_{\mathrm{fd}}}{R} \qquad (13\text{-}19)$$

If a folding is followed as the temperature is varied and
the value of the logarithm of the equilibrium constant for
folding is plotted as a function of $T^{-1}$, the slope of the plot
will be directly proportional to the change in standard
enthalpy. When the folding of β-lactoglobulin was made
reversible and kinetically first-order in both directions by
adding appropriate concentrations of urea and adjusting
the pH to 3, the equilibrium constant for folding, $K_{\mathrm{fd}}$,
could be measured at each concentration of urea for
temperatures between 10 and 50 °C. The behavior of
log $K_{\mathrm{fd}}$ as a function of $T^{-1}$ (Figure 13-5A)⁷⁸ demonstrates
that the change in standard enthalpy for the reaction,
$\Delta H^{\circ}_{\mathrm{fd}}$, is not constant but varies considerably with
temperature.
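
As an illustration of how Equation 13-19 is applied to a curved plot, the sketch below estimates a local value of the change in standard enthalpy from each adjacent pair of points. It is only a sketch: the function names are hypothetical, and the three equilibrium constants in the example are invented to show a change of sign; they are not the data of Figure 13-5A.

```python
import math

R = 8.314462618  # gas constant, J K^-1 mol^-1


def vant_hoff_enthalpy(t1: float, k1: float, t2: float, k2: float) -> float:
    """Mean standard enthalpy change (J mol^-1) between temperatures t1 and t2 (kelvin).

    Implements Equation 13-19 as a finite difference:
    dH = -R * (ln K2 - ln K1) / (1/T2 - 1/T1).
    """
    slope = (math.log(k2) - math.log(k1)) / (1.0 / t2 - 1.0 / t1)
    return -R * slope


def local_enthalpies(temps, ks):
    """Return (midpoint temperature, dH) for each adjacent pair of observations."""
    pairs = zip(zip(temps, ks), zip(temps[1:], ks[1:]))
    return [((t1 + t2) / 2.0, vant_hoff_enthalpy(t1, k1, t2, k2))
            for (t1, k1), (t2, k2) in pairs]


if __name__ == "__main__":
    temps = [283.15, 303.15, 323.15]  # kelvin
    ks = [5.0, 10.0, 5.0]             # hypothetical values of K_fd
    for t_mid, dh in local_enthalpies(temps, ks):
        print(f"near {t_mid - 273.15:.0f} C: dH = {dh / 1000.0:+.1f} kJ mol^-1")
```

With these invented values the estimate is positive (endothermic) over the lower interval and negative (exothermic) over the upper one, the qualitative behavior described for β-lactoglobulin.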

In fact, the values of the change in standard
enthalpy for folding, $\Delta H^{\circ}_{\mathrm{fd}}$ (Figure 13-5B),⁷⁸ calculated
from the slopes of this first plot, vary from exothermic to
endothermic over the range of temperatures sampled, a
fact suggesting that the change in standard enthalpy for
folding is by itself uninformative. Furthermore, the
change in standard enthalpy for folding of a series of
mutants of the same protein is usually linearly related to
the change in standard entropy (Equation 5-63) with a
slope, $T_{\mathrm{c}}$, of about 350 K.’”® Although the slope is somewhat
greater than that for most other noncovalent processes, the
compensation observed suggests that, as with the
hydrophobic effect, both standard enthalpy and standard
entropy are registering mainly compensatory changes in
the water.

When the change in standard enthalpy, $\Delta H^{\circ}_{\mathrm{fd}}$, is
plotted against the temperature (Figure 13-5B), the slope
of the relationship observed at each point is the standard
heat capacity change of folding, $\Delta C^{\circ}_{p,\mathrm{fd}}$, for that
temperature

$$\left(\frac{\partial\,\Delta H^{\circ}_{\mathrm{fd}}}{\partial T}\right)_{P} = \Delta C^{\circ}_{p,\mathrm{fd}} \qquad (13\text{-}20)$$

The change in standard heat capacity can also be measured
directly in a calorimeter®’ or by combining measurements
of unfolding in solutions of urea and with
temperature in a different manner.”

From the observations presented in Figure 13-5B, it
could be calculated that the change in standard heat
capacity” for the folding of the polypeptide of
β-lactoglobulin ($n_{\mathrm{aa}}$ = 162) between 5.5 and 4.4 M urea,
pH 2.5 and 3.2, and 15 and 50 °C is −8700 ± 700 J K⁻¹ mol⁻¹,
or −54 J K⁻¹ (mol of amino acid)⁻¹. The measured values
for the changes in standard heat capacity for the folding
of proteins composed of a single polypeptide and lacking
cystines are −60 ± 10 J K⁻¹ (mol of amino acid)⁻¹, regardless
of the perturbation used to shift the
equilibrium, 8798287

Unlike the changes in standard entropy and stan- 
dard enthalpy that vary considerably from situation to 
situation, this uniform decrease in standard heat capac- 
ity seems to be a fundamental property of folding. It must 
arise from a combination of the decrease in heat capac- 
ity that occurs when hydrophobic amino acids are trans- 
ferred from the aqueous phase into the interior of the 
molecule of protein,® the increase in heat capacity that 
arises from the desolvation of polar amino acids when 
they are transferred into the interior,®’ and the decrease 
in conformational heat capacity that occurs when vibra- 
tions and rotations along the polypeptide become more 
hindered after it is folded. Of the three contributors, 
however, the difference in conformational heat capacity 
between the native state and the random coil may not be 
very significant because the observed heat capacity of an 
unfolded polypeptide is quite close to the heat capacity 
calculated only from the individual side chains and the 
individual peptide bonds composing that polypep- 
tide 59091 

The value for the change in standard heat capacity
of folding is consistent with the decrease in standard heat
capacity (−200 to −400 J K⁻¹ mol⁻¹) observed for the transfer
of alkanes and arenes from water into an organic
phase (Table 5-8) if it is recalled that hydrophobic amino
acids make up only a fraction (30%) of the amino acids in
a polypeptide and that many of them remain accessible
to water in the native state after the protein has folded.
These considerations suggest that the uniform decrease
in standard heat capacity [−60 J K⁻¹ (mol of amino acid)⁻¹]
associated with the folding of a polypeptide is one of the
few signatures of the hydrophobic effect arising from the
removal of hydrophobic amino acids from the solvent
during their burial in the interior of the native state upon
folding. The hydrophobic effect is the only noncovalent
force that can provide a significant favorable contribution
to the standard free energy of folding.

Although not always the case,” it has been pointed 
Figure 13-5: Determination of $\Delta H^{\circ}_{\mathrm{fd}}$ and $\Delta C^{\circ}_{p,\mathrm{fd}}$ for the folding of
β-lactoglobulin in solutions of urea between pH 2.5 and 3.5.⁷⁸ A series
of measurements of the optical rotation at 365 nm of solutions of
β-lactoglobulin as a function of the concentration of urea were made
at various temperatures. From the lines extrapolated from the
pretransition and posttransition regions of these smooth curves, the
optical rotation of the fully native state and the fully unfolded state,
respectively, of β-lactoglobulin at any particular temperature and at
any concentration of urea were estimated. Solutions were then
prepared at 4.48, 5.09, and 5.53 M urea, concentrations at which
equilibrium constants could be measured over the temperature range of
10-50 °C. For each of these solutions, from the optical rotation at a
given temperature and the estimated optical rotations of the native
state and the unfolded state at that temperature and concentration of
urea, the equilibrium constant for folding, $K_{\mathrm{fd}}$, could be calculated
(Equations 13-2 and 13-4). (A) Logarithms to the base 10 of $K_{\mathrm{fd}}$
(log $K_{\mathrm{fd}}$) plotted as a function of the inverse of the temperatures [1/T
(kelvin⁻¹ × 10³)]. The plots are curves, from the slopes of which the
standard enthalpies of folding, $\Delta H^{\circ}_{\mathrm{fd}}$, can be calculated (Equation
13-19) at any particular temperature. (B) Standard enthalpy of folding,
$\Delta H^{\circ}_{\mathrm{fd}}$ (kilojoules mole⁻¹), plotted as a function of the temperature
(degrees Celsius). The slope of the line is the change in standard heat
capacity, $\Delta C^{\circ}_{p,\mathrm{fd}}$, for folding. Reprinted with permission from ref 78.
Copyright 1968 American Chemical Society.


out that there is a tendency for the characteristic change 
in standard heat capacity of folding to decrease in mag- 
nitude as the frequency of disulfides in the polypeptide 
increases.°*” This effect is thought to arise from a reduc- 
tion in the otherwise complete exposure of hydrophobic 
side chains to the water upon unfolding because of the 
inability of the cross-linked unfolded state to expand 
fully,” but the presence of cystines in the unfolded state 
must also decrease its conformational heat capacity. This 
decrease in the magnitude of the change in standard heat 
capacity may also result from an evolutionary compen- 
sation because the unfolded state, losing standard con- 
figurational entropy by the introduction of the disulfides, 
requires less of a contribution from the hydrophobic 
effect to achieve the proper standard free energy of fold- 
ing. 

The fact that the changes in standard heat capacity 
of folding for all proteins are significant negative num- 
bers dictates that the change in standard enthalpy of 
folding, $\Delta H^{\circ}_{\mathrm{fd}}$, must decrease significantly as the
temperature is raised (Figure 13-5B) and must pass through
a value of zero at some temperature. This causes the
equilibrium constant for folding to pass through a
maximum at that same temperature (Figure 13-5A). From
these considerations it follows that if the negative change
in standard heat capacity is an intrinsic property of fold- 
ing, each protein must have a characteristic tempera- 
ture of maximum stability.” These temperatures of 
maximum stability vary from less than 0 °C to more than 
35 °C. In the presence of moderate concentrations of 
urea” or guanidinium chloride™ or at high pressure,” a 
protein that is fully folded at room temperature will often 
unfold as the temperature is decreased to 0 °C or below. 
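
The argument that a negative change in standard heat capacity forces a maximum in the equilibrium constant can be illustrated with a short calculation. The sketch below assumes that the change in standard heat capacity is independent of temperature. The value of −8700 J K⁻¹ mol⁻¹ is the one quoted above for β-lactoglobulin, but the temperature at which the change in standard enthalpy is zero (`t_h`) and the change in standard entropy at that temperature (`ds_at_t_h`) are hypothetical values chosen only for illustration.

```python
import math

R = 8.314462618  # gas constant, J K^-1 mol^-1


def ln_k_fd(temp: float, dcp: float, t_h: float, ds_at_t_h: float) -> float:
    """ln K_fd at temperature temp (kelvin) for a temperature-independent dCp.

    dcp       : change in standard heat capacity of folding, J K^-1 mol^-1
    t_h       : temperature at which the standard enthalpy change is zero (assumed)
    ds_at_t_h : standard entropy change of folding at t_h, J K^-1 mol^-1 (assumed)
    """
    dh = dcp * (temp - t_h)                       # dH(T) is linear in T
    ds = ds_at_t_h + dcp * math.log(temp / t_h)   # dS(T) from integrating dCp/T
    return -(dh - temp * ds) / (R * temp)         # ln K = -dG/(RT)


def temperature_of_maximum_k(dcp, t_h, ds_at_t_h, t_min=263.0, t_max=343.0, step=0.5):
    """Scan a grid of temperatures and return the one at which K_fd is largest."""
    n = int(round((t_max - t_min) / step))
    temps = [t_min + i * step for i in range(n + 1)]
    return max(temps, key=lambda t: ln_k_fd(t, dcp, t_h, ds_at_t_h))


if __name__ == "__main__":
    dcp, t_h, ds_h = -8700.0, 298.0, 50.0
    t_max_k = temperature_of_maximum_k(dcp, t_h, ds_h)
    print(f"K_fd is largest at {t_max_k:.1f} K")
    for t in (278.0, 298.0, 318.0):
        print(f"T = {t:.0f} K: ln K_fd = {ln_k_fd(t, dcp, t_h, ds_h):.2f}")
```

In this model the equilibrium constant falls on both sides of the temperature at which the change in standard enthalpy is zero, so that the protein is destabilized by cooling as well as by heating.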

It is also possible to shift the equilibrium constant 
for folding in the direction of denaturation by applying 
pressure to a solution of protein.”°” This observation
requires that the solvated denatured state have a smaller
volume than the solvated native state because

$$\left(\frac{\partial\,\Delta G^{\circ}_{\mathrm{fd}}}{\partial P}\right)_{T} = \Delta V^{\circ}_{\mathrm{fd}} = V_{\mathrm{F}} - V_{\mathrm{U}} \qquad (13\text{-}21)$$

where $\Delta G^{\circ}_{\mathrm{fd}}$ is the standard free energy of folding
($-RT \ln K_{\mathrm{fd}}$), $\Delta V^{\circ}_{\mathrm{fd}}$ is the standard volume change of
folding, $V_{\mathrm{F}}$ is the molar volume of the folded state, and $V_{\mathrm{U}}$
is the molar volume of the unfolded state. The volume
changes for folding calculated from these results are
positive as expected. At pH 2.0 and 0 °C the volume change
for folding of bovine ribonuclease A” is +48 cm³ mol⁻¹
and that for bovine chymotrypsinogen” is +14 cm³ mol⁻¹,
while that for myoglobin” from pH 4 to 6 at 20 °C is
+92 ± 5 cm³ mol⁻¹.
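
To give a sense of the magnitudes involved, the sketch below integrates Equation 13-21 under the assumption that the standard volume change of folding does not itself vary with pressure (that is, the difference in compressibility is ignored). The function name and the applied pressure of 100 MPa are hypothetical; the volume change and temperature are those quoted above for ribonuclease A.

```python
import math

R = 8.314462618  # gas constant, J K^-1 mol^-1


def k_ratio_under_pressure(dv_cm3_per_mol: float, dp_mpa: float, temp_k: float) -> float:
    """Ratio K_fd(P0 + dP) / K_fd(P0) for a pressure-independent volume change.

    From Equation 13-21, dG(P0 + dP) = dG(P0) + dV * dP, so that
    ln[K(P0 + dP) / K(P0)] = -dV * dP / (R * T).
    """
    dv_m3 = dv_cm3_per_mol * 1e-6   # cm^3 mol^-1 -> m^3 mol^-1
    dp_pa = dp_mpa * 1e6            # MPa -> Pa
    return math.exp(-dv_m3 * dp_pa / (R * temp_k))


if __name__ == "__main__":
    # +48 cm^3 mol^-1 at 0 C (ribonuclease A); 100 MPa is an illustrative pressure.
    ratio = k_ratio_under_pressure(48.0, 100.0, 273.15)
    print(f"K_fd decreases by a factor of {1.0 / ratio:.1f}")
```

A positive volume change of folding therefore lowers the equilibrium constant for folding as the pressure is raised, in the direction of denaturation.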

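The changes in isoentropic compressibility of
folding

$$\Delta\kappa_{S,\mathrm{fd}} = -\frac{1}{V_{\mathrm{F}}}\left(\frac{\partial V_{\mathrm{F}}}{\partial P}\right)_{S} + \frac{1}{V_{\mathrm{U}}}\left(\frac{\partial V_{\mathrm{U}}}{\partial P}\right)_{S} \qquad (13\text{-}22)$$

are more informative. The isoentropic changes in
compressibility of folding for ribonuclease A and
chymotrypsinogen”””® are both about −0.015 GPa⁻¹. The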
negative values for these isoentropic compressibilities 
indicate that the solvated denatured state is more com- 
pressible than the native state. This is not a surprising 
result because the isoentropic compressibilities of native 
proteins are very small, 10-fold smaller than those of 
organic liquids and 2-fold smaller than those of amor- 
phous organic solids.'” The greater compressibility of 
the denatured state is probably due in part to its more 
fluid structure, but it is also possible that the hydropho- 
bic functional groups revealed in the denatured state 
increase the structure of the water surrounding them and 
thereby increase its compressibility. This increase in the 
structure of water, if it is significant, would resemble the 
increase in its structure caused by decreasing the tem- 
perature, and decreasing the temperature of liquid water 
increases its compressibility (Figure 5-5). 

High pressures also are able to dissociate multi- 
meric proteins into monomers, reversibly, without caus- 
ing denaturation, even at neutral pH. The volume change 
is small; in the case of enolase at 10 °C and pH 7.4,
ΔV° = 0.025 cm³ (mol of amino acid)⁻¹. Presumably the
individual volume changes occur only at the faces of the 
subunits that are exposed during the dissociation.’ 

To be able to calculate the equilibrium constant $K_{\mathrm{fd}}$
for folding from the measured concentrations of the two
states of the protein, the equilibrium constant must be
decreased significantly by one or a combination of rather
unphysiological perturbations such as increasing the
temperature or pressure, lowering the pH, or adding
guanidinium chloride or urea to the solution. It would be
of interest to be able to estimate the value at pH 7 and
25 °C of this equilibrium constant in the absence of any
perturbation. This is generally
accomplished by extrapolation” from realms of pH,
temperature, pressure, and concentrations of urea and 
guanidinium chloride where measurements can be 
made. 

Various equations have been derived for extrapolating
values of the equilibrium constant $K_{\mathrm{fd}}$ to small or zero
concentrations of urea and guanidinium chloride”?
and from acidic to neutral pH. Most of these equations
plot the observed standard free energies of folding,
$\Delta G^{\circ}_{\mathrm{fd}}$, as functions of the magnitude of the perturbation
to perform the extrapolations. It is also possible to
perform nonlinear least-squares fits of empirical equations
to plots of the directly observed changes of a physical 
property as a function of denaturant.'”’ Unfortunately, 
each theoretical curve, although it is successful at repro- 
ducing the behavior in the measurable regions, deviates 
from the other theoretical curves beyond the measurable 
regions. The values for the standard free energy of fold- 
ing measured both at elevated temperatures and in the 
presence of urea can be extrapolated simultaneously to 
obtain an estimate of the value for standard free energy 
of folding at 25 °C in the absence of urea.” Extrapolation 
both from high concentrations of guanidinium chloride 
and from low pH can also be performed simultane- 
ously.’ It is also possible to measure the thermal 
unfolding of a protein in a differential scanning 
calorimeter at a series of concentrations of urea below 
the range of concentrations at which the transition is 
observed at 25 °C. 

The extrapolation that has become most widely
accepted is one for values of standard free energies of
folding, $\Delta G^{\circ}_{\mathrm{fd}}$, observed in solutions of guanidinium
chloride or urea, and the equation for performing this
extrapolation that has emerged as the most popular is!”

$$\Delta G^{\circ}_{\mathrm{fd,[D]}} = \Delta G^{\circ}_{\mathrm{fd,H_2O}} + m[\mathrm{D}] \qquad (13\text{-}23)$$

where $\Delta G^{\circ}_{\mathrm{fd,[D]}}$ is the standard free energy of folding
calculated (Equation 5-14) from the observed equilibrium
constant at a given concentration of the denaturant, [D];
$\Delta G^{\circ}_{\mathrm{fd,H_2O}}$ is the standard free energy of folding for the
protein in aqueous solution at the same pH, ionic
strength, and temperature; and m is the slope of a line
that is fit to the observations. This equation states that
the standard free energy of folding is a simple linear 
function of the concentration of denaturant, which 
seems to ignore the observation that the changes in sol- 
vation brought about by guanidinium chloride and urea 
(Table 13-1) are not directly proportional to their molar 
concentrations.*'?°>10%108 
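
The extrapolation defined by Equation 13-23 amounts to a linear least-squares fit followed by reading the intercept. The sketch below shows one way to carry it out; the function names are hypothetical, and the five observations in the example are invented values lying exactly on a line, not measurements from the text.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def extrapolate_to_water(denaturant_molar, dg_fd_kj):
    """Fit Equation 13-23 and return (dG_fd in water, m, midpoint concentration).

    denaturant_molar : concentrations of denaturant, M
    dg_fd_kj         : standard free energies of folding at those concentrations, kJ mol^-1
    The midpoint is the concentration at which the fitted dG_fd is zero (K_fd = 1).
    """
    dg_water, m = fit_line(denaturant_molar, dg_fd_kj)
    return dg_water, m, -dg_water / m


if __name__ == "__main__":
    conc = [3.0, 3.5, 4.0, 4.5, 5.0]   # hypothetical concentrations, M
    dg = [-6.0, -3.0, 0.0, 3.0, 6.0]   # hypothetical dG_fd, kJ mol^-1
    dg_water, m, midpoint = extrapolate_to_water(conc, dg)
    print(f"dG_fd in water = {dg_water:.1f} kJ mol^-1, "
          f"m = {m:.1f} kJ mol^-1 M^-1, midpoint = {midpoint:.1f} M")
```

Note how far the intercept lies from the measured range in this example: observations spanning only 2 M are extended over 3 M, which is why the validity of the linear form matters.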

Nevertheless, there are many observations support- 
ing the validity of this relationship. In situations where 
there are wide ranges of the concentration of denaturant
over which the equilibrium constant for folding can be
measured accurately (Figure 13-6),'"”!° the standard free
energies of folding do in fact vary linearly with the
concentration of denaturant over the entire range of
measurements.* Most of the time, however, the range of
concentrations of denaturant over which measurements
of the equilibrium constants for folding can be made is
much narrower (Figure 13-7). Extrapolations of standard
free energies of folding perturbed by different denaturants
give the same value for $\Delta G^{\circ}_{\mathrm{fd,H_2O}}$, in spite of the long
distances over which those extrapolations must be
made. Standard free energies of folding beyond the
range of denaturant concentrations in which they can be 
directly measured can be estimated from measurements 
of unfolding induced by raising the temperature in a dif- 
ferential scanning calorimeter, and these estimates usu- 
ally fall close to the line of extrapolation.''' Measurements 
of the first-order rate constants for the approach to
equilibrium can be made for the entire range of concentrations
of denaturant, and from a plot of these observed rate
constants, values for $k_{\mathrm{F}}$ and $k_{\mathrm{U}}$ in the absence of denaturant
can be estimated by extrapolation (dashed lines in Figure
13-1B). If the folding is a two-state process, the equilibrium
constant for folding calculated from the estimates of
these two rate constants (Equation 13-2) gives a value for
the standard free energy of folding that agrees”? with
that obtained by use of Equation 13-23.

Figure 13-6: Estimation of the standard free energy of folding in
the absence of denaturant by extrapolation. Cold shock protein
CspB ($n_{\mathrm{aa}}$ = 67) from Bacillus subtilis was dissolved in a series of
solutions of increasing molar concentrations of urea at pH 7 and
25 °C. Two final concentrations of protein, 1.35 µM and 13.5 µM,
were used. The emission of fluorescence of each solution, $F_{342}$, was
monitored at 342 nm upon excitation at 280 nm (inset). After
equilibrium was reached, the equilibrium constant for folding was
estimated for each concentration of urea, and from these equilibrium
constants, the respective standard free energies of folding,
$\Delta G^{\circ}_{\mathrm{fd,[urea]}}$, were calculated. These standard free energies of folding
(kilojoules mole⁻¹) are plotted as a function of the concentration of
urea (molar). A line was fit to the data by linear least-squares
analysis. The dashed line at zero (equilibrium constant of 1) emphasizes
that direct measurements of the equilibrium constant can usually
be made only over a limited range. Reprinted with permission from
ref 109. Copyright 1995 Nature Publishing Group.

* The ionic strength of the solution must be maintained as the
concentration of guanidinium chloride is decreased to retain linear
behavior of $\Delta G^{\circ}_{\mathrm{fd,[GdmCl]}}$.
Figure 13-7: Estimation of the standard free energy of folding by
extrapolating standard free energies of folding observed in
solutions of different denaturants.¹⁰⁵ (A) Shifts of the equilibrium
constants of folding. Bovine chymotrypsin ($n_{\mathrm{aa}}$ = 241), which had been
sulfonylated with phenylmethanesulfonyl fluoride to inactivate the
endopeptidase, was dissolved in solutions of different
concentrations of urea (O), 1,3-dimethylurea (A), and guanidinium
chloride (O) at pH 4.0 and 25 °C. The several folding isomerizations
were monitored by the change in extinction coefficient at 293 nm
(Δε; millimolar⁻¹ centimeter⁻¹), which is plotted as a function of
the concentration (molar) of urea. In the respective regions of
transition, the fraction of the protein in the folded state and the fraction
in the unfolded state were calculated from the distance of each data
point from the values of the change in absorbance for the fully
folded (upper dashed lines) and the fully unfolded (lower dashed
lines) states. Equilibrium constants for folding were calculated
from these fractions for each concentration of denaturant, and
from each of these equilibrium constants, standard free energies of
folding, $\Delta G^{\circ}_{\mathrm{fd,[D]}}$, were calculated at the respective concentrations of
denaturant. (B) Standard free energies of folding (kilojoules mole⁻¹)
plotted against the respective concentrations (molar) of each
denaturant. Each of the lines was fit to the respective set of data by
linear least-squares analysis. Reprinted with permission from
ref 105. Copyright 1988 American Chemical Society.


That the numerical value for $\Delta G^{\circ}_{\mathrm{fd,H_2O}}$ obtained by
use of Equation 13-23 is a reasonable estimate of its
actual value can also be demonstrated by evaluating the
effect of pH on its magnitude. Acid-base titration curves
for the folded state of the protein and its unfolded state
in 8 M urea or 6 M guanidinium chloride can be measured
directly” or calculated from its composition of
amino acids,” and an integrated form of Equation 13-3°°
can be used to calculate the variation expected in
$\Delta G^{\circ}_{\mathrm{fd,H_2O}}$ caused by changes in pH. These calculated
variations reproduced the observed variations with pH of
estimates of $\Delta G^{\circ}_{\mathrm{fd,H_2O}}$ obtained by the extrapolation
defined by Equation 13-23 for bovine ribonuclease A”
and bovine chymotrypsin” when they were unfolded in
solutions of urea and guanidinium chloride.

The observed changes in the dependence of
$\Delta G^{\circ}_{\mathrm{fd,H_2O}}$ on pH caused by site-directed mutation of a
particular amino acid in the protein also agree
quantitatively with those calculated with integrated forms of
Equation 13-3. The observed values of $\Delta G^{\circ}_{\mathrm{fd,H_2O}}$ between
pH 5 and 8 for the carboxy-terminal domain of protein L9
from the 50S subunit of the ribosome of E. coli fell on the
curve calculated with an integrated form of Equation
13-3 by use of the values of p$K_{\mathrm{a}}$ for its four histidines in
the native state as determined directly by nuclear
magnetic resonance.” When each of these histidines was
mutated in turn, the observed changes in the behavior of
$\Delta G^{\circ}_{\mathrm{fd,H_2O}}$ were again those predicted from their individual
values of p$K_{\mathrm{a}}$. In particular, the mutation of Histidine
134, which is buried in the interior and has the lowest p$K_{\mathrm{a}}$
of the histidines in the native protein, caused the greatest
change in the observed pH dependence of $\Delta G^{\circ}_{\mathrm{fd,H_2O}}$.
Aspartate 26 is buried in the native state of thioredoxin
from E. coli, which causes its p$K_{\mathrm{a}}$ to be 7.5, a fact that
destabilizes the folded state relative to the unfolded
(Equation 13-3). The destabilization calculated from the
difference in the values of p$K_{\mathrm{a}}$ for just Aspartate 26 is
equal to the difference in the values of $\Delta G^{\circ}_{\mathrm{fd,H_2O}}$
estimated by Equation 13-23 for the wild-type protein and a
mutant in which Aspartate 26 is replaced by alanine.”’
Differences in $\Delta G^{\circ}_{\mathrm{fd,H_2O}}$ calculated from observed shifts
in the values of p$K_{\mathrm{a}}$ for histidines in ribonuclease T₁ from
A. oryzae caused by mutation of Glutamate 58 to alanine
also agreed with differences in values of $\Delta G^{\circ}_{\mathrm{fd,H_2O}}$
estimated with the extrapolation of Equation 13-23.

The constant fragment $C_{\mathrm{L}}$ of the light chain of
immunoglobulin G (Figure 11-1) is a small protein, the 
folding of which as a function of the concentration of 
guanidinium chloride has been measured." The protein 
contains a deeply buried cystine that is readily reduced 
by dithiothreitol when it is unfolded in solutions of 
guanidinium chloride. The rate of its reduction in the 
random coil in the absence of guanidinium chloride can 
be estimated by extrapolation of its rate of reduction at 
higher concentrations of guanidinium chloride. The 
actual rate of its reduction in the absence of guanidinium
chloride when the protein is folded is much slower than
the extrapolated value. If it is assumed that the reduction
of this cystine in the absence of guanidinium chloride
occurs only when the native protein is briefly and
reversibly a random coil, the difference between the rate
of reduction for the native state and that estimated for
the random coil is consistent with a value for the standard
free energy of folding of −26 kJ mol⁻¹. This is close to
the value (−30 kJ mol⁻¹) obtained by extrapolation from
ranges of guanidinium chloride concentrations in which
the equilibrium constant can be measured.
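
The reasoning that converts the two rates of reduction into a standard free energy of folding can be made explicit. If reduction occurs only from the transiently unfolded state, the observed rate for the native protein is the rate for the random coil multiplied by the fraction unfolded, 1/(1 + $K_{\mathrm{fd}}$). The sketch below is a hedged illustration of that argument only: the function names are hypothetical, the temperature of 25 °C is an assumption (the temperature of the measurements is not stated here), and no rate constants from the original study are used.

```python
import math

R = 8.314462618  # gas constant, J K^-1 mol^-1


def dg_fd_from_reduction_rates(k_native: float, k_coil: float, temp_k: float = 298.15) -> float:
    """Standard free energy of folding (J mol^-1) from two rates of reduction.

    Assumes k_native = k_coil * f_U with f_U = 1 / (1 + K_fd),
    so that K_fd = k_coil / k_native - 1.
    """
    k_fd = k_coil / k_native - 1.0
    return -R * temp_k * math.log(k_fd)


def rate_ratio_for_dg(dg_fd: float, temp_k: float = 298.15) -> float:
    """Ratio k_coil / k_native implied by a given standard free energy of folding."""
    return 1.0 + math.exp(-dg_fd / (R * temp_k))


if __name__ == "__main__":
    ratio = rate_ratio_for_dg(-26000.0)  # -26 kJ mol^-1, the value quoted in the text
    print(f"random coil is reduced about {ratio:.0f} times faster than native protein")
```

Under these assumptions, a standard free energy of folding of −26 kJ mol⁻¹ corresponds to a difference in rate of somewhat more than four orders of magnitude.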

Measurements of proton exchange also support an
extrapolation of standard free energy of folding that is
linear in the concentration of denaturant. The amido
protons of peptide bonds buried deeply in the interior of
a protein, when they exchange at the EX2 limit (Equation
12-63), often register a conformational change that is the
global unfolding and folding of the protein.’
Consequently, in these situations, $K_{\mathrm{conf}}$ (Equation
12-63)* is actually $K_{\mathrm{fd}}^{-1}$. When the standard free energy
of this conformational change revealed by proton
exchange, $\Delta G^{\circ}_{\mathrm{HX}}$, is monitored, it is found to be a linear
function of the concentration of the denaturant (Figure
13-8).1617” In the case of cysteineless type I
ribonuclease H from E. coli, only Methionine 47 is
buried deeply enough to respond only to the global
unfolding and folding over the range of rates that could
be measured, but the change in standard free energy of
the conformational change that it monitors remains a
linear function of the concentration of guanidinium
chloride to concentrations well below those at which
global folding can be monitored directly (inset in Figure
13-8). The range of values for proton exchange that can
be measured for deeply buried amido hydrogens can be
extended by raising the temperature, and at higher
temperature, they remain linear functions of the concentration
of guanidinium chloride until none is left in the
solution.'’® Furthermore, the standard free energies of
folding in water, $\Delta G^{\circ}_{\mathrm{fd,H_2O}}$, estimated from these plots of
$\Delta G^{\circ}_{\mathrm{HX}}$ as a function of the concentration of denaturant,
agree satisfactorily with values of standard free energies
of folding in water estimated from linear extrapolations
of standard free energies of folding calculated from
direct measurements of the equilibrium constant for
folding at higher concentrations of denaturants (inset to
Figure 13-8).

There are several observations, however, suggest- 
ing that the standard free energy of folding of at least 
some proteins that fold in a two-state process may not be 
a linear function of the concentration of denaturant all 


* By convention, the equilibrium constant for the conformational 
change producing exchange is defined for the opening of the 
structure, while the equilibrium constant for folding is defined 
reciprocally, namely, for the closing of the structure. Consequently, 
$K_{\mathrm{fd}}^{-1} = K_{\mathrm{conf}}$ and $\Delta G^{\circ}_{\mathrm{fd}} = -\Delta G^{\circ}_{\mathrm{HX}}$ when the conformational change
being monitored by the exchange is the global unfolding and 
folding of the protein. 
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Figure 13-8: Standard free energies of unfolding estimated from 
the rates of exchange for particular amido protons in cysteineless 
type 1 ribonuclease H from Escherichia coli.'" All three of the cys- 
teines in ribonuclease H were mutated to alanines. The resulting 
cysteineless protein was dissolved at a concentration of 1 mM ina 
series of solutions of deuterated guanidinium chloride prepared in 
deuterium oxide at p*H 5.1 and 25 °C. The rates at which 53 of the 
amido protons in the polypeptide backbone of the protein 
exchanged with deuterons from the solvent were slow enough to be 
monitored by two-dimensional nuclear magnetic resonance spec- 
troscopy (Figure 12-33). Observed rate constants were calculated 
by fitting the amplitudes of the peaks as a function of time to single 
exponential decays by nonlinear least-squares analysis. Each 
proton exchanged in the EX2 limit, and from the observed rate constant
of its exchange, an equilibrium constant Kconf for the conformational
change that exposed it to solvent was calculated
(Equation 12-63). It was assumed that for the most deeply buried
positions showing the smallest values of Kconf, the value of Kconf
would actually be the equilibrium constant for global unfolding 
(the reciprocal of the equilibrium constant for folding). It was con- 
cluded that the exchange rates of Methionine 47 (m), Glutamine 
105 (A), and Alanine 110 (@) were monitoring this global unfolding 
equilibrium of the protein. From the values of Kconf for each of these
three exchanges, the respective free energies for the conformational
change exposing the protons for exchange, ΔG°HX, were calculated.
These standard free energies of exposure (kilojoules
mole⁻¹) are plotted as a function of the concentration (molar) of
guanidinium chloride (GdmCl). The inset presents the fraction 
of the protein that is folded (fr) as a function of the concentration 
of guanidinium chloride (molar), as monitored by circular dichro- 
ism at 220 nm. Reprinted with permission from ref 117. Copyright 
1996 Nature Publishing Group. 


the way to the point at which none remains. Values for 
the changes in standard heat capacity of folding
(ΔC°p,fd), the standard enthalpies of folding (ΔH°fd), and
the temperatures at which the concentrations of folded
and unfolded states are equal (the melting temperatures,
Tm) were measured for ribonuclease from Bacillus amyloliquefaciens?®
at concentrations of urea below those at
which the equilibrium constant for folding could be 
measured at 25 °C. From these thermodynamic parame- 
ters, the standard free energy of folding at 25 °C and at 


each of these concentrations of urea could be calculated 
with the relationship 


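\[
\Delta G^{\circ}_{\mathrm{fd}} = \Delta H^{\circ}_{\mathrm{fd,m}} + \Delta C^{\circ}_{p,\mathrm{fd}}\,(T - T_{\mathrm{m}}) - \Delta H^{\circ}_{\mathrm{fd,m}}\,\frac{T}{T_{\mathrm{m}}} - T\,\Delta C^{\circ}_{p,\mathrm{fd}}\,\ln\frac{T}{T_{\mathrm{m}}} \tag{13-24}
\]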


where T is the temperature (kelvins) for which these calculations
are to be made (in this case, 298 K) and ΔH°fd,m
is the standard enthalpy of folding at Tm, which can be
measured in the calorimeter. The change in standard
heat capacity of folding ΔC°p,fd was estimated from the
temperature dependence of ΔH°fd (Equation 13-20).
When these calculated values of ΔG°fd,[urea] for concentrations
of urea below those for the region of transition at
25 °C were plotted, they deviated from the line fit to the
values of ΔG°fd,[urea] measured directly within the region
of transition at 25 °C.
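
As a rough numerical illustration of Equation 13-24, the sketch below evaluates the standard free energy of folding at 298.15 K from a melting temperature, an enthalpy of folding at that temperature, and a change in heat capacity. The equation is read here as ΔG°fd = ΔH°fd,m + ΔC°p,fd(T − Tm) − TΔH°fd,m/Tm − TΔC°p,fd ln(T/Tm). The parameter values and the function name are hypothetical and are not the measurements for the ribonuclease discussed above.

```python
import math


def folding_free_energy(temp_k, t_m, dh_m, dcp):
    """Standard free energy of folding (kJ/mol) at temp_k, read from Equation 13-24.

    temp_k, t_m : temperature of interest and melting temperature (K)
    dh_m        : standard enthalpy of folding at t_m (kJ/mol)
    dcp         : change in standard heat capacity of folding (kJ/(K mol))
    """
    return (dh_m
            + dcp * (temp_k - t_m)
            - temp_k * dh_m / t_m
            - temp_k * dcp * math.log(temp_k / t_m))


# Hypothetical parameters, chosen only to give a plausible magnitude.
T_M = 330.0      # K
DH_M = -400.0    # kJ/mol, folding taken as exothermic at Tm
DCP = -6.0       # kJ/(K mol), assumed negative for folding

dg_298 = folding_free_energy(298.15, T_M, DH_M, DCP)
print(f"dG(fd) at 298.15 K = {dg_298:.1f} kJ/mol")
```

At T = Tm every term cancels and the function returns zero, as it must when folded and unfolded states are equally populated.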

The deviation was a gradual curvature sending the 
plot of the actual standard free energy of folding below 
the linear extrapolation. A similar downward curvature 
has been directly observed for the standard free energy of 
folding for helical peptides as a function of the molar 
concentration of urea in the range from 4 M down to
0 M. If these thermodynamic calculations and direct
observations are correct and applicable to folding in general,
then the linear extrapolations normally performed
consistently underestimate the magnitudes of the stan- 
dard free energy of folding at 25 °C in the absence of 
denaturant. In the case of ribonuclease from B. amyloliquefaciens,
the actual value for the standard free
energy of folding would be about 15% more negative
than that obtained by extrapolation,° but a somewhat
larger underestimate has been reported for the magnitude
of the standard free energy of folding for lysozyme
from G. gallus by a similar approach. A drawback of the
heavy reliance on measurements of thermal unfolding in 
these experiments, however, is that the thermally dena- 
tured state of a protein is not a random coil. 

Another approach, which is somewhat simpler to 
accomplish than extrapolating a set of measurements at 
different concentrations of denaturant, is widely used 
when the standard free energies of folding for mutants 
are compared to the standard free energy of folding of 
the unmutated protein.'”"!” The equilibrium constant 
for folding is measured over the region of transition at 
higher temperatures in which the concentration of dena- 
tured state becomes significant (Figure 13-2). With the 
van’t Hoff relationship (Equation 13-19), the standard 
enthalpies of folding are estimated for both wild type and 
the mutants within this range of temperatures. The melting
temperatures Tm for wild type and mutants are the
temperatures at the midpoint of the thermal transition;
the change in standard heat capacity of folding ΔC°p,fd is
assumed to be the same for wild type and mutants, a
value that has been either directly measured or estimated
from 60 J K⁻¹ (mol aa)⁻¹; and the differences in
standard free energy of folding, ΔΔG°fd, between mutants
and wild type at a particular temperature within the 
range of the measurements are calculated with Equation 
13-24. 
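
The comparison of a mutant with wild type can be sketched as follows, under stated assumptions: each protein is described by its own melting temperature and van't Hoff enthalpy of folding, the change in heat capacity is shared and estimated as 60 J K⁻¹ per mole of amino acid (taken here as negative for folding, which is an assumption about sign convention), and the difference is evaluated at a temperature inside the range of the measurements. The free energy is written in a form algebraically equivalent to Equation 13-24. All numbers are invented for illustration.

```python
import math


def dg_fold(temp_k, t_m, dh_m, dcp):
    """Free energy of folding (kJ/mol); algebraically equivalent to Equation 13-24."""
    return (dh_m * (1.0 - temp_k / t_m)
            + dcp * ((temp_k - t_m) - temp_k * math.log(temp_k / t_m)))


N_AA = 110                  # hypothetical length of the polypeptide
DCP = -0.060 * N_AA         # kJ/(K mol); 60 J/K per mol aa, sign assumed for folding
T_REF = 325.0               # K, chosen within the thermal transitions of both proteins

# Hypothetical melting temperatures (K) and van't Hoff enthalpies of folding (kJ/mol).
wild_type = {"t_m": 327.0, "dh_m": -450.0}
mutant = {"t_m": 323.0, "dh_m": -420.0}

dg_wt = dg_fold(T_REF, wild_type["t_m"], wild_type["dh_m"], DCP)
dg_mut = dg_fold(T_REF, mutant["t_m"], mutant["dh_m"], DCP)
ddg = dg_mut - dg_wt        # positive: the mutant is less stable than wild type

print(f"ddG(fd) at {T_REF:.0f} K = {ddg:+.2f} kJ/mol")
```

Because the reference temperature lies close to both melting temperatures, the term in the heat capacity nearly cancels in the difference, which is one reason the shared estimate of ΔC°p,fd is tolerable in this approach.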

One might assume that the relative stabilities of a 
series of mutants of a protein could be estimated from 
the concentrations of guanidinium chloride required to 
shift the equilibrium constants for their folding to a value 
of 1. This is not the case, however, because mutants 
requiring greater concentrations of guanidinium chlo- 
ride to shift their equilibrium constants can have less 
negative standard free energies of folding in the absence 
of guanidinium chloride.” 

Representative values for extrapolated standard 
free energy changes of folding* under physiologically rel- 
evant conditions in the absence of denaturants at 25 °C 
have been assembled in Table 13-2. Each of the several 
values for a particular protein is the result of a different 
extrapolation, often from the same experimental data. 
The remarkable feature of this tabulation is that the stan- 
dard free energies of folding for at least these 12 proteins 
are similar and fall between -20 and -60 kJ mol⁻¹. These
are not large changes of standard free energy when the 
magnitude and the number of the individual noncova- 
lent interactions involved in the process are considered. 
They are clearly the sums of a large number of positive 
and negative terms that cancel each other to produce 
small negative numbers. That they are all negative is 
merely the result of evolution by natural selection and 
consequently uninformative. The small magnitudes of 
these values may also be a consequence of evolution by 
natural selection. It is possible, if one is lucky, to increase 
the stability of proteins by site-directed mutation,” and 
proteins from hyperthermophilic organisms are gener- 
ally more stable than those from organisms adapted to 
normally encountered temperatures” so the opportu- 
nity to evolve more stable proteins must exist, but it is 
usually not exploited. 

With these values for the standard free energy 
changes (Table 13-2), the values for the equilibrium con- 
stants for the folding of these native proteins at 25 °C 
should be between 10⁴ and 10¹⁰. If this is the case, 10⁻⁴ to
10⁻¹⁰ of the lifetime of each of these proteins is spent in
the fully unfolded state under normal circumstances. 


* The standard state for free energy of folding has not been well 
defined. Because most foldings that have been studied are 
intramolecular isomerizations, their equilibrium constants should 
be independent of their concentration; and either infinite dilution 
or a corrected volume fraction of 1 (Equations 5-13 and 5-14) could 
be chosen as the standard state with little effect on the values for 
the standard free energies. The foldings of oligomers, however, are 
concentration-dependent and require that a finite concentration 
be chosen for standard state. 
Table 13-2: Standard Free Energies of Folding in the
Absence of Guanidinium Chloride or Urea, ΔG°fd,H₂O, at
25 °C

protein (source)                           pH           perturbation         ΔG°fd,H₂O
                                                        extrapolated         (kJ mol⁻¹)
ribonuclease A                             6.0          GdmCl                -50
  (Bos taurus)                             6.0          GdmCl                -40
                                           6.6          urea                 -30
                                           [illegible]  GdmCl                -40
lysozyme                                   6.0          GdmCl                -60
  (G. gallus)                              6.0          GdmCl                -50
                                           7.0          GdmCl, pH            -50
                                           7.0          GdmCl, pH            -80
α-chymotrypsin                             4.3          urea                 -30
  (B. taurus)                              4.3          GdmCl                -30
                                           4.3          GdmCl                -50
                                           4.3          GdmCl                -40
[(phenylmethyl)sulfonyl]-α-chymotrypsin    4.0          GdmCl                -40
  (B. taurus)                              4.0          urea                 -40
                                           4.0          1,3-dimethylurea     -40
                                           6.0          urea, GdmCl          -50
myoglobin                                  7.0          GdmCl, pH            -50
  (Equus caballus)                         6.0          GdmCl                -40
                                           6.0          GdmCl                -50
cytochrome c                               6.5          GdmCl                -50
  (E. caballus)
ribonuclease T1                            7.0          urea                 -30
  (A. oryzae)
ribonuclease                               6.3          urea                 -40
  (B. amyloliquefaciens)
chymotrypsin inhibitor 2                   6.0          GdmCl                -30
  (Hordeum vulgare)
phosphocarrier protein HPr                 6.0          urea                 -20
  (E. coli)
ribonuclease H                             6.0          GdmCl                -60
  (Thermus thermophilus)
thioredoxin                                7.0          GdmCl                -25
  (E. coli)


The protection factors for the proton exchange of deeply 
buried amide hydrogens are usually 10° or greater, and 
these most deeply buried amido protons are probably 
exchanged only during the brief periods when the native 
state has become reversibly and completely unfolded. 
Natural selection has settled on an equilibrium constant 
large enough to limit the time the protein spends in the 
unfolded state in part to protect it from degradation by 
endopeptidolytic enzymes.'” 
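
The conversion from a standard free energy of folding to an equilibrium constant, and from there to the fraction of time spent unfolded, is a short calculation for a two-state isomerization. The sketch below applies K = exp(−ΔG°/RT) to the two ends of the range quoted for Table 13-2. The function names are illustrative only.

```python
import math

R_KJ = 8.314e-3   # gas constant, kJ/(K mol)


def folding_equilibrium_constant(dg_fold_kj, temp_k=298.15):
    """Equilibrium constant for folding from the standard free energy (kJ/mol)."""
    return math.exp(-dg_fold_kj / (R_KJ * temp_k))


def fraction_unfolded(k_fd):
    """Fraction of molecules (or of time) in the unfolded state, two-state model."""
    return 1.0 / (1.0 + k_fd)


for dg in (-20.0, -60.0):
    k_fd = folding_equilibrium_constant(dg)
    print(f"dG = {dg:6.1f} kJ/mol  K_fd = {k_fd:9.2e}  unfolded = {fraction_unfolded(k_fd):9.2e}")
```

When the equilibrium constant is large, the fraction unfolded is effectively its reciprocal, which is how the lifetimes quoted in the text follow from the tabulated free energies.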

The requirement that a protein must unfold and 
refold during its lifetime may be viewed as a conse- 
quence of the need to fold in the first place (Equation 
13-1) and the inescapable dictates of microscopic 
reversibility. If the observed rate constant kF for the
spontaneous refolding of a recently unfolded polypeptide”!
is on the order of 10 s⁻¹ at 25 °C and the equilibrium
constant Kfd for folding is on the order of 10⁸, then
the observed rate constant for the unfolding of a native
protein to the random coil (kU = kF/Kfd) must be on the
order of 10⁻⁷ s⁻¹. This would mean that a protein has a 50%
chance of unfolding to the random coil every 100 days at 
25 °C. This is not a major problem in the life of a protein. 
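
The arithmetic behind this estimate can be checked with a few lines, assuming a first-order unfolding process so that the time for a 50% chance of unfolding is ln 2 divided by the rate constant. The inputs (a folding rate constant of 10 s⁻¹ and an equilibrium constant of 10⁸) are assumed values chosen to reproduce a half-time of roughly 100 days; the function names are hypothetical.

```python
import math

SECONDS_PER_DAY = 86400.0


def unfolding_rate_constant(k_fold, k_fd):
    """Rate constant for unfolding (1/s) required by microscopic reversibility."""
    return k_fold / k_fd


def half_time_days(rate_constant):
    """Time (days) at which a first-order process has a 50% chance of having occurred."""
    return math.log(2.0) / rate_constant / SECONDS_PER_DAY


k_u = unfolding_rate_constant(k_fold=10.0, k_fd=1.0e8)   # assumed inputs
t_half = half_time_days(k_u)
print(f"k_U = {k_u:.1e} 1/s, half-time = {t_half:.0f} days")
```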

Measurements of the equilibrium constants Kconf
for the conformational changes permitting the exchange
of amido protons at the EX2 limit (Equation 12-63) are
consistent with the proposal that the most deeply buried 
positions in the polypeptide backbone exchange only 
upon its complete unfolding (Figure 13-8). The confor- 
mational equilibrium constants governing the exchange 
rates for the less deeply buried amido protons, however, 
are larger than the one for the most deeply buried posi- 
tions and are spread over a range of values." The 
larger values for these other conformational equilibrium 
constants, which produce faster rates of exchange, are 
the result of conformational changes confined only to 
portions of the structure of the protein, for example to individual
α helices or loops of random meander, 116132136,137
rather than the result of the fundamental unfolding 
encompassing the entire structure. 
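
A minimal sketch of how a conformational equilibrium constant and a free energy of exposure might be obtained from an exchange rate is given below. It assumes that Equation 12-63, which is not reproduced in this section, has the usual form for the EX2 limit, in which the observed rate constant is the product of Kconf and the intrinsic rate constant for exchange of a fully exposed amido proton. The rate constants are invented.

```python
import math

R_KJ = 8.314e-3   # gas constant, kJ/(K mol)


def conformational_equilibrium_constant(k_obs, k_int):
    """K_conf in the EX2 limit, assuming k_obs = K_conf * k_int (valid for K_conf << 1)."""
    return k_obs / k_int


def exposure_free_energy(k_conf, temp_k=298.15):
    """Standard free energy (kJ/mol) for the opening that exposes the proton."""
    return -R_KJ * temp_k * math.log(k_conf)


# Hypothetical rate constants (1/s) for one deeply buried amido proton.
k_conf = conformational_equilibrium_constant(k_obs=1.0e-6, k_int=1.0)
dg_hx = exposure_free_energy(k_conf)
print(f"K_conf = {k_conf:.1e}, dG(HX) = {dg_hx:.1f} kJ/mol, dG(fd) = {-dg_hx:.1f} kJ/mol")
```

The last value printed applies the sign convention of the footnote, and it is meaningful only for a proton whose exchange reports on global unfolding.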

These local conformational changes appear to be 
of two types: those involving considerable exposure of 
nonpolar functional groups to the solvent, similar to the 
exposure experienced during global unfolding, and those 
involving exposure of the amido protons to be exchanged 
without any significant expansion of the local structure 
into the solvent.'” The former are recognized by the 
increase in their equilibrium constants produced by 
adding guanidinium chloride or urea; the latter, by the 
insensitivity of their equilibrium constants to the addi- 
tion of these denaturants.'"613*13” For example, above 
0.8 M guanidinium chloride, the α amido protons on
Glutamine 105 and Alanine 110 of cysteineless type I 
ribonuclease H (Figure 13-8) must exchange during a 
major unfolding of the protein because the standard free 
energy for the conformational change permitting their 
exchange decreases significantly as the concentration of 
guanidinium chloride increases. At lower concentrations 
of guanidinium chloride, however, their respective rates 
of exchange are governed by other conformational 
changes the equilibrium constants for which are unaf- 
fected by the concentration of guanidinium chloride. 
These other conformational changes, therefore, must 
not involve significant increases in the exposure of the 
polypeptide to the solvent. Because they are unaffected 
by the concentration of guanidinium chloride, the equi- 
librium constants for these other conformational 
changes become larger than the equilibrium constant for 
the major unfolding at low concentrations of the denat- 
urant. 

These observations indicate that in solution, in its 
native state, the structure of a protein is constantly fluc- 
tuating as a result of conformational changes of various 


extents, occurring in different locations, some involving 
isomerizations retaining compact globular structure, 
others involving large, rapid expansions into the solvent 
followed by a collapse back into the native state.'” 

Because a polypeptide can fold in the first place and 
because it must refold in part or in its entirety during the 
span of its life, the information dictating the final native 
state of the protein must be contained within its amino 
acid sequence. Because the standard free energy of fold- 
ing of most proteins is not a large negative number 
(Table 13-2), perhaps for cause, if some of the informa- 
tion is lost or misinformation is added, the protein will 
not fold. For example, whenever the sequence of a pro- 
tein is changed by site-directed mutation, the possibility 
exists that the mutant will not fold, for reasons that will 
never be learned. Many site-directed mutations, how- 
ever, have little effect on the ability of the protein to fold, 
and in a few instances, a site-directed mutation has been 
found to increase the stability of a protein. For example, 
when the amino acids at positions 40-49 of lysozyme 
from bacteriophage T4 were all replaced with alanines, '“° 
its standard free energy of folding increased by only 10 kJ
mol⁻¹, while the appropriate replacement of five of the
amino acids in type I ribonuclease H from E. coli”
decreased its standard free energy of folding by 20 kJ
mol⁻¹.

Incomplete polypeptides often lack sufficient 
information to fold properly. A form of the polypeptide 
of bovine ribonuclease A (naa = 124) that is missing the
last six amino acids is unable to produce a folded pro- 
tein with enzymatic activity, and what structure it does 
have at 20 °C is eliminated by heating to only 40 °C at 
pH 7.5 in the absence of denaturants.'*’ This truncated 
polypeptide is also susceptible to endopeptidolytic 
degradation, unlike the intact native protein. When the 
last 23 amino acids, which form only a small number of 
contacts with the bulk of the folded polypeptide in its 
crystallographic molecular model, are removed from the 
polypeptide of micrococcal nuclease (naa = 149), the
polypeptide produced is a random coil by the criteria of 
circular dichroism, optical rotation, and ultraviolet 
absorption.” It is also readily digested by trypsin, 
unlike the native enzyme. Its residual enzymatic activity 
of 0.1%, which is an intrinsic property of the shortened 
polypeptide,’ suggests that it can still fold properly to 
form an active enzyme but that the equilibrium constant
for folding is displaced heavily (Kfd < 10°) in the
direction of the random coil. When the first 12 amino 
acids and the last 9 amino acids are removed from the 
protein, it folds partially to form a state in which some of 
its normal secondary structures are formed but in low 
yield. 4418 

Another set of examples of the fact that a polypep- 
tide can fold only when all the necessary information is 
present is proteins that are posttranslationally modified 
during their natural maturation. In many instances, the 
polypeptide that folds to produce the native state is 


longer than the final product because the initial folded 
form is clipped, and the smaller piece or pieces resulting 
from the posttranslational clipping of the polypeptide 
dissociate.'“° For example, subtilisin E from Bacillus sub- 
tilis folds naturally when it is a polypeptide 352 amino 
acids in length. After it folds, it is posttranslationally 
modified. During this process, the peptide bond after 
Tyrosine 77 is cleaved, and the first 77 amino acids of the 
polypeptide, the prosequence, are lost. If the mature, 
enzymatically active form of the protein (naa = 275) is 
unfolded, it will not refold; but if the full-length polypep- 
tide (naa = 352) is unfolded in 6 M guanidinium chloride, 
it readily refolds to produce the native state.'*’ If the 
prosequence (naa = 77) is included in the solution when 
the mature protein is being refolded, considerable native 
state is recovered." The yield of enzymatic activity is low 
but increases as the concentration of prosequence is 
increased up to a molar excess of 4-fold.'*” Only when the 
complete amino acid sequence of the longer polypeptide 
is intact, however, is there sufficient information to pro- 
duce a high yield of the mature form. Once folded and 
posttranslationally modified, the mature protein is stable 
and biologically competent, as long as it is not unfolded. 
In the case of carboxypeptidase C from Saccharomyces 
cerevisiae, however, the mature posttranslationally modified
form of the protein (naa = 421) is considerably more
resistant to the effects of guanidinium chloride than is its
intact precursor (naa = 512), a result suggesting that the
prosequence provides information rather than standard 
free energy for folding.” 

There are many other examples of proteins that lose 
portions of their polypeptide, usually from the amino ter- 
minus, after they have folded. This is so common that the 
term proprotein is used to designate the longer polypep- 
tide that folds, with the implication that the cleaved, 
mature native state is designated as the protein. Familiar 
examples of this designation are proinsulin, proalbumin, 
and prothrombin. 

Fragments of a polypeptide, each lacking sufficient 
information to fold separately, can sometimes cooperate 
to produce the proper native state. The first example of 
this was the ability of the amino-terminal fragment of 
ribonuclease (Lysine 1-Alanine 20), which is almost 
structureless in isolation,'”! to reassume its native structure
as an α helix when combined with the remainder of
the polypeptide (Serine 21-Valine 124).'°* Both the fragment
Alanine 1-Arginine 126 and the fragment Glycine
49-Glutamine 149 of micrococcal nuclease (naa = 149) are
structureless in isolation.'”’*? When they are mixed 
together, however, they combine with each other to form 
two different forms of the native state that both appear to 
be properly folded but together have only 10% of the 
nuclease activity of the native enzyme. ”” 

Higher yields of enzymatic activity have been 
observed upon combination of fragments of ribonuclease
from B. amyloliquefaciens (fragments of 36 and 74 aa;
yield 30%),'”' fragments of phosphoribosylanthranilate
isomerase from S. cerevisiae (fragments of 174 and 59 aa;
yield 50%),'” fragments of penicillin amidase from E. coli
(fragments of 209 and 557 aa; yield 60%),'” and frag- 
ments of porcine 3-oxoacid CoA-transferase (fragments 
of 250 and 270 aa; yield 85%).'*” In the case of the ribonu- 
clease, each fragment was a random coil in the absence 
of the other, but in the cases of the isomerase and the 
amidase, one of the two fragments refolded on its own to 
form a compact structure. The two fragments of the 
transferase that were chosen for expression are structural 
domains in its crystallographic molecular model, and 
both formed compact structures in the absence of the 
other, but neither formed the structure it has in the intact 
protein. None of the fragments from any of these pro- 
teins had enzymatic activity on its own, and it was only 
upon mixing the two respective fragments that activity 
was regained. 

When a protein is split into two fragments and the 
separated, incompetent fragments are mixed together in 
the hope of regenerating the native state of the protein, 
the situation is complicated by the fact that the frag- 
ments must associate with each other. For example, the 
complex of the fragments of ribonuclease from B. amy- 
loliquifaciens and the complex of the fragments of phos- 
phoribosylanthanilate isomerase from S. cerevisiae had 
dissociation constants of 0.4 uM and 0.2 uM, respec- 
tively, so the fragments had to be present at concentra- 
tions in excess of these dissociation constants for the full 
yield of the native state to be regained." 
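
The dependence of the yield on concentration can be illustrated for the simplest case, a 1:1 complex of two fragments mixed at equal total concentrations, by solving the quadratic that follows from the definition of the dissociation constant. This is a sketch under those assumptions and ignores any competing aggregation or misfolding. The function name is hypothetical.

```python
import math


def fraction_complexed(total_each, k_d):
    """Fraction of each fragment found in the 1:1 complex at equilibrium.

    total_each : total concentration of each fragment (same units as k_d)
    k_d        : dissociation constant of the complex
    Solves (C - x)^2 / x = K_d for the concentration x of the complex.
    """
    b = 2.0 * total_each + k_d
    complex_conc = (b - math.sqrt(b * b - 4.0 * total_each ** 2)) / 2.0
    return complex_conc / total_each


K_D = 0.4   # micromolar, the value quoted for the fragments of the ribonuclease
for conc in (0.4, 4.0, 40.0):
    print(f"{conc:5.1f} uM of each fragment: {fraction_complexed(conc, K_D):.2f} complexed")
```

Even at a concentration equal to the dissociation constant, well under half of each fragment is in the complex, so concentrations far in excess of it are needed before the yield approaches its limit.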

One solution to this problem of the bimolecular 
association of the fragments is to perform a circular per- 
mutation’ of the protein. By genetic manipulation, the 
coding sequence for the protein in the DNA is severed at 
a particular position, and the portion to the 5’ side of the 
break is moved to the 3’ end of the remainder of the 
coding sequence. The 3’ end of the 3’ fragment is joined 
in phase to the 5’ end of the 5’ fragment with a linking 
sequence of DNA encoding a segment of polypeptide 
long enough to connect comfortably the carboxy termi- 
nus of the original unpermuted protein to the amino ter- 
minus of the original unpermuted protein.” 
Consequently, a protein is eligible for circular permuta- 
tion only if its amino terminus and its carboxy terminus 
are near each other in its crystallographic molecular 
model so that after the circular permutant has been 
expressed and has folded properly, its former carboxy 
terminus and former amino terminus can be joined by a 
continuous stretch of polypeptide. 
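
At the level of the amino acid sequence, the rearrangement described above amounts to the following operation. The sketch works on a one-letter sequence as a string; the example sequence and linker are invented, and practical details of expression such as the initiating methionine are ignored.

```python
def circular_permutant(sequence, break_after, linker=""):
    """Return the circularly permuted sequence.

    sequence    : amino acid sequence of the unpermuted protein
    break_after : number of amino acids before the break; the amino acid at
                  position break_after + 1 becomes the new amino terminus
    linker      : segment joining the former carboxy terminus to the former
                  amino terminus
    """
    if not 0 < break_after < len(sequence):
        raise ValueError("the break must fall inside the sequence")
    return sequence[break_after:] + linker + sequence[:break_after]


# Hypothetical eight-residue sequence, broken after position 3, with a two-residue linker.
permuted = circular_permutant("MKTAYIAK", 3, linker="GG")
print(permuted)
```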

Following circular permutation, there is a break in 
the polypeptide elsewhere in the native structure of the 
protein, formally equivalent to the break that would oth- 
erwise produce two fragments of the protein, but the 
polypeptide is continuous from the former carboxy ter- 
minus to the former amino terminus. If the break is 
placed at a position in its amino acid sequence known to 
be a disordered loop, the circularly permuted protein 
will usually fold, display almost normal enzymatic activity
or biological function, assemble into the native
oligomer, and have a standard free energy of folding
similar to that of the wild type.'**’® In such a situation, the
disordered loop is broken by the new amino and carboxy 
termini, and the former carboxy and amino termini, 
which are usually disordered anyway, are now joined 
together, to prevent the two fragments of the original 
protein from dissociating when it is unfolded. 

Circular permutation can be used to examine the 
information necessary to fold. The position at which the 
amino acid sequence of the wild-type protein is broken 
to produce the new amino terminus or carboxy terminus 
can be varied at random, and circular permutants that 
are still enzymatically active can be selected genetically. 
In almost all of the enzymatically active circular permu- 
tants of aspartate carbamoyltransferase from E. coli, the 
new carboxy and amino termini were found in segments 
of the polypeptide between α helices and β structure in
the crystallographic molecular model of the wild-type 
protein.’ This result seemed reasonable at the time 
because such elements of secondary structure probably 
could not form if there were a discontinuity within them. 
When a similar analysis, however, was made of random 
circular permutants of thiol:disulfide interchange pro- 
tein dsbA from E. coli, the majority of the new amino and 
carboxy termini of the enzymatically active permutants 
were located at positions in the sequence of amino acids 
that in the wild-type protein are α helices or β structure.
Four of the nine α helices and three of the five β strands
could be interrupted, and the resulting circular permu- 
tants folded and were enzymatically active.!® 

Another approach is to place systematically the new 
carboxy and amino termini at each position in the amino 
acid sequence of the protein and measure the enzymatic 
activity and standard free energy of folding for each of 
the resulting circular permutants. When such an analy- 
sis! was performed on dihydrofolate reductase from 
E. coli (naa = 159), a set of 10 segments varying in length 
from 2 to 14 aa could be identified, the interruption of 
which by introducing new amino and carboxy termini at 
any position led to a protein incapable of folding and 
enzymatically inactive. Placing the interruption at 
almost any one of the 87 positions outside these 10 for- 
bidden regions gave a circular permutant that could fold 
to produce an enzymatically active protein. As with 
thiol:disulfide interchange protein dsbA, many of the 
permissive positions were within segments that are α helices
or β strands in the crystallographic molecular model
of the wild-type protein. The forbidden regions, how- 
ever, also failed to correlate with elements of secondary 
structure. These results suggest that the information nec- 
essary to fold a polypeptide may be distributed over its 
sequence of amino acids by rules that are not immedi- 
ately obvious. 

The fact that many, if not most, of the circular per- 
mutants of a protein can fold to produce enzymatically 
and biologically active proteins and even the proper 


oligomers clearly states that one piece of information 
that has nothing to do with the folding of a protein is the 
order in which its amino acids emerge from the ribo- 
some. If there are portions of the protein that do fold 
before the complete protein emerges, those portions are 
not required to fold before the complete protein 
emerges. It has been suggested, however, that if a protein 
has domains, each domain might be required to fold as it 
emerged from the ribosome during biosynthesis before 
the next emerged. There is no evidence in favor of this 
conjecture, and proteins containing two or more 
domains undergo reversible folding as readily as pro- 
teins with only one domain.** 167169 

If the reaction producing the unique native state of 
a folded polypeptide is an isomerization between the 
random coil and that native state, the individual contri- 
butions to the overall standard free energy change for 
this isomerization determine its outcome. Neither the 
formation of a hydrogen bond between a donor and an 
acceptor in the random coil nor the formation of an ionic 
interaction between a positively charged side chain and 
a negatively charged side chain in the random coil can 
provide any net favorable standard free energy for the 
folding of a protein in aqueous solution. In fact, their for- 
mation would be unfavorable. Nor can van der Waals 
forces make any contribution because the isomerization 
occurs in a condensed phase. Therefore, by exclusion 
and perhaps for the lack of a better candidate, the 
hydrophobic effect has attracted the most attention in 
discussions of the folding of a polypeptide.’ The 
hydrophobic effect provides favorable standard free 
energy for the formation of the native state because 
hydrophobic side chains, which are exposed to water in 
the random coil, are removed to the interior of the pro- 
tein during the folding.’” 

One of the major deficits of standard free energy in 
the folding of a protein results from the requirement to 
unsolvate those hydrophilic functional groups destined 
for the interior. This loss is due to the fact that water par- 
ticipates in strong interactions with donors and accep- 
tors of hydrogen bonds and charged functional groups 
and to the fact that when charged side chains are with- 
drawn from water they are usually neutralized first. The 
removal of even neutral hydrogen-bond donors from 
water, even though they may always find an acceptor in 
the interior of the protein, is a significantly endothermic 
transfer.‘ It has already been noted, however, that the 
formation of a hydrogen bond between an acceptor and 
a donor on a side chain, in the context of a folded 
polypeptide, is usually favorable with a standard free 
energy of formation of around -5 kJ mol⁻¹ (Table 6-6).
For example, in 52 instances in which a tyrosine was
mutated to a phenylalanine, the standard free energy of
folding increased by 6 ± 4 kJ mol⁻¹ when the tyrosine was
involved in a hydrogen bond in the crystallographic 
molecular model of the protein but showed no change 
when it was not. In 40 instances in which a threonine was 


mutated to a valine, the standard free energy of folding 
increased by 4 ± 4 kJ mol⁻¹ when that threonine was
engaged in a hydrogen bond in the crystallographic 
molecular model but showed no change when it was 
not.” 

The reason that these hydrogen bonds between 
side chains in the native state have the modestly favor- 
able free energies of formation that they do is approxi- 
mation. The hydrophobic effect drives the condensation 
of the random coil that unavoidably withdraws donors 
and acceptors in the backbone of the polypeptide from 
contact with water. These donors and acceptors then 
combine to form the hydrogen bonds that define the sec- 
ondary structure. These hydrogen bonds form because 
these donors and acceptors can no longer participate in 
hydrogen bonds with water and must do so among themselves.
α Helices and β structure appear not because they
are beautiful (Figures 6-6 and 6-9) but because they are 
an efficient way to provide an acceptor to most if not all 
of the donors pulled out of the water by the condensation 
driven by the hydrophobic effect. The proper packing of 
the secondary structure then can juxtapose the donor on 
a side chain with an acceptor. It is this approximation, 
brought about by the complete, cooperative process of 
folding, that is the only reason the resulting hydrogen 
bond has a favorable standard free energy of formation 
relative to the separated donor and acceptor in the 
random coil. 

Because the realization of this favorable standard 
free energy of formation results from approximation, 
there are significant geometric requirements for its 
favorability. In addition, if too many of the hydrophobic 
groups on the side chains in the interior were replaced 
with properly aligned donors and acceptors to exploit 
these favorable increments in standard free energy of 
formation, the polypeptide could not fold in the first 
place.” These are among the reasons that there are few 
such hydrogen bonds involving a donor on a side chain 
in the interiors of proteins.'’! Those few hydrogen bonds 
between side chains that are found are the result of evo- 
lution by natural selection so it is not surprising that they 
have favorable free energies of formation. 

In addition to the unfavorable standard free energy 
of transfer associated with the dehydration of 
hydrophilic functional groups as they are pulled into the 
interior of the folded polypeptide," the other major 
deficit that must be overcome during the folding process 
is the configurational entropy of the random coil. This 
is the positive, intrinsic standard entropy that arises from 
the fact that the random coil can assume a large number 
of different configurations. It represents a deficit during 
the folding of the polypeptide because the native state, to 
a first approximation, assumes only a few conforma- 
tions. Therefore, when the random coil becomes the 
native state its configurational entropy almost disap- 
pears. 

At first glance, it seems that the configurational 
entropy of the random coil, dictated by the sum over all
of its states, should be very large because each amino
acid has at least the two dihedral angles, ψ and φ (Figure
6-2), each of which can assume a number of values as 
dictated by the Ramachandran plot (Figure 6-4). This ini- 
tial intuition, however, neglects excluded volume.!” 
Excluded volume designates the qualification that every 
configuration of the random coil in which two or more 
atoms would occupy the same space at the same time is 
impossible and thus cannot contribute to the configura- 
tional entropy. This is the consequence of the steric 
effects that produce the Ramachandran plot itself oper- 
ating over the whole polymer rather than just between 
neighboring amino acids. Excluded volume makes a 
large contribution to diminishing the configurational 
entropy of the random coil. For a polypeptide 100 amino 
acids in length, a set of configurations could be gener- 
ated by randomly assigning values to the dihedral 
angles ψ and φ within their allowed ranges. The number
of these configurations that do not superpose two or 
more atoms in the polypeptide has been estimated to be 
only 10™ of the total number of randomly generated 
configurations.’ 

Even though a consideration of excluded volume 
remarkably decreases the number of configurations 
available to the random coil, there are still a large 
number of configurations that are accessible. Only a 
small number of these configurations constitute the 
compact native state of the folded polypeptide. For 
the native state to be stable relative to the random coil, 
the configurational entropy resulting from the sum over 
all of the allowed unfolded configurations must be over- 
come by the hydrophobic effect realized upon the for- 
mation of the native state. 

The presence of a cystine in a folded polypeptide 
makes a contribution to the change in configurational 
entropy for the isomerization between random coil and 
native protein. The polypeptide must first fold before the 
cysteines juxtaposed by the folding can be oxidized to 
cystines.'’’ Because the folded native state is a prerequi- 
site for the formation of a proper cystine and because the 
formation of a naturally occurring cystine usually has 
little effect on the structure or conformational freedom 
of the native protein,'’”’” it necessarily follows that the 
cystine itself cannot significantly change the intrinsic 
configurational entropy of the properly folded protein 
and can change its intrinsic standard enthalpy only by 
the standard enthalpy of formation of the cystine. It has 
been demonstrated, however, that the standard enthalpy 
of formation for cystine in a random coil is about the 
same as that estimated for the standard enthalpy of for-
mation of the same cystine in the native state.
Consequently, the incorporation of a cystine cannot 
affect the change in standard enthalpy of folding either. 
Rather, a cystine between two cysteines that are adjacent 
in the native structure increases the value of the equilib- 
rium constant of the folding and decreases the change in 
standard free energy of folding of a protein because it
decreases the configurational entropy of the random 
coil. 

The decrease in standard free energy of folding can 
be demonstrated experimentally by introducing a spe- 
cific cross-link between two adjacent amino acids into 
the native structure of a protein and determining its 
effect on its folding. Glutamate 35 and the enol tautomer 
of the oxindole produced by the oxidation (Reaction 
10-37) of Tryptophan 108 in lysozyme from G. gallus 
form an ester: 


[Structure 13-3: the ester formed between Glutamate 35 and the oxindole of Tryptophan 108]


This ester introduces an intramolecular cross-link 
between these two amino acids’! that are adjacent to 
each other in the crystallographic molecular model of the 
protein. The cross-linked lysozyme has a standard free 
energy of folding at pH 2 in 2 M guanidinium chloride at 
62 °C that is 22 kJ mol⁻¹ less than that of un-cross-linked
lysozyme.'®° Lysine 7 and Lysine 41 in bovine 
ribonuclease A can be cross-linked specifically by 
2-(p-nitrophenyl)-3-(3-carboxy-4-nitrophenyl)thio- 
1-propene (Figure 10-8).' The difference in the stan- 
dard free energy change of folding between the cross- 
linked and un-cross-linked ribonuclease at pH 2.0 and 
40 °C is -21 kJ mol⁻¹.

Many studies have incorporated single cystines by 
site-directed mutation between positions in the amino 
acid sequence of a protein that are adjacent to each other 
in its crystallographic molecular model. For example, 
cystines have been introduced into lysozyme from bac-
teriophage T4 ($n_{\mathrm{aa}}$ = 164). This protein has no cystines to
begin with, so each mutant contained only one cross-link 
in its polypeptide.'"'® In one study, four different 
mutants were made containing cystines cross-linking 
positions 29, 34, 121, and 155 aa apart. In another, the 
same two site-directed mutants containing cystines 
cross-linking positions 121 and 155 aa apart were used 
and a third, cross-linking positions 94 aa apart, was 
made; and each of these mutants was submitted to the 
same circular permutation to produce permutants of 
T4 lysozyme with cystines cross-linking positions 49, 15, 
and 76 aa apart, respectively. Together, these manipula-
tions gave eight mutants of the same protein, each with a
cross-link producing a covalent loop in the denatured 
polypeptide of a different length. For each mutant, the 
difference in the standard free energy of folding
($\Delta\Delta G^{\circ}_{\mathrm{fd,SS}}$) between the protein with the cystine intact
and the protein with the cystine cleaved by disulfide
interchange (Figure 3-20) was estimated from its melting
temperature. The differences in standard free energies of
folding varied from -3 to -14 kJ mol⁻¹, and the difference
increased monotonically as the distance between the 
cysteines, and hence the length of the loop, increased. 

A theoretical treatment of the expected decrease in 
configurational entropy caused by cross-linking a 
random coil, which accounts for excluded volume, pre- 
dicts that the configurational entropy should decrease 
linearly with the natural logarithm of the distance 
between the cross-linked positions with a slope of 2.4.'%° 
When the experimental values of ($\Delta\Delta G^{\circ}_{\mathrm{fd,SS}}$)/T are plot-
ted against the natural logarithm of the distance, in
number of amino acids, between the cystines in each of 
the mutants of T4 lysozyme, the expected relationship is 
observed. Furthermore, the difference in standard free 
energy of folding between native α-lactalbumin and
α-lactalbumin in which the cystine between Cysteine 6
and Cysteine 120 has been reduced falls on the same 
line.'® 
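
As a rough numerical illustration of this logarithmic dependence, the following sketch compares the entropy lost on closing a loop of 155 amino acids with that lost on closing a loop of 15 amino acids. It assumes that the slope of 2.4 is expressed in units of the gas constant R, so that the constant term of the relationship cancels in the difference. The temperature of 320 K and the function name are hypothetical choices made only for the example and are not values taken from the studies cited.

```python
import math

R = 8.314  # gas constant, J K^-1 mol^-1

def loop_entropy_difference(n_short, n_long, slope_in_r=2.4):
    """Entropy of closing the longer loop minus that of closing the shorter one.

    Assumes dS(n) = constant - slope_in_r * R * ln(n), with the slope in
    units of R (an assumption); the constant cancels in the difference.
    """
    return -slope_in_r * R * math.log(n_long / n_short)

d_s = loop_entropy_difference(15, 155)      # J K^-1 mol^-1
temperature = 320.0                         # K, hypothetical
dd_g = temperature * d_s / 1000.0           # kJ mol^-1

print(f"difference in entropy: {d_s:.1f} J K^-1 mol^-1")
print(f"difference in free energy of folding: {dd_g:.1f} kJ mol^-1")
```

Under these assumptions the longest loop should stabilize the folded protein by roughly 15 kJ mol⁻¹ more than the shortest, which is of the same order as the spread between -3 and -14 kJ mol⁻¹ noted above.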

The effects of introducing cystines into other pro- 
teins, however, differ significantly from those measured 
in these observations. In some instances the magnitude 
of the difference in standard free energy of folding is 
much less than expected; in others, much more. The
magnitudes of the differences in standard free energy of 
folding for lysozyme from G. gallus cross-linked through 
the oxindole ester (13-3) and bovine ribonuclease A 
cross-linked by 2-(p-nitrophenyl)-3-(3-carboxy-4-nitro- 
phenyl)thio-1-propene are also greater than those 
observed with cystines introduced into lysozyme from 
bacteriophage T4. 

The equilibrium constant for the folding of the con-
stant fragment of the light chain of an immunoglobu-
lin G, $C_{\mathrm{L}}$ (Figure 11-1), at pH 7.5 and 25 °C in solutions of
guanidinium chloride is decreased when its single cys- 
tine is reduced. All of the change could be accounted for 
by the fact that the observed rate constant for folding ($k_{\mathrm{F}}$
in Equation 13-1) of the random coil with the cystine was 
100-fold greater than the observed rate constant for the 
random coil without the cystine.'” This is consistent 
with the conclusion that the intact, correct cystine 
decreases the configurational entropy of only the 
random coil, while retaining access to the properly 
folded structure, and permits the random coil to fold 
more rapidly. Whether or not the proper cystine was 
present had no effect on the observed rate constant of 
unfolding ($k_{\mathrm{U}}$ in Equation 13-1).
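
The size of the effect implied by these rate constants can be estimated with a short calculation. The sketch below assumes a two-state equilibrium in which the equilibrium constant for folding is the ratio of the rate constant for folding to the rate constant for unfolding; Equation 13-1 is not reproduced here, so this form and the function name are assumptions.

```python
import math

R = 8.314    # gas constant, J K^-1 mol^-1
T = 298.15   # 25 °C, the temperature of the measurements

def ddg_from_rate_ratios(kf_ratio, ku_ratio=1.0, temperature=T):
    """Change in standard free energy of folding (J mol^-1).

    Assumes K_fd = k_F / k_U, so a change in either rate constant by the
    given factor changes the free energy by -RT ln(kf_ratio / ku_ratio).
    """
    return -R * temperature * math.log(kf_ratio / ku_ratio)

# folding 100-fold faster with the cystine intact, unfolding unchanged
ddg = ddg_from_rate_ratios(100.0)
print(f"{ddg / 1000.0:.1f} kJ mol^-1")
```

A 100-fold increase in the rate constant for folding with no change in the rate constant for unfolding corresponds to a decrease of about 11 kJ mol⁻¹ in the standard free energy of folding at 25 °C.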

It has been shown that if the favorable noncovalent 
standard free energies of association between a subpop- 
ulation of the monomers along a polymer are signifi- 
cantly more negative than their individual standard free 
energies of solvation, the polymer should spontaneously 
collapse to a globular form.'”' Because the constraints of 
excluded volume are even more extreme in this compact, 
globular form, the number of accessible configurations 
and hence its configurational entropy should be much 
smaller. Because the hydrophobic effect is the only inter-
action capable of producing significant net favorable
standard free energies of association among the side 
chains of the amino acids in a random coil, it is generally 
assumed that the noncovalent force that would perform 
the condensation leading to a globular state of a 
polypeptide is the hydrophobic effect exerted upon the 
hydrogen-carbon bonds in those side chains in the 
polypeptide. This view of the folding of a polypeptide 
could be called the condensation model. Its central pro- 
posal is that the collapse of the random coil to a con- 
densed state decreases the configurational entropy of the 
polypeptide dramatically and narrows the search for the 
native state to a much smaller number of accessible con- 
formations. 

On the basis of this model, folding can be treated 
theoretically as a process in which the unfavorable loss of 
the configurational entropy of the random coil is bal- 
anced only by the favorable removal of hydrophobic side 
chains from contact with the aqueous phase. The
statistical treatment of the random coil developed by 
Flory," which takes account of excluded volume and 
the solvation of the monomeric units, can be expanded'”® 
to include the hydrophobic effect exerted during the 
sequestration of the monomers in the condensed state'” 
and the fortuitous sequestration of the monomers in the 
random coil,!’® as well as the much smaller, but still sig- 
nificant, configurational entropy of the condensed 
polypeptide before it assumes the native state. 

The process of folding is divided into two imaginary 
steps, not necessarily related to the actual steps. These
imaginary steps are the condensation of the random coil 
to a globular structure excluding water and the reconfig- 
uration of the polymer in this condensed state to maxi- 
mize the exposure of hydrophilic groups and minimize 
the exposure of the hydrophobic groups to the water. It is 
during this reconfiguration following the condensation 
that the donors and acceptors for hydrogen bonds that 
have been withdrawn away from the acceptors and 
donors of the water form hydrogen bonds among them-
selves to produce the α helices and β structure observed
in the final native state of the protein. Before the con- 
densation, water formed hydrogen bonds with those 
donors and acceptors. 

With reasonable values both for the hydrophobic 
effect on the average hydrophobic amino acid (-8 kJ
mol⁻¹) and for the fraction of the amino acids in the
polypeptide that are hydrophobic (0.50), the formation 
of a unique globular state from a random coil should pro- 
ceed with net negative standard free energy change for 
polypeptides greater than about 70 amino acids in 
length.'** Polypeptides less than about 70 amino acids in 
length should not fold because they should not be able to 
bury a large enough number of hydrophobic amino acids 
to overcome the configurational entropy of their random 
coils. It is the case that small, folded, cystineless, 
monomeric proteins of less than 70 amino acids are quite 
rare. Proteins composed of polypeptides shorter than 70 
amino acids usually contain several cystines, are
oligomeric,'”*'” or have a relatively large hydrophobic 
core.’ There are, however, a few small domains, for 
example the WW domains (35 aa)!” or the peripheral 
subunit-binding domain from dihydrolipoyllysine- 
residue acetyltransferase (41 aa), that fold to form
stable monomeric, well-defined native structures lacking 
cystines. 

Although it was only for the sake of the computa- 
tions that the folding of the polypeptide described in this 
condensation model was divided into the two steps of 
condensation and reconfiguration, there are stable, con- 
densed but fluidly unstructured states of a polypeptide 
that seem to have the properties required of a condensed 
state on its way to the native state. These are the molten 
globules. A molten globule is a state of a polypeptide in 
which it has collapsed to a globular particle from the 
expanded random coil but remains fluid with a con- 
stantly changing conformation rather than achieving the 
limited set of conformations that is the native state. In 
such a fluid condensed state, the configurational entropy 
of the polypeptide should be significantly reduced rela- 
tive to that of the random coil, and only a much smaller 
number of conformations that avoid the problems of 
excluded volume should be accessible.'*' Many of these 
conformations should display α helices and β structure
that form spontaneously.””' 

Under conditions that differ significantly from 
those in the living system in which a particular polypep- 
tide has evolved to fold, its native state may no longer be 
the most stable of the condensed conformations accessi- 
ble to that polypeptide, and a number of other con- 
densed, structured conformations may be as stable. 
Peculiar conditions such as low pH or the presence of 
denaturants, however, are necessary to prevent the 
polypeptide from assuming its native state, as it would do 
normally. It is argued that the intermediates detected in 
the folding of proteins under several such circumstances 
are examples of molten globular states and that all of 
these various intermediates represent a single configura- 
tional state assumed by a polypeptide that is at least as 
distinct as that of the random coil. This may be an over- 
statement. For example, two different molten globular 
states of apomyoglobin have been distinguished,” and 
there are intermediate states that do not have the prop- 
erties assigned to a molten globule.” 

Stable intermediates believed to be molten glob-
ules have been detected under many different circum-
stances. They have been observed for α-lactalbumin
below pH 4.5 at concentrations of guanidinium chloride
below 2.5 M; for α-lactalbumin, stripped of bound
Ca²⁺, at pH 8 and guanidinium chloride concentrations
between 0.5 and 2.0 M; for cytochrome c below
pH 3 either at chloride concentrations greater than 0.1 M
or at concentrations of O-α-D-glucopyranosyl-(1→3)-
β-D-fructofuranosyl-α-D-glucopyranoside greater than
0.5 M; and for carbonate dehydratase at temperatures
below 60 °C and values of pH less than 3.5. The mutation
of Phenylalanine 173 to an alanine in murine interleu- 
kin 6 converts the protein into a molten globule.” These 
are all unphysiological conditions, but proteins have 
evolved to be entirely in their native states under physio- 
logical conditions. Consequently, it would not be sur- 
prising that, to isolate intermediates in the normal 
process of folding, such peculiar conditions would be 
required. Molten globules often become stable relative to 
the native state at low pH. Presumably, their fluidity per- 
mits carboxy groups that are rigidly buried in the native 
state to reach the surface of the globule and be exposed 
to the solvent, thereby lowering the standard free energy 
of the molten globule relative to that of the native struc- 
ture (Equation 13-3). 

Various physical measurements have been made of 
these stable intermediates identified as molten globules. 
The circular dichroic absorptions of the native protein 
between 260 and 290 nm are largely lost in the molten 
globule, and this loss must result from the disappearance 
of the unique asymmetric environments around trypto- 
phans, tyrosines, and phenylalanines.®*” The complex 
nuclear magnetic resonance spectrum of the native 
state becomes much simpler and much more like that of 
the random coil upon formation of these molten glob- 
ules,”’°?"! as would be expected if the unique environ- 
ments around each amino acid had been lost and each 
side chain now sampled continuously a broad range of 
changing environments. When the internal dynamics of 
one of these molten globules are examined by quasielas- 
tic neutron scattering,””” it is observed that the potential 
barriers to bond rotations in the side chains are lower 
than those in the native state, while diffusive motions of 
side chains are greater, and significantly smaller units of 
structure diffuse cooperatively than those diffusing in 
the native state. Measurements of the absorption of 
ultrasound also indicate that such molten globules are 
more fluid than the native state,’ and conformational 
relaxations in the interior that occur in the 2 MHz range 
are significantly enhanced. 

The majority of the circular dichroic absorption 
between 200 and 240 nm seen in the native states is 
retained in the respective molten globules, and this sug-
gests that they contain α helices and β structure. The
slow proton exchange observed by nuclear magnetic 
resonance for buried peptide bonds in the native state 
increases by factors of 1000-100,000 upon the transition 
to one of these molten globules, even though the 
amido protons of many of the same amino acids
remain relatively less accessible.®*"4 These observations 
suggest that some of the same elements of secondary 
structure but not all of them’! remain at the same loca- 
tions in the amino acid sequence of the polypeptide but 
open up 1000-100,000-fold more often. These acceler- 
ated rates of exchange are increased much more by 
adding denaturants'”® so the conformational changes 
within the molten globule leading to exchange of amido
protons do involve some expansion of the structure into
the solvent, but the effect, and hence the expansion, is 
much less than when the native state unfolds to the 
random coil. Three of the eight α helices of the native
state of apomyoglobin are present in its molten glob-
ule, but site-directed mutations at the interfaces in the
native state between these α helices have little effect on
the stability of the molten globule. It was concluded
that although these α helices had formed, they were not
packed against each other in any stable arrangement. 

The accessibility of tryptophans to the solvent, as 
judged by quenching of fluorescence (Equation 12-41), 
differs significantly in these molten globules, and trypto- 
phans that are relatively more exposed in the native state 
become less exposed.””® The fluorescence intensities of 
the tryptophans in cytochrome c, which are quenched by 
the nearby heme in the native state but are fully expressed 
in the random coil, remain quenched in its molten glob- 
ule. The intrinsic viscosity, rotational relaxation 
times, and diffusion coefficients of the molten globules 
are indistinguishable from those of the corresponding 
native states but are different from those of the random 
coil.”°*?!° All of these observations demonstrate that they 
are condensed, globular structures like the native state. 

It is thought that these intermediates identified as 
molten globules represent the random coil that has col- 
lapsed to a globular state because of the hydrophobic 
effect, even though it is fluid and cannot assume the 
unique set of conformations that is the native state. In 
this regard, it is interesting that the majority (85%) of the 
change in standard heat capacity between the random 
coil and the native state of α-lactalbumin, which is a sig-
nature of the hydrophobic effect, is experienced in the
transition between the random coil and the intermediate 
that has been characterized as a molten globule.”” The 
equilibrium constant for the formation from the random 
coil of an intermediate thought to be a molten globular 
state of apomyoglobin displays a significant temperature 
dependence, passing through a maximum between 0 
and 20 °C. This observation is also consistent with a
process accompanied by a large change in standard heat 
capacity,“ but in this case, only about 50% of the over- 
all change in standard heat capacity is realized in the 
transition from random coil to molten globule. 
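
The connection between a maximum in the equilibrium constant and a large change in standard heat capacity follows from the van 't Hoff relation, d ln K/dT = ΔH°/RT². If the change in heat capacity is constant, ΔH° is linear in temperature, and ln K passes through an extremum at the temperature at which ΔH° is zero; the extremum is a maximum when the change in heat capacity is negative. The sketch below illustrates this behavior. The numerical values of the change in heat capacity and of the two reference temperatures are hypothetical, chosen only to place the maximum within the range quoted above, and are not measurements for apomyoglobin.

```python
import math

R = 8.314  # gas constant, J K^-1 mol^-1

def ln_k(temp, d_cp, t_h, t_s):
    """ln K at temperature temp (K) for a constant heat-capacity change d_cp.

    t_h is the temperature at which the enthalpy change is zero and
    t_s is the temperature at which the entropy change is zero.
    """
    d_h = d_cp * (temp - t_h)            # J mol^-1
    d_s = d_cp * math.log(temp / t_s)    # J K^-1 mol^-1
    return -(d_h - temp * d_s) / (R * temp)

D_CP = -3000.0   # J K^-1 mol^-1, hypothetical
T_H = 283.15     # K (10 °C), hypothetical
T_S = 295.0      # K, hypothetical

temps = [263.15 + i for i in range(51)]   # -10 °C to 40 °C in 1 K steps
best = max(temps, key=lambda t: ln_k(t, D_CP, T_H, T_S))
print(f"ln K is greatest at {best - 273.15:.0f} °C")
```

With a negative change in heat capacity, as expected for the burial of hydrophobic surface, the scan locates the maximum at the temperature at which the enthalpy change vanishes.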

If these intermediate, molten globular states resem- 
ble intermediates on the normal kinetic pathway 
between the random coil and the native state, then the 
condensation model for the folding of a polypeptide may 
be an accurate rendition of the process. In this descrip- 
tion of folding, the random coil spontaneously collapses 
under the influence of the hydrophobic force to form a 
condensed state that would be a molten globule. This 
molten globule would fluidly sample the limited number 
of conformations available to the condensed polymer 
until the native state, the set of conformations of lowest 
standard free energy, was encountered. 

The alternative to the condensation model for the
folding of a polypeptide could be referred to as the nucle-
ation model. In this view of the process, a short segment
of the polypeptide or several short segments would spon- 
taneously assume a metastable conformation similar to 
the conformation of that short segment or those short 
segments in the complete native state. This nucleus for 
folding would resemble the conformation of the native 
state in both its secondary and tertiary interactions in 
this restricted region, and it would represent the most 
independently stable region of the native state. From this 
nucleus, folding would rapidly spread to produce the 
entire native structure. Evidence for this proposal comes 
from the study of short segments of polypeptide that can 
assume structured states other than the random coil and 
from stable expanded states of some polypeptides. 

Although almost all short segments of polypeptide 
have proven to be structureless, a few have been found 
that assume a structured state. For example, two pep-
tides from bovine pancreatic trypsin inhibitor ($n_{\mathrm{aa}}$ = 58),
Arginine 20-Phenylalanine 33 and Asparagine 43-Ala-
nine 58, were chemically synthesized and joined by
forming the cystine between Cysteine 30 and Cysteine 51 
that occurs naturally in the native protein. This covalent 
complex, containing only half of the covalent structure of 
the full-length protein, nevertheless formed a struc- 
ture”!” that had some of the structural features assumed 
by this region in the crystallographic molecular model of 
the protein. The antiparallel β sheet could be discerned
in the nuclear magnetic resonance spectrum but not the
α helix. The short, stable α-helical segments of polypep-
tide discussed earlier have also been proposed as models
of nucleation points in protein folding.” 

There are also stable conformations of a few 
polypeptides, observed under circumstances promoting 
denaturation, in which condensation has not occurred 
but elements of structure resembling those in the native 
state have formed. For example, there is an expanded 
form of the polypeptide of cytochrome c observed at low 
ionic strength and low pH in which α helices found in the
native state of the protein are formed and an expanded
form of the α subunit of tryptophan synthase from E. coli
in which a hydrophobic cluster has formed. In the
denatured form of ribonuclease from B. amyloliquefa-
ciens observed at pH 2 and 25 °C, which is unfolded, two
of the α helices found in the native state of the protein are
formed in low yield. All of these results suggest that por-
tions of the polypeptide, when it is still in an almost fully
expanded state, might assume their native structures ini- 
tially to produce a point of nucleation for overall folding. 

What is more likely, however, is that both conden- 
sation and nucleation occur during the folding of a 
polypeptide. The most obvious evidence for such a sce- 
nario is that some, but not all, of the secondary structure 
that is found in the native state of a protein is usually 
present in its molten globule. Such elements of second-
ary structure could nucleate the formation of the remain-
der of the native structure. A fragment of α-lactalbumin
that assumes at equilibrium a structure containing the
α helices that are present in its portion of the native
structure of the protein has been proposed to represent
a point of nucleation for the native state even though it is
by itself a molten globule. These observations suggest
that protein folding involves both condensation and 
nucleation, but not necessarily in that order. The ques- 
tion of the sequence of events in the folding of a polypep- 
tide requires kinetic observations of the process. 

Problem 13-1: As urea is added to a solution containing
a protein in its native state, the protein usually begins to 
unfold when the concentration of urea rises above 
4-5 M. This unfolding is due to the ability of urea to sta- 
bilize the unfolded state. 

Consider the side chain of an amino acid that is 
located in the interior of a protein and cannot see the sol- 
vent when the protein is folded. From the point of view 
of this interior side chain, the following series of equilib- 
ria govern the unfolding process: 


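\[
\begin{aligned}
\text{native protein} &\rightleftharpoons \text{random coil in water} && \Delta G^{\circ}_{\mathrm{transfer,\,interior\rightarrow H_2O}} \\
\text{native protein} &\rightleftharpoons \text{random coil in solution of urea} && \Delta G^{\circ}_{\mathrm{transfer,\,interior\rightarrow[urea]}} \\
\text{random coil in water} &\rightleftharpoons \text{random coil in solution of urea} && \Delta G^{\circ}_{\mathrm{transfer,\,H_2O\rightarrow[urea]}}
\end{aligned}
\]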


The standard free energy changes $\Delta G^{\circ}_{\mathrm{transfer,\,interior\rightarrow H_2O}}$ and
$\Delta G^{\circ}_{\mathrm{transfer,\,interior\rightarrow[urea]}}$ are standard free energy changes
that occur as the amino acid is transferred from the inte-
rior of the protein either into pure water or into a solu-
tion of urea, respectively, as the protein unfolds to a
random coil, and $\Delta G^{\circ}_{\mathrm{transfer,\,H_2O\rightarrow[urea]}}$ is the standard free
energy change involved in transferring the side chain of
an amino acid, which is exposed during unfolding, from
water into a solution of urea.

(A) How are these three values of ΔG° related? What
sign must each carry to explain the unfolding 
caused by urea? 


The following is a table of the solubilities of a series of
amino acids in solutions of several concentrations of 
urea at 25 °C. 


solubilities [g (100 g of solvent)⁻¹]
at noted concentration of urea

amino
acid    0 M      2 M      4 M      6 M      8 M
Gly     25.1     22.7     20.4     17.5     15.00
Ala     16.7     15.3     13.7     12.1     10.60
Leu     2.16     2.37     2.34     2.29     2.25
Phe     2.80     3.42     3.94     4.33     4.67
Trp     1.38     1.98     2.65     3.31     3.95
Met     5.59     6.19     6.74     7.00     6.99
Thr     9.80     9.56     9.07     8.31     7.41
Tyr     0.0451   0.0600   0.0732   0.0870   0.0986
His     4.33     4.66     4.70     4.46     4.23
Gln     4.30     4.49     4.49     4.30     4.02
Asn     2.51     2.89     3.08     3.22     3.32


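(B) Calculate $\Delta G^{\circ}_{\mathrm{transfer,\,H_2O\rightarrow[urea]}}$ for each model com-
pound in the units of joules mole⁻¹. Subtract
$\Delta G^{\circ}_{\mathrm{transfer,\,H_2O\rightarrow[urea]}}$ for glycine to estimate
$\Delta G^{\circ}_{\mathrm{transfer,\,H_2O\rightarrow[urea]}}$ for each side chain.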
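
One way to set up the calculation in part B is sketched below for three of the amino acids. It assumes that the standard free energy of transfer can be approximated as -RT ln(s_urea/s_water), with the solubilities used directly on the scale of grams per 100 g of solvent; corrections for activity coefficients and for the choice of concentration scale are ignored, so the numbers it gives are only approximate. The function names are hypothetical.

```python
import math

R = 8.314    # gas constant, J K^-1 mol^-1
T = 298.15   # 25 °C

UREA = [0, 2, 4, 6, 8]  # molar concentrations of urea
# solubilities in g (100 g of solvent)^-1, copied from the table above
SOLUBILITY = {
    "Gly": [25.1, 22.7, 20.4, 17.5, 15.00],
    "Leu": [2.16, 2.37, 2.34, 2.29, 2.25],
    "Trp": [1.38, 1.98, 2.65, 3.31, 3.95],
}

def dg_transfer(name):
    """Approximate free energy of transfer from water to each urea solution (J mol^-1)."""
    s = SOLUBILITY[name]
    return [-R * T * math.log(x / s[0]) for x in s]

def dg_side_chain(name):
    """Free energy of transfer of the side chain: amino acid minus glycine."""
    return [a - g for a, g in zip(dg_transfer(name), dg_transfer("Gly"))]

for name in ("Leu", "Trp"):
    values = ", ".join(f"{v:.0f}" for v in dg_side_chain(name))
    print(f"{name} side chain at {UREA} M urea: {values} J mol^-1")
```

Subtracting the values for glycine removes the contribution of the backbone, which is common to all of the amino acids, and leaves an estimate for the side chain alone.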


These values you have just calculated are tabulated below. 


$\Delta G^{\circ}_{\mathrm{transfer,\,H_2O\rightarrow denaturant}}$ (cal mol⁻¹)

                      urea                            GdmCl
side chain   2 M    4 M    6 M    8 M      1 M    2 M    4 M    6 M
Ala          0      +15    +10    +10      10     20     30     45
Valᵃ         60     85     125    160      85     115    195    265
Leu          110    155    225    -295     150    210    355    480
Ileᵃ         100    140    205    -265     135    190    320    430
Met          115    -225   325    -415     150    245    400    535
Phe          180    -330   470    -600     215    355    580    775
Tyr          225    -395   580    -735     235    385    605    770
Trp          270    -505   730    -920     400    630    980    -1,235
Proᵃ         75     105    155    -200     100    140    240    320
Thr          40     60     90     115      65     90     120    125
His          100    160    205    -255     180    285    385    420
Asn          135    -225   330    -430     200    -320   490    645
Gln          80     130    190    -230     135    215    315    360

ᵃ The values for these side chains are estimates based on results for
the other side chains and on results at a single concentration of
denaturant.


(C) Plot $\Delta G^{\circ}_{\mathrm{transfer,\,H_2O\rightarrow[urea]}}$ against [urea] for each of
these side chains. Determine the slopes of these
lines that give values for ($\partial \Delta G^{\circ}/\partial[\mathrm{urea}]$) in joules
(mole of side chain)⁻¹ [liter (mole of urea)⁻¹].


(D) How do these numbers correlate with your expec-
tations in part A? Explain why the protein unfolds
when [urea] rises above a certain critical level. 


(E) Plot ($\partial \Delta G^{\circ}_{\mathrm{transfer,\,H_2O\rightarrow[urea]}}/\partial[\mathrm{urea}]$) against the
number of hydrogen-carbon bonds in each side
chain. Is the major effect of urea to counteract the 
hydrophobic effect? Why? 


The accessible surface area of each of these side 
chains has been calculated by a computer from molecu- 
lar models. 


side chain    surface area of model (nm²)
Ala           0.21
Val           0.48
Thr           0.51
Leu           0.67
Met           0.90
Phe           0.93
Tyr           1.10
Trp           1.34
Asn           0.60
Gln           0.89
His           0.83


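(F) Plot ($\partial \Delta G^{\circ}_{\mathrm{transfer,\,H_2O\rightarrow[urea]}}/\partial[\mathrm{urea}]$) against accessi-
ble surface area, labeling each point on your curve
to keep track of the side chain it represents. What
is the value of $\partial(\partial \Delta G^{\circ}_{\mathrm{transfer,\,H_2O\rightarrow[urea]}}/\partial[\mathrm{urea}])/\partial(\text{surface area})$?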


Measurements of a physical property displayed by a pro-
tein can be used to obtain a value of the equilibrium con-
stant for the transformation between the native state and
the random coil at different concentrations of a denatu-
rant. From each of these equilibrium constants, the stan-
dard free energy of folding, $\Delta G^{\circ}_{\mathrm{fd,[denaturant]}}$, for the
reaction at that concentration of denaturant can be cal-
culated. The figure below is an example of the
relationship between $\Delta G^{\circ}_{\mathrm{fd,[denaturant]}}$ and the concentra-
tion of denaturant for the unfolding of lysozyme pro-
moted by urea and guanidinium chloride.

The slopes of these two lines ($\partial \Delta G^{\circ}/\partial[\mathrm{denaturant}]$)
are relative measures of the effectiveness of the two
denaturants. By fitting a straight line to these data it is
possible to obtain, by extrapolation, the standard free
energy of folding in the absence of denaturant, $\Delta G^{\circ}_{\mathrm{fd,H_2O}}$.
The following table gathers the results from four separate
proteins, where [GdmCl]$_{1/2}$ or [urea]$_{1/2}$ is the concentra-
tion of denaturant when [F] = [U] and $\Delta G^{\circ}_{\mathrm{fd}}$ = 0.


                         guanidinium chloride                            urea
protein                  [GdmCl]$_{1/2}$ (M)   $\Delta G^{\circ}_{\mathrm{fd,H_2O}}$ (kJ mol⁻¹)   [urea]$_{1/2}$ (M)   $\Delta G^{\circ}_{\mathrm{fd,H_2O}}$ (kJ mol⁻¹)
bovine ribonuclease A    3.01                  -39                     6.96                 -32
lysozyme G. gallus       3.07                  -24                     5.21                 -24
bovine chymotrypsin      1.90                  -32                     4.04                 -35
ovine β-lactoglobulin    3.23                  -52                     5.01                 -44


[Figure: apparent standard free energy of folding plotted against [Denaturant] (M)]

Apparent standard free energy of folding, $\Delta G^{\circ}_{\mathrm{fd,app}}$, of lysozyme as a
function of the molar concentrations of urea (●) or guanidinium
chloride (○) at pH 2.9. The apparent standard free energy of fold-
ing is zero at the concentration of denaturant at which [U] = [F].
Adapted with permission from ref 107. Copyright 1974 Journal of
Biological Chemistry.


(G) From an examination of the figure and an under-
standing of where the two points tabulated fall
upon each line, calculate (Equation 13-23) the
slope m of the line (m = $\partial \Delta G^{\circ}/\partial[\mathrm{denaturant}]$) for
each combination of protein and denaturant.
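
The calculation in part G can be set up as in the following sketch for two of the proteins. It assumes that the standard free energy of folding is a linear function of the concentration of denaturant, ΔG° = ΔG°(H₂O) + m[denaturant], so that the line passes through the extrapolated value in water and through zero at the midpoint; Equation 13-23 is not reproduced here, so this form and the function name are assumptions.

```python
# midpoint concentration (M) and extrapolated free energy of folding in
# water (kJ mol^-1), copied from the table above
DATA = {
    "bovine ribonuclease A": {"GdmCl": (3.01, -39.0), "urea": (6.96, -32.0)},
    "lysozyme G. gallus": {"GdmCl": (3.07, -24.0), "urea": (5.21, -24.0)},
}

def slope_m(midpoint, dg_water):
    """Slope of the assumed straight line dG = dg_water + m * [denaturant].

    The line crosses zero at the midpoint, so m = -dg_water / midpoint.
    """
    return -dg_water / midpoint

slopes = {
    protein: {den: slope_m(*values) for den, values in row.items()}
    for protein, row in DATA.items()
}

for protein, row in slopes.items():
    for den, m in row.items():
        print(f"{protein}, {den}: m = {m:.2f} kJ mol^-1 M^-1")
```

Because the free energy of folding in water is negative and rises to zero at the midpoint, each slope obtained in this way is positive.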


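(H) Calculate the quantity

\[
R_{\mathrm{prot}} = \frac{\left(\partial \Delta G^{\circ}_{\mathrm{fd}} / \partial[\mathrm{guanidinium}]\right)_T}{\left(\partial \Delta G^{\circ}_{\mathrm{fd}} / \partial[\mathrm{urea}]\right)_T}
\]

for each protein. This number will serve as a quantitative
estimate of the relative effectiveness of the two denatu-
rants.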


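(I) In part F you calculated a quantity
$\partial(\partial \Delta G^{\circ}_{\mathrm{transfer,\,H_2O\rightarrow[urea]}}/\partial[\mathrm{urea}])/\partial(\text{surface area})$.
Using the same methods, calculate a value for
$\partial(\partial \Delta G^{\circ}_{\mathrm{transfer,\,H_2O\rightarrow[GdmCl]}}/\partial[\mathrm{GdmCl}])/\partial(\text{surface area})$
from the data in the table preceding part C.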


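(J) Calculate the quantity

\[
R_{\mathrm{transfer}} = \frac{\left(\partial \Delta G^{\circ}_{\mathrm{transfer,\,H_2O\rightarrow[GdmCl]}} / \partial[\mathrm{guanidinium}]\right)_T}{\left(\partial \Delta G^{\circ}_{\mathrm{transfer,\,H_2O\rightarrow[urea]}} / \partial[\mathrm{urea}]\right)_T}
\]

and compare it to $R_{\mathrm{prot}}$.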


Problem 13-2: Bovine ribonuclease A is a protein con-
taining 124 amino acids and four cystines. Ribonuclease
was added to a series of solutions containing different
concentrations of guanidinium chloride, and the change
in its extinction coefficient ($\Delta\varepsilon_{287}$) was measured at
287 nm when each of the solutions had come to equilib-
rium.
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[GdmC]] A&g7 [GdmCl] Aën 
(M) wl (M) wl 
0.00 0 3.01 -68
0.02 4 3.20 -96 
0.34 4 3.30 -102 
0.99 4 3.54 -119 
1.61 Fé 4.03 -127 
2.20 8 4.37 -126 
2.67 -18 4.84 -121 
2.81 -44 5.57 -118
2.95 -63 6.04 -114

6.82 -110 


The change in absorbance at 287 nm is a spectral indica- 
tor that reflects changes in the environments around the 
tryptophans in a protein. Make a plot of these data.


(A) What is Δε₂₈₇ of native ribonuclease at 4 M
guanidinium chloride?


(B) What is Δε₂₈₇ of unfolded ribonuclease at 1 M
guanidinium chloride?


For ribonuclease it has been proven that only the 
native state and the random coil are present at any con- 
centration of guanidinium chloride. 


(C) Calculate K_fd for the equilibrium of Equation 13-1
for each concentration of guanidinium chloride 
and enter your values into a table. 


(D) Plot ln K_fd against [GdmCl] and determine, by
extrapolation, the standard free energy of folding
for ribonuclease in water, ΔG°_fd,H₂O.
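
The two-state analysis that parts A through D call for can be sketched as follows. This is a minimal illustration, assuming that the equilibrium constant is written in the direction of folding (K = [F]/[U]) and that the signals of the pure native and pure unfolded states at a given concentration of denaturant have already been read from the extrapolated baselines. The function names, the temperature, and every numerical value are hypothetical; none are taken from the table of Problem 13-2.

```python
import math

GAS_CONSTANT = 8.314e-3  # kJ mol^-1 K^-1


def folding_constant(signal, native_signal, unfolded_signal):
    """K = [F]/[U] from one spectral reading in a two-state equilibrium.

    native_signal and unfolded_signal are the baseline values of the two
    pure states at the same concentration of denaturant as `signal`.
    """
    fraction_folded = (signal - unfolded_signal) / (native_signal - unfolded_signal)
    return fraction_folded / (1.0 - fraction_folded)


def standard_free_energy(equilibrium_constant, temperature=298.15):
    """Standard free energy change (kJ/mol) = -RT ln K."""
    return -GAS_CONSTANT * temperature * math.log(equilibrium_constant)


# Hypothetical readings: baselines of +5 (native) and -120 (unfolded).
k_mid = folding_constant(signal=-57.5, native_signal=5.0, unfolded_signal=-120.0)
k_low = folding_constant(signal=-20.0, native_signal=5.0, unfolded_signal=-120.0)

print(f"K at the midpoint      = {k_mid:.2f}")
print(f"K at lower denaturant  = {k_low:.2f}")
print(f"standard free energy   = {standard_free_energy(k_low):.2f} kJ/mol")
```

Only readings within the region of transition give reliable values of K, because outside it the numerator or denominator of the fraction approaches zero.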


Problem 13-3: In the region of the nuclear magnetic resonance
spectrum of bovine ribonuclease A between 8.0 and 9.0 ppm,
the only absorptions present are those from the protons on
the carbons 2 of the imidazole rings of the histidines. The
traces are from this region of the nuclear magnetic resonance
spectrum of ribonuclease. Changes in the spectrum occur when
guanidinium chloride is added to the sample at the noted
concentration.


[Figure: traces of the spectrum at a series of concentrations of
guanidinium chloride, [GdmCl]; the labels that survive are 1.3 M,
0.5 M, and 0.3 M.]

In the native protein (0.0 M guanidinium chloride), four
absorptions from the protons on carbons 2 are observed.
They have been assigned to Histidines 48, 119, 12, and
105, the four histidines of ribonuclease.


(A) Why does each absorption have a unique position 
in the spectrum of native ribonuclease? 


(B) Why is there only one absorption from the pro- 
tons on the carbons 2, which integrates as four 
protons from the protein, when it is dissolved in 
3.0 M guanidinium chloride? 


(C) Between 0.0 and 1.7 M guanidinium chloride the 
resonances shift around, but above 1.7 M the four 
absorptions coalesce into the one absorption. 
What process is the spectrometer monitoring 
between 1.7 and 3.0 M guanidinium chloride? 


(D) What would be the position of the absorption from
the protons on the carbons 2 of Nα-acetylhistidine
ethyl ester in 3.0 M guanidinium chloride?


Problem 13-4: Below are listed several thermodynamic 
parameters that are involved in the process of protein 
folding. 


(a) Change in standard free energy for the hydropho- 
bic effect 

(b) Standard free energy of formation for hydrogen 
bonds 

(c) Change in standard electrostatic free energy 

(d) Configurational entropy of the random coil 

(e) Configurational entropy of the native state 


(A) Which one is most affected by the steric con- 
straints described in the Ramachandran plot? 


Suppose proteins were held together by imine linkages 
rather than peptide bonds: 


[Structural drawing: adjacent amino acids joined by an imine
linkage (C=N) in place of the peptide bond.]


(B) What effect would this have on the parameter you 
have chosen above? 


(C) How would the value of the standard free energy
of folding, ΔG°_fd, be affected by this change?


Kinetics of Folding 


The most straightforward way to initiate the folding of a
polypeptide that has been unfolded to a random coil in a
concentrated solution of guanidinium chloride or urea is to
dilute that solution. The dilution is performed so that the
final concentration of denaturant is well below the region
of transition, which shifts the equilibrium between the
folded state and the unfolded state from one heavily in
favor of the random coil to one heavily in favor of the
folded state (Figure 13-1).*

Often the folding of the polypeptide is complete 
within a few seconds so the dilution must be performed 
rapidly. Usually, the solution of the random coil at a high 
concentration of denaturant is mixed with around 10 vol- 
umes of aqueous buffer of the appropriate ionic strength 
and pH in a rapid mixing chamber. The chamber is 
designed to mix the two solutions completely in less than 
a millisecond as they are forced through it at high veloc- 
ity under considerable pressure. There are several differ- 
ent ways in which the solution emerging from the mixing 
chamber can then be monitored. Usually, a cuvette† is
attached to the mixing chamber, and the mixture from 
the chamber is passed through the cuvette at the high 
velocity developed in the mixing chamber until it fills 
the cuvette uniformly. Once a steady state is reached, the 
flow is abruptly stopped, and changes that occur in 
the solution in the cuvette after cessation of the flow are 
monitored. The mean time the solution in the cuvette 
has spent between being mixed and the cessation of flow 
coincident with the initiation of the monitoring is the 
dead time of the apparatus. No measurements can be 
made of events that occur during the dead time. In most 
cases, the dead time of such a stopped-flow apparatus is 
1-50 ms. Changes in the absorbance, molar ellipticity, or 
fluorescence of the solution can be monitored continu- 
ously from the dead time onward. 

Often, during the folding of a protein, significant 
changes in absorbance, fluorescence, or molar ellipticity 
or two or three of these properties occur within the dead 
time of the apparatus. Type I ribonuclease H from E. coli 
displays such behavior (Figure 13-9). When its 
molar ellipticity at either 220 nm (Figure 13-9A) or
292 nm (Figure 13-9B) is monitored, the signal observed
after flow has stopped decays to the value for the native
state in a single, apparently first-order relaxation
(Equation 13-13) with a rate constant of 0.6 s⁻¹. When
these time courses are extrapolated through the dead 
time back to the instant of mixing, however, it can be 
seen that 83% of the change in molar ellipticity at 220 nm 
and 44% of the change in molar ellipticity at 292 nm did 
not occur during this apparently homogeneous transfor- 
mation but in one or more kinetic steps that were much 
more rapid than the final isomerization. These steps 
occurred within the dead time of the apparatus, and, 
consequently, they could not be resolved. 

* It is also possible to begin the experiment with a solution of the
complex between the polypeptide and dodecyl sulfate and then
rapidly strip the dodecyl sulfate from the protein. The difficulty
with this approach is that the polypeptide in the complex with
dodecyl sulfate is completely α-helical rather than a random coil.

† A cuvette is a chamber with transparent walls through which
spectrophotometric measurements can be made.


Figure 13-9: Kinetic burst during the refolding of type I
ribonuclease H from E. coli. Type I ribonuclease H (n_aa = 155) was
dissolved in 3.3 M guanidinium chloride and 10 mM sodium acetate,
pH 5.5 at 25 °C. After it had unfolded completely, it was mixed in a
stopped-flow apparatus with 10 volumes of 0.65 M guanidinium chloride
and 10 mM sodium acetate, pH 5.5 (final concentration = 0.9 M
guanidinium chloride), and the effluent from the mixing chamber was
monitored following cessation of flow (dead time = 50 ms). (A) Molar
ellipticity at a wavelength of 220 nm ([θ]₂₂₀) in units of degrees
centimeter² (decimole of peptide bonds)⁻¹ as a function of time
(seconds). The molar ellipticity of unfolded ribonuclease H at 220 nm
in 0.9 M guanidinium chloride, as determined by extrapolation at
equilibrium as in Figures 13-1A and 13-7A, should be 0. (B) Molar
ellipticity at a wavelength of 292 nm ([θ]₂₉₂) in units of degrees
centimeter² (decimole of peptide bonds)⁻¹ as a function of time
(seconds). The molar ellipticity of unfolded ribonuclease H at 292 nm
in 0.9 M guanidinium chloride, again as determined by extrapolation at
equilibrium, should be 20 (dashed line). (C) Fluorescence of
tryptophans in the protein monitored at emission wavelengths of
greater than 300 nm with excitation at 280 nm. The scale for
fluorescence was not quantified, so the relative fluorescence of the
unfolded protein in 0.9 M guanidinium chloride was not reported. The
three sets of data were fit with first-order relaxations (smooth
curves) with rate constants of (A) 0.59 s⁻¹, (B) 0.74 s⁻¹, and
(C) 0.51 s⁻¹ and 1.95 s⁻¹. Reprinted with permission from ref 225.
Copyright 1995 American Chemical Society.


A reaction that is unresolved in a stopped-flow experiment
because it is complete during the dead time is a kinetic
burst. The observation of a kinetic burst is interpreted to
mean that one or more transformations of the random coil
have occurred during the dead time and that they have
produced an intermediate state. This intermediate state
then turns into the native state as the reaction is
monitored over time. In the case of type I ribonuclease H,
this intermediate state becomes the native state in a
reaction that appears by measurements of circular dichroism
to display simple first-order kinetics with a rate constant
of about 0.6 s⁻¹. This latter transformation is also
revealed in the change in fluorescence of the solution
(Figure 13-9C).
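
The reasoning behind the identification of a kinetic burst can be checked numerically. The sketch below, in Python, uses the conditions quoted in the legend of Figure 13-9 (one volume of 3.3 M guanidinium chloride mixed with 10 volumes of 0.65 M, a dead time of 50 ms, and a rate constant of about 0.6 s⁻¹) to show that the observed slow relaxation can account for only a few percent of change during the dead time. The function names are hypothetical, and the three values of molar ellipticity passed to `burst_fraction` are invented for illustration; they are not read from the figure.

```python
import math


def final_concentration(c_sample, c_diluent, volumes_of_diluent):
    """Concentration after mixing 1 volume of sample with n volumes of diluent."""
    return (c_sample + volumes_of_diluent * c_diluent) / (1.0 + volumes_of_diluent)


def fraction_complete(rate_constant, time):
    """Fraction of a first-order relaxation that has occurred after `time`."""
    return 1.0 - math.exp(-rate_constant * time)


def burst_fraction(signal_unfolded, signal_at_zero_time, signal_native):
    """Fraction of the total change already present at the extrapolated zero time."""
    return (signal_at_zero_time - signal_unfolded) / (signal_native - signal_unfolded)


# Conditions from the legend of Figure 13-9.
gdmcl = final_concentration(3.3, 0.65, 10)
missed = fraction_complete(rate_constant=0.6, time=0.050)

# Hypothetical ellipticities (arbitrary units), chosen only for illustration.
burst = burst_fraction(signal_unfolded=0.0,
                       signal_at_zero_time=-8300.0,
                       signal_native=-10000.0)

print(f"final [GdmCl]                         = {gdmcl:.2f} M")
print(f"slow phase completed in the dead time = {100 * missed:.1f}%")
print(f"burst fraction (hypothetical data)    = {100 * burst:.0f}%")
```

Because a relaxation with a rate constant of 0.6 s⁻¹ is only about 3% complete after 50 ms, a much larger missing amplitude must arise from one or more faster steps.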

The kinetics of the folding of apomyoglobin from
Physeter catodon, of micrococcal nuclease from
Staphylococcus aureus, of equine cytochrome c,
of dihydrofolate reductase from E. coli, of equine
β-lactoglobulin, and of equine lysozyme all display
similar kinetic bursts producing intermediate
states that then apparently decay through one or several
first-order steps to their native states. The appearance of
these intermediates during a kinetic burst and their
decay during the period of measurement can be detected
by their absorbance, by their molar ellipticities in the
range of 220-230 nm (the far ultraviolet), by their molar
ellipticities in the range of 270-290 nm (the near
ultraviolet), by their fluorescence, or by the transfer of
energy between donors and acceptors placed at particular
positions in their amino acid sequences. They are observed
upon rapid dilution to 0.4-0.8 M urea or to 0.3-0.9 M
guanidinium chloride. They are formed usually within
less than 10 ms at temperatures between 10 and 25 °C,
and they then decay at various rates.

A certain fraction of the changes in molar ellipticity
or fluorescence that occurs in each of these kinetic bursts
results from the instantaneous changes in the molar
ellipticity or fluorescence that occur in the random coil
upon the abrupt decrease in the concentration of denaturant.
It may or may not be possible to correct for these
instantaneous changes by linear extrapolation of the values
at equilibrium for these physical properties from beyond the
region of transition, as was done in Figure 13-1A. The
magnitude of the changes observed in a kinetic burst,
however, is sufficiently large in most cases that they must
reflect significant conformational changes in the
unfolded state that produce real intermediates in the
process of folding.

Within the range of denaturant concentration in
the region of transition where the respective equilibrium
constants for folding have been shifted into measurable
ranges, most of these proteins display two-state behavior
in the isomerization of their folding without evidence
for intermediates. Why do intermediates in folding
appear at lower concentrations of denaturant? The
answer lies in the behavior of the observed rate constant*
of folding, k_f, as a function of the concentration of
denaturant.


* The progress of the folding of a protein monitored
spectrophotometrically can usually be fit by a rate equation for
one or more sequential first-order steps. Even though this fit is
probably an oversimplification of the actual events, the observed
apparently uncomplicated first-order rate constants obtained by such
numerical analysis will be referred to as observed rate constants.

[Figure: k_obs (second⁻¹, logarithmic scale) plotted against [GdmCl] (M).]

Figure 13-10: Observed rate constants for the approach to
equilibrium for the folding isomerization of the cysteineless mutant
of type I ribonuclease H from E. coli. A solution of 110 μM protein,
50 mM KCl, 20 mM sodium acetate, pH 5.5, and either no denaturant
(unfolding) or 3 M guanidinium chloride (folding) was mixed
in a stopped-flow apparatus with 11 volumes of 50 mM KCl, 20 mM
sodium acetate, pH 5.5, and the appropriate concentration of
guanidinium chloride to achieve the noted concentration following
the mix. The fluorescence emission of the solution at wavelengths
greater than 320 nm from an excitation at 295 nm was
monitored as a function of time as in Figure 13-9C. The traces of
fluorescence as a function of time could each be fit (Equation
13-13) by single-exponential increases (unfolding) or decreases
(folding) of fluorescence. The observed rate constants k_obs
(second⁻¹) for these exponential relaxations are plotted as a
function of the concentration (molar) of guanidinium chloride
(GdmCl). The dashed line is a fit of the data to an equation derived
from a mechanism in which there are three states interconverting:
the random coil, the native state, and a kinetic intermediate. It is
assumed that the logarithm of each of the four first-order rate
constants interconverting those three states is a linear function of
the concentration of guanidinium ion as in Figure 13-1B. Reprinted
with permission from ref 236. Copyright 1999 Elsevier B.V.


When the logarithm of the observed rate constant
for the approach to equilibrium for the folding of type I
ribonuclease H from E. coli is plotted as a function of the
concentration of guanidinium chloride (Figure 13-10),
two-state behavior is observed in the region
of transition, where the observed rate constant for
unfolding dominates at high concentrations of denaturant
and the observed rate constant for folding dominates
at low concentrations. The logarithm of the
observed rate constant for folding, however, does not
vary linearly with the concentration of denaturant
throughout the range below the region of transition, as it
does in Figure 13-1B. Instead, its behavior is resolved
further into two components, one dominant at intermediate
concentrations of denaturant and the other at low
concentrations. These two steps are distinguished by the two
different slopes of the two distinct linear segments below
the region of transition in Figure 13-10.* This behavior is
characteristic of a change in rate-limiting step† and
consequently is consistent with a kinetic mechanism for
folding in which there are two or more steps and one or more
intermediates. At concentrations of guanidinium chloride
between 1.0 and 1.5 M, the folding of type I ribonuclease H
(Figure 13-10) has an observed rate constant that is
strongly dependent on the concentration of guanidinium
chloride. At concentrations of guanidinium chloride between
0.2 and 0.6 M, however, the folding of the protein has an
observed rate constant that is only weakly dependent on
the concentration of guanidinium chloride. Between 0.6 and
1.0 M, the rate-limiting step changes from a step strongly
dependent on the concentration of denaturant to a later
step in the process of folding that is weakly dependent.

The two observed rate constants for the two respec- 
tive steps between which the rate limitation shifts are the 
rate constant for the production of the intermediate 
present following the kinetic burst and the rate constant 
for its decay to the native state, respectively. The 
observed rate constant for the formation of this interme- 
diate, which is defined by the linear segment of greater 
slope, is strongly dependent on the concentration of 
denaturant. Because of this strong dependence, above a 
certain concentration of denaturant the rate at which the 
intermediate is formed becomes slower than the rate at 
which it isomerizes to the native state. Consequently, 
above that concentration of denaturant, the intermedi- 
ate, although it is still formed every time a polypeptide 
folds, cannot accumulate because it turns into the native 
state faster than it is formed, and the reaction, which is at 
least a three-state process at low concentrations of 
denaturant or in its absence, appears to become a two- 
state process above 1 M guanidinium chloride. 
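
The change in rate-limiting step described above can be illustrated with a deliberately simplified calculation in Python. The sketch assumes two sequential, effectively irreversible steps (random coil to intermediate, then intermediate to native state), each with a rate constant whose logarithm is a linear function of the concentration of denaturant, and it takes the slower of the two as the rate-limiting step. This is not the four-rate-constant model used to fit Figure 13-10, and all of the parameters (rate constants in the absence of denaturant, slopes) are hypothetical values chosen only so that the crossover falls between 0.6 and 1.0 M.

```python
import math


def rate_constant(k_water, m, denaturant):
    """Rate constant with ln k = ln k_water + m * [denaturant]."""
    return k_water * math.exp(m * denaturant)


def crossover(k1_water, m1, k2_water, m2):
    """Concentration of denaturant at which the two rate constants are equal."""
    return math.log(k1_water / k2_water) / (m2 - m1)


def rate_limiting_step(denaturant, formation, decay):
    """Return the name and rate constant of the slower of the two steps."""
    k_formation = rate_constant(*formation, denaturant)
    k_decay = rate_constant(*decay, denaturant)
    if k_formation < k_decay:
        return "formation of intermediate", k_formation
    return "decay of intermediate", k_decay


# Hypothetical parameters: (k in water, s^-1; slope m, M^-1).
FORMATION = (100.0, -6.0)  # steep dependence on denaturant
DECAY = (2.0, -1.0)        # shallow dependence on denaturant

print(f"crossover at {crossover(*FORMATION, *DECAY):.2f} M")
for conc in (0.2, 0.6, 1.0, 1.4):
    step, k = rate_limiting_step(conc, FORMATION, DECAY)
    print(f"{conc:.1f} M: {step} limits, k = {k:.3f} s^-1")
```

Below the crossover the intermediate forms faster than it decays and therefore accumulates; above it the intermediate is consumed as fast as it is formed, and the reaction appears to be two-state.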

* In the simplest cases, where only one continuously varying
apparent rate constant seems to control the folding at
concentrations of denaturant below those of the region of transition,
as in Figure 13-1B, it is observed that the logarithm of that apparent
rate constant is a linear function of the concentration of denaturant,
much as is the standard free energy of folding. It is customary to
designate the slope of such a line with the symbol m, just as the
slope is designated in Equation 13-23.

† The rate-limiting step in the mechanism of a reaction is the last
step in the sequence that exerts any influence on the overall rate.
By this definition, all of the steps that follow the rate-limiting step
must be so fast that they occur immediately relative to the passage
through the rate-limiting step.


Such behavior indicative of a change in the rate-limiting
step of folding is also observed for the folding of
type I ribonuclease H from E. coli in solutions of urea
as well as lysozyme from bacteriophage T4 in solutions of
guanidinium chloride, the carboxy-terminal domain
of the cell surface receptor CD2, the inhibitor barstar
of the ribonuclease of B. amyloliquefaciens, the
amino-terminal domain of phosphoglycerate kinase from
Bacillus stearothermophilus, cytochrome c₂ from
Rhodobacter capsulatus, and human lysozyme.

In addition to explaining the appearance of these
kinetic intermediates as the concentrations of denaturant
are decreased and providing further evidence for the
existence of one or more intermediates in the folding of
each of these polypeptides in the absence of denaturant,
these observations of a change in the rate-limiting step
provide a clue about the structures of the intermediates
formed during a kinetic burst. Because the observed rate
constants for their formation decrease significantly as
the concentration of denaturant is increased while the
observed rate constants for their conversion to the
respective native states decrease much less significantly,
each of these intermediates must be a more compact,
condensed state of the polypeptide than the random
coil. This follows from the proposal that the slope m of
the line relating the logarithm of an observed rate
constant for the folding of a polypeptide to the concentration
of guanidinium chloride or urea (for example, the slopes
of the linear segments in Figures 13-1B and 13-10) is a
measure of one of three quantities: the change in exposure
of that polypeptide to the solvent between its initial state
and the transition state of the rate-limiting step in the
transformation being monitored; the change in exposure
between its initial state and the transition states of one
or more of the rate-determining steps* that together
establish the value of the composite rate constant for that
transformation; or the change in exposure of that
polypeptide experienced during an unfavorable
preequilibrium that precedes the rate-limiting step for
that transformation. Consequently, either during the
rate-limiting step or prior to the rate-limiting step of the
transformation occurring during the kinetic burst in which
the intermediate is formed from the random coil, a
significant decrease in the exposure of the polypeptide to
the solvent must occur.

* A rate-determining step in a reaction is any step the rate of which
affects the rate of the overall reaction. In other words, if a step is
rate-determining, an increase or decrease in its individual rate will
cause a change in the overall rate of the reaction, but not
necessarily of the same magnitude.


Further evidence that these intermediates formed
during a kinetic burst are compact, condensed forms of
the polypeptide is provided by studies of their scattering
of X-radiation at small angles. It is possible to measure
small-angle scattering of X-radiation from a sample in
the cuvette of a stopped-flow apparatus. When the
intermediate formed from the polypeptide of apomyoglobin
during the kinetic burst was examined in this way, it
was found that the angular dependence of its scattering
(Figure 12-2) was indistinguishable from that of the
native state of the protein but clearly different from that
of its unfolded random coil. This result indicates that
most if not all of the condensation required to occur
between the random coil and the native state must be
accomplished within the kinetic burst. Similar results
were observed for the folding of bovine β-lactoglobulin.
The rotational relaxation time of
1-anilinonaphthalene-8-sulfonate tightly bound to the
intermediate formed in the kinetic burst during the folding
of dihydrofolate reductase from E. coli is almost identical
to that of the same probe bound to the native state, a
result also suggesting that most if not all of the
condensation of the polypeptide has already occurred in
this isomerization. Increases in energy transfer by
resonance between donors and acceptors covalently attached
to particular positions in a polypeptide also indicate that
it condenses during one of these isomerizations occurring
in a kinetic burst.

In addition to being compact, the intermediates
formed during a kinetic burst contain secondary
structure. This conclusion follows from the fact that large
changes in molar ellipticity in the far ultraviolet, similar
to those accompanying the formation of β structure and
α helices (Figure 12-10), usually accompany the burst
(Figure 13-9).226,227,231,233,248,249 A random coil has a
slight positive molar ellipticity in the range from 210 to
230 nm while both β structure and α helices have significant,
negative molar ellipticities (Figure 12-10), so the changes
observed are decreases in molar ellipticity in this range.
In fact, the molar ellipticity of the polypeptide of bovine
β-lactoglobulin at 222 nm actually decreases to a level
2-fold lower than that of the native state during the
kinetic burst before increasing to the proper value during
the formation of the native state. This result suggests
that extra α-helical secondary structure is transiently
forming in the intermediate and then disappearing as the
native state forms.

The positions in the amino acid sequence that
participate in the secondary structure formed in these
intermediates can be defined by measurements of the
exchange of specific amido protons from the peptide
backbone. To perform such measurements, the unfolded
polypeptide as a random coil in a concentrated solution
of denaturant is passed in turn through a series of mixing
chambers (Figure 13-11). In a typical experiment, the
unfolded polypeptide in ¹H₂O is rapidly diluted into
aqueous buffer prepared in ²H₂O at a pH low enough to
suppress proton exchange for the time being (Figure
12-31), and folding commences. After various millisecond
intervals, during which the folding of the protein has
progressed normally, the pH of the solution is increased,
usually to a level greater than 9, in a second rapid mixing
chamber to initiate the rapid and complete exchange of
all amido protons still exposed to the deuterated solvent
(Figure 12-31). The pH and duration of this period of
rapid exchange are set so that it is long enough for
exposed amido protons to exchange completely but not
long enough for buried amido protons to exchange
significantly. Finally, in a third rapid mixing chamber the pH
is dropped again to slow the exchange and permit folding
to be completed in the absence of further exchange.
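
The logic of choosing the pH and the duration of the pulse can be made concrete with a short calculation in Python. The sketch assumes that exchange of an amido proton is first-order, that it is catalyzed by hydroxide so that its rate constant rises 10-fold for each unit of pH, and that burial in structure simply divides that rate constant by a protection factor. The reference rate constant of 1 s⁻¹ at pH 7 and the protection factor of 1000 are hypothetical round numbers used only for illustration; the pH values and the 20 ms pulse are those quoted in the legend of Figure 13-11.

```python
import math


def intrinsic_rate(k_reference, ph_reference, ph):
    """Hydroxide-catalyzed exchange: 10-fold faster per unit increase in pH."""
    return k_reference * 10.0 ** (ph - ph_reference)


def fraction_exchanged(k_intrinsic, duration, protection_factor=1.0):
    """Fraction of an amido proton exchanged during `duration` (seconds)."""
    return 1.0 - math.exp(-k_intrinsic * duration / protection_factor)


K_REF = 1.0    # s^-1 at pH 7 (hypothetical value)
PH_REF = 7.0

k_folding = intrinsic_rate(K_REF, PH_REF, 6.1)   # pH during folding
k_pulse = intrinsic_rate(K_REF, PH_REF, 10.2)    # pH during the pulse

exposed = fraction_exchanged(k_pulse, 0.020)
buried = fraction_exchanged(k_pulse, 0.020, protection_factor=1000.0)
before_pulse = fraction_exchanged(k_folding, 0.100)

print(f"exposed proton, 20 ms pulse at pH 10.2 : {100 * exposed:.1f}% exchanged")
print(f"protected proton (factor 1000)          : {100 * buried:.1f}% exchanged")
print(f"exposed proton, 100 ms at pH 6.1        : {100 * before_pulse:.1f}% exchanged")
```

With these assumed values the pulse labels exposed positions essentially completely while leaving protected positions nearly untouched, which is the selectivity that the experiment requires.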

In the final folded protein, most of the amido protons
are well protected from further exchange by its secondary
and tertiary structure. Consequently it can be
submitted to two-dimensional nuclear magnetic resonance
spectroscopy (Figure 12-33) for the periods of
time necessary to obtain two-dimensional spectra and
determine which amido protons had become protected
from exchange during the time spent folding before
rapid exchange was initiated. Those positions in the
amino acid sequence of the protein that have lost their
protons during the experiment are those that were
accessible when the jump in pH occurred; those that have not
are those that had become protected. The times spent in
the various steps and the levels of pH established in each
step of these triple mixing experiments vary, but
the intentions of initiating folding, of performing rapid
exchange after folding has progressed for a certain
period, and of then locking the information in the native
state of the protein remain the same.*

Although most of the amido protons in an intermediate
formed during a kinetic burst exchange rapidly
during the respective jumps in pH, many are already
protected from exchange by the structures of these
intermediates. Because amido protons protected
from exchange during the kinetic burst are found in
segments within which each of a string of consecutive
positions in the amino acid sequence is protected, it is
assumed that these segments of continuous protection
represent either α helices or β structure that have already
formed in the intermediate.

Proteins that do not accumulate intermediates in a
kinetic burst during their folding nevertheless will
often have similar intermediate states that form more
slowly, in the milliseconds following the dead time. For
example, an intermediate forms during the refolding of
cytochrome c with a rate constant of 50 s⁻¹ at 10 °C that is
compact and contains several elements of secondary
structure but lacks the complete secondary and tertiary
structure of the native protein. These intermediates
also have strings of consecutive amido protons protected
from exchange.

* In most of these experiments the protein is unfolded in ²H₂O and
then folded in ¹H₂O, and the gain of protons rather than their loss
is monitored.


It seems that some of the same secondary structures
found in the final native state of the protein are
already assembled in their entirety in these early kinetic
intermediates formed either during a kinetic burst
(Figure 13-11) or in the period immediately following the
dead time. To a certain extent, this impression is illusory.
Because the secondary structures in the native state are
used to store the information about the protection that
occurred upon the formation of the intermediate, this
information is automatically divided into segments
bounded by those elements of native secondary structure.
Any information about the regions of the polypeptide
outside of these segments of native secondary
structure has been automatically erased. Amido protons
in these erased regions may have been protected in the
intermediate but not in the native state. It is also possible
to identify amido protons on particular amino acids that
participate in hydrogen bonds in an intermediate formed
in a kinetic burst by examining the effect of the
concentration of denaturant on the exchange of amido protons
within the native structure of a protein. Again, however,
the fact that the amido protons identified in this way as
protected in the intermediate coincide with elements of
secondary structure in the crystallographic molecular
model may be only a consequence of the fact that
exchange from the native protein is being monitored.

The fact that all of the positions within a particular
element of native secondary structure register similar
levels of protection suggests but cannot prove that many
of the same elements of secondary structure found in the
native state are formed as discrete units early in the
process of folding. There are, however, exceptions to this
correspondence. For example, only a portion of the
amino acids in helix B of apomyoglobin is protected from
exchange in the intermediate formed during the kinetic
burst (Figure 13-11).

Many if not most of the intermediates observed
during kinetic bursts in stopped-flow experiments are
similar if not identical to stable molten globules of the
same polypeptide that are observed at equilibrium under
unphysiological concentrations of denaturant, at
unphysiological temperatures, at low pH, in the absence
of salt, or with some combination of these perturbations.
It has already been noted that, like a molten globule,
these kinetic intermediates are condensed conformations
of the polypeptide. In addition, by varying the
wavelength at which the kinetic measurements are
made, it has been possible to demonstrate that the molar
ellipticities at several wavelengths, both in the near
ultraviolet and in the far ultraviolet, match the values in the
circular dichroic spectrum of a stable molten globular
state of the same polypeptide formed at equilibrium,
usually under acidic conditions (Figure 13-12).
Furthermore, the protection factors for the amido protons
in the peptide backbone buried during the formation
of an intermediate in a kinetic burst are often in
the range of those observed for a molten globule at
equilibrium rather than in the range of the much larger
protection factors observed for amido protons locked
within the secondary and tertiary structure of the native
state. The effects of site-directed mutations of particular
isoleucines to leucines and valines or particular leucines
to isoleucines and valines in the hydrophobic core of the
crystallographic molecular model of dihydrofolate
reductase from E. coli were dramatically different on the
yield of the intermediate observed in the kinetic burst
than on the stability of the native state, a result
suggesting that the packing of the native structure had not yet
been established in the intermediate, as is the case in a
molten globule.

Figure 13-11: Sequestration of amido protons during the folding of apomyoglobin as followed by rapid mixing. A series of three rapid
mixing chambers fed by four syringes were assembled in a cold room at 5 °C. Apomyoglobin from P. catodon was dissolved in 6 M urea and
10 mM sodium acetate, pH 6.1, in ¹H₂O and stood until it was fully unfolded. Flow through the mixing chambers was then initiated. The
solution was diluted in the first mixing chamber 8.5-fold with 10 mM sodium acetate, pH 6.1, in ²H₂O. This mixture then travelled in the tubing
connecting the first mixing chamber to the second for various periods of time (milliseconds). In the second rapid mixing chamber, the
solution was mixed with an equal volume of a solution containing the buffers tris(hydroxymethyl)methylammonium ion,
N-(ethylsulfonato)morpholinium ion, and acetate ion at a final ionic strength of 0.2 M, pH 10.2 in ²H₂O, which immediately brought the pH of the mixture to pH 10.2.
This second mixture passed through a piece of tubing in which it spent 20 ms in transit to the third mixing chamber. In the third rapid mixing
chamber it was diluted by mixing with a solution of the same ionic buffers at an ionic strength of 0.25 M, pH 1.9 in ²H₂O, that adjusted the
final pH to 5.6. The effluent from the last mixing chamber was directed into a solution of hemin to turn the apomyoglobin to myoglobin and
lock the protein in its native state. The amplitudes of the absorptions from the amido protons in two-dimensional nuclear magnetic
resonance spectra (Figure 12-33) of the final solutions were determined for each sample. The different absorptions had been previously assigned
to specific amido protons in the amino acid sequence of the protein. The percentage that each position in a set of representative
positions was occupied (percentage occupancy) by a proton is plotted as a function of the time (milliseconds) the polypeptide was allowed to
fold before the protons were exchanged with deuterons at pH 10.2. The amido protons are grouped according to the secondary structure they
occupy in the crystallographic molecular model of myoglobin. The eight α helices in the crystallographic molecular model are
designated in alphabetical order from the amino terminus and the CD loop is the segment of random meander between helix C and helix D.
Occupancies of the amides of Leucine 29, Isoleucine 30, and Phenylalanine 33 from helix B are plotted in the upper panel; those of Isoleucine
28 and Arginine 31 from helix B are shown in the lower panel. Reprinted with permission from ref 227. Copyright 1993 American Association
for the Advancement of Science.


The most explicit evidence that these intermediates
formed during the kinetic bursts are similar if not identical
to well-characterized molten globules of the same
polypeptide observed at equilibrium is the correspondence
in the specific amido protons protected during
their formation with the respective amido protons protected
in the respective molten globule. For example, in
the intermediate formed during the kinetic burst in the
folding of apomyoglobin, amido protons in positions in
the amino acid sequence that form the first (A), a portion
of the second (B), the seventh (G), and the eighth (H)
α helices in the native state of the protein are protected,
but those that form the rest of the second, the
third (C), and the fifth (E) α helices are not protected
(Figure 13-11). This is the same pattern of protection
observed in the molten globule of this polypeptide that is
the dominant state at equilibrium between pH 4 and 5.
Likewise, in the intermediate formed in the kinetic burst
in the folding of the cysteineless version of type I
ribonuclease H from E. coli, the pattern in which protected
amido protons are distributed over its sequence of
amino acids closely matches the pattern in which
those amido protons are protected in the molten globule
that predominates at equilibrium at levels of pH less than
2. When amino acids in regions that have been
observed to be protected in the intermediate formed
during the kinetic burst are mutated, the yield of the
intermediate decreases; but when amino acids that have
not been observed to be protected are mutated, the yield
of the intermediate is unaffected.

Apomyoglobin and type I ribonuclease H both
seem to form an intermediate that already contains some
but not all of the specific secondary structures that will
eventually end up in their native states. The kinetic
intermediate of β-lactoglobulin, however, which is also a
molten globule, displays a circular dichroic spectrum
indicating that it contains significant amounts of α helix
Figure 13-12: Circular dichroic spectrum of the kinetic intermediate
observed during the kinetic burst in the folding of a cysteineless
version of type I ribonuclease H from E. coli. Folding was initiated
by an 11-fold dilution of the unfolded polypeptide in 7 M urea,
and the kinetics of refolding (Figure 13-9) were monitored by following
molar ellipticity at 16 different wavelengths, each in a separate
run. The molar ellipticity [degrees centimeter² (decimole of
peptide bonds)⁻¹] observed at the dead time (burst amplitude) in
each run is plotted as a function of the wavelength (nanometers)
set for that run. The molar ellipticity observed at the end of each
run (final folded state) is also plotted as well as the circular
dichroic spectrum of a molten globule of the protein that forms at
equilibrium at pH 1.0 (solid line) and the circular dichroic spectrum
of the native protein (dashed line). Reprinted with permission
from ref 226. Copyright 1997 Nature Publishing Group.


even though only 12% of its native state is α helical.
Consequently, in this instance, the intermediate con- 
tains at least some secondary structure that will not be 
present in the final native state. 

That stable molten globules of the same polypep- 
tides at equilibrium are similar if not identical to the 
respective kinetic intermediates formed during kinetic 
bursts means that the former should be valid models for 
the latter. From physical characterizations of these equi- 
librium states at rest, a much better understanding of the 
structure and dynamics of the kinetic intermediates on 
the move can be gained. The details of the physical prop- 
erties of molten globules at equilibrium have already 
been discussed. 

With some proteins, there are intermediates 
formed during a kinetic burst that are not molten glob- 
ules. For example, the polypeptide of lysozyme from 
G. gallus isomerizes during the kinetic burst to an inter- 
mediate that has a radius of gyration intermediate 
between that of the random coil and that of the native 
state. A molten globule of this polypeptide would have
a radius of gyration much closer to that of the native
state. In the folding of dihydrofolate reductase from
E. coli, little of the accessible surface that is eventually
buried in the native state seems to be buried in the
kinetic intermediate formed during the kinetic burst. This


observation, however, seems to be contradicted by the 
fact that the rotational correlation time of this interme- 
diate is indistinguishable from that of the native state, 
and therefore it should be as compact as the native state.” 

In an observation of folding by stopped-flow, the 
formation of one of these kinetic molten globular inter- 
mediates usually takes place within the dead time of the 
apparatus. The dead time of such an observation, how- 
ever, can be shortened by monitoring continuous flow 
through a rapid mixing chamber rather than stopped- 
flow. A transparent tube is attached directly to the 
mixing chamber, and the two fluids being mixed flow 
continuously at a high velocity through both the cham- 
ber and the tube. The fluorescence of the sample passing 
through the tube is monitored as a function of the dis- 
tance along the tube, and hence the time spent in the 
tube, following mixing. In this way, events can be followed
from about 100 µs to 2 ms. The drawback is that,
because of turbulence, only emission of light from the
sample rather than absorption of light can be measured
at times less than 500 µs. The purpose of most of these
experiments, however, is to monitor the formation of 
intermediates that have already been characterized 
extensively by stopped-flow observations and triple 
mixing experiments and to determine whether the step 
during which they are formed is preceded by yet an ear- 
lier step. 
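
The conversion of distance along the observation tube into time after mixing can be sketched as follows. This is only an illustrative calculation, not the procedure of any of the cited studies. It assumes plug flow at the mean linear velocity, borrows the flow rate (0.62 mL s⁻¹) and channel dimensions (0.25 mm x 0.25 mm) quoted in the caption of Figure 13-13, and uses hypothetical function names.

```python
def flow_velocity_m_per_s(flow_ml_per_s: float, width_mm: float, height_mm: float) -> float:
    """Mean linear velocity of the fluid in a rectangular channel (plug flow assumed)."""
    flow_m3_per_s = flow_ml_per_s * 1e-6                  # 1 mL = 1e-6 m^3
    area_m2 = (width_mm * 1e-3) * (height_mm * 1e-3)      # cross-section in m^2
    return flow_m3_per_s / area_m2


def time_after_mixing_us(distance_mm: float, velocity_m_per_s: float,
                         dead_time_us: float = 0.0) -> float:
    """Time since mixing (microseconds) for a point a given distance down the channel.

    dead_time_us is the time already elapsed when the fluid enters the
    observed part of the channel (an assumption of this sketch).
    """
    transit_s = (distance_mm * 1e-3) / velocity_m_per_s
    return dead_time_us + transit_s * 1e6


velocity = flow_velocity_m_per_s(0.62, 0.25, 0.25)
print(f"mean velocity: {velocity:.2f} m/s")
for distance_mm in (1.0, 5.0, 10.0):
    t_us = time_after_mixing_us(distance_mm, velocity, dead_time_us=45.0)
    print(f"{distance_mm:5.1f} mm along the channel -> {t_us:7.1f} us after mixing")
```

At a velocity of about 9.9 m s⁻¹, each millimeter of channel corresponds to roughly 100 µs, which is why a channel only a centimeter long is sufficient to cover the first millisecond of folding.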

The rate of formation of a molten globular interme- 
diate that appears during a kinetic burst in stopped-flow 
can often be resolved in continuous flow. For example, 
the observed rate constants for the formation of molten
globular intermediates in the folding of bovine β-lactoglobulin,
the B1 domain of immunoglobulin G binding
protein G from Streptococcus, intestinal fatty
acid-binding protein from Rattus norvegicus, and colicin
E7 immunity protein from E. coli are 7000 s⁻¹
(20 °C), 2300 s⁻¹ (20 °C), 1500 s⁻¹, and 3000 s⁻¹ (10 °C),
respectively. The similarity in the values of all of these
observed rate constants may have more to do with the 
range over which rate constants can be measured by 
continuous flow than a similarity in the process being 
measured. 
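
The remark about the accessible range can be checked with a few lines of arithmetic. The sketch below simply converts the four rate constants quoted above into time constants (1/k) and compares them with the window of about 100 µs to 2 ms stated earlier for continuous flow; the dictionary keys and variable names are illustrative only.

```python
# Observed rate constants (s^-1) quoted in the text for formation of
# molten globular intermediates, as measured by continuous flow.
rate_constants_per_s = {
    "bovine beta-lactoglobulin": 7000.0,
    "B1 domain of protein G": 2300.0,
    "intestinal fatty acid-binding protein": 1500.0,
    "colicin E7 immunity protein": 3000.0,
}

# Approximate window accessible to continuous flow, as stated in the text.
WINDOW_S = (100e-6, 2e-3)


def time_constant_s(k_per_s: float) -> float:
    """Time constant (1/k) of a first-order relaxation."""
    return 1.0 / k_per_s


inside_window = {}
for name, k in rate_constants_per_s.items():
    tau = time_constant_s(k)
    inside_window[name] = WINDOW_S[0] <= tau <= WINDOW_S[1]
    print(f"{name:40s} tau = {tau * 1e6:6.0f} us  inside window: {inside_window[name]}")
```

All four time constants fall between roughly 140 µs and 670 µs, well inside the window, which is consistent with the caution that the apparent similarity may reflect the method rather than the proteins.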

In the case of β-lactoglobulin, the B1 domain, and
immunity protein, the isomerizations of the random coil
producing the molten globular intermediates at these 
respective rates appear to be single first-order relax- 
ations that are not preceded by any other kinetic burst, 
so the transformation of random coil to molten globular 
intermediate appears to proceed in one step with no 
prior kinetic intermediates. This conclusion follows from 
the fact that the kinetic traces of fluorescence extrapolate
through the dead times (100-150 µs) to the value for
the fluorescence of the random coil at the final concentration
of denaturant. Consequently, in these cases, the
random coil appears to isomerize directly to the molten 
globule. In the case of fatty acid-binding protein, how- 
ever, the extrapolation did not coincide with the fluores- 


cence of the random coil, and the extrapolated values at 
different wavelengths of emission produced a spectrum 
for an additional intermediate formed in the kinetic 
burst that was distinct from the spectrum of the 
unfolded state. It follows that, in this case, at least one 
other intermediate precedes the molten globular inter- 
mediate. 
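
The logic of this extrapolation can be illustrated with a small calculation. The sketch below is not the analysis used in the cited studies: it assumes noise-free synthetic traces, a single first-order relaxation whose final value is known, and hypothetical fluorescence values. A trace is fitted, extrapolated back to zero time, and the extrapolated value is compared with the fluorescence of the random coil; any amplitude that is missing must have been lost in an unresolved earlier step.

```python
import math


def fit_single_exponential(times_s, signal, final_value):
    """Log-linear least-squares fit of signal = final_value + A * exp(-k * t).

    Returns (A, k). Requires signal > final_value at every point.
    """
    ys = [math.log(s - final_value) for s in signal]
    n = len(times_s)
    mean_t = sum(times_s) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in times_s)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_s, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return math.exp(intercept), -slope


def missing_amplitude(times_s, signal, final_value, unfolded_value):
    """Amplitude lost before the first observable point (zero if no burst)."""
    amplitude, _ = fit_single_exponential(times_s, signal, final_value)
    extrapolated_at_zero = final_value + amplitude
    return unfolded_value - extrapolated_at_zero


# Hypothetical traces observed from 150 us to 2 ms (dead time of 150 us).
times = [150e-6 + 50e-6 * i for i in range(38)]
F_UNFOLDED, F_FINAL, K = 1.0, 0.2, 2300.0

one_step = [F_FINAL + 0.8 * math.exp(-K * t) for t in times]     # no earlier step
with_burst = [F_FINAL + 0.6 * math.exp(-K * t) for t in times]   # 0.2 lost earlier

print("missing amplitude, one step:  ", missing_amplitude(times, one_step, F_FINAL, F_UNFOLDED))
print("missing amplitude, with burst:", missing_amplitude(times, with_burst, F_FINAL, F_UNFOLDED))
```

In the first trace the extrapolation returns to the fluorescence of the random coil, as described for β-lactoglobulin, the B1 domain, and immunity protein; in the second it does not, which is the behavior described for fatty acid-binding protein.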

Events that occur within even shorter intervals can 
be monitored by temperature jump. This approach 
exploits the fact that at a low temperature, the stability of 
a protein increases as the temperature is raised (Figure 
13-5). A solution of protein in a concentration of denat- 
urant within the region of transition is brought to a low 
temperature, which increases the concentration of the 
denatured state at the expense of the folded state. The 
temperature of the solution is then jumped by the rapid 
application of heat, and the solution stabilizes at a higher 
temperature within a hundred nanoseconds. The 
approach to the new equilibrium now favoring the folded 
state is then monitored. When the folding of equine
cytochrome c and that of barstar from B. amyloliquefaciens
were examined after a temperature jump of 10 °C, relaxations
with rate constants of 11,000 s⁻¹ and 3000 s⁻¹, respectively, were
observed, similar to those observed for other proteins by
continuous flow. A much faster relaxation with a rate
constant of 200,000 s⁻¹ at 10 °C is observed for apomyoglobin,
and this rate constant is significantly affected by
the viscosity of the solvent, an observation suggesting
that it represents the collapse of the denatured state,
and a relaxation with a similar rate constant has been
observed for the folding of bovine ribonuclease A. That
these very rapid relaxations, however, monitor the initial
global collapse of the random coil has been questioned,
and it is possible that the initial collapse of
these proteins usually occurs much more slowly, with
rate constants in the range below 10,000 s⁻¹.

When it is dissolved in 0.01 M HCl, equine
cytochrome c is in a denatured state that is not a random
coil but is at least as expanded. Upon jumping of the
pH to 4.0, the protein refolds from this expanded state.
The refolding can be followed from 45 µs to 1 ms by continuous
flow (Figure 13-13). It appears to pass through
two clearly resolved steps with observed rate constants of
17,000 s⁻¹ and 2300 s⁻¹. No kinetic burst is observed.
During the first step 60% of the fluorescence from
Tryptophan 59, the only tryptophan in the protein, is
quenched by the covalently attached heme as the
expanded conformation collapses, but no secondary
structure forms beyond the residual α helix found in the
expanded denatured state. During the second step,
about 70% of the α-helical content of the native state is
regained, to produce a molten globule.
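
The two resolved steps can be visualized by evaluating the sum of two first-order exponentials with the rate constants given above and the amplitudes (0.60 and 0.29) reported in the caption of Figure 13-13. The sketch below is illustrative only. It assumes that the relative fluorescence starts at 1.0 (no kinetic burst) and that the remaining 0.11 of the amplitude decays too slowly to matter within the first millisecond, so it is treated as a constant baseline; the function name is hypothetical.

```python
import math

# (amplitude, rate constant in s^-1) for the two resolved phases.
PHASES = [(0.60, 17000.0), (0.29, 2300.0)]


def relative_fluorescence(t_s: float, phases=PHASES) -> float:
    """Relative fluorescence (1.0 = acid-denatured, 0.0 = native) at time t_s.

    The amplitude not accounted for by the listed phases is held constant,
    an assumption that is reasonable only at times below about 1 ms.
    """
    baseline = 1.0 - sum(a for a, _ in phases)
    return baseline + sum(a * math.exp(-k * t_s) for a, k in phases)


for t_us in (0, 45, 100, 300, 1000):
    print(f"t = {t_us:5d} us   F = {relative_fluorescence(t_us * 1e-6):.3f}")
```

By 1 ms the faster phase has decayed completely and the slower phase is about 90% complete, so the relative fluorescence approaches the baseline left for the much slower steps that produce the native state.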

What seems to be the same second step, involving
the formation of the same level of α-helical content,
occurs more slowly (50 s⁻¹ at 10 °C) when the random coil
in 4.4 M guanidinium chloride is diluted to 0.7 M guanidinium
chloride, so it is possible to determine by rapid
Figure 13-13: Folding of cytochrome c monitored by continuous
flow. Equine cytochrome c was dissolved in 0.01 M HCl and
deionized by molecular exclusion chromatography in 0.01 M HCl.
It was then mixed in a rapid mixing chamber with 10 volumes of
50 mM sodium acetate and 50 mM sodium phosphate, pH 5.1. The
final pH of the mixture was 4.5. The effluent from the mixing chamber
was passed through a 0.25 mm x 0.25 mm channel in a quartz
block at 0.62 mL s⁻¹ (0.99 cm ms⁻¹). The block was illuminated with
light at 280 nm wavelength, and fluorescence emission at a wavelength
greater than 324 nm was measured as a function of the distance
along the channel. The fluorescence relative to that of the
denatured polypeptide in 0.01 M HCl (1.0) and the folded native
state at pH 4.5 (0.0) is presented as a function of time (microseconds).
The data are fit with the solid curve, which is the sum of two
first-order exponentials with rate constants of 17,000 s⁻¹ and
2300 s⁻¹ and amplitudes of 0.60 and 0.29. The dead time of the
apparatus was measured directly and found to be 45 µs. Reprinted
with permission from ref 270. Copyright 1998 Nature Publishing
Group.


proton exchange that both the amino- and carboxy-terminal
α helices of the native structure form during
this step, but not the α helices between positions 60 and
80 in the amino acid sequence (Figure 7-9). The
molten globule formed during this second step, however,
displays none of the molar ellipticity at 420 nm indicative
of the asymmetric environment of the native structure
surrounding the heme. Finally the native state arises in
a biphasic process with rate constants of 2.5 s⁻¹ and
0.25 s⁻¹ at 10 °C or 8 s⁻¹ and 0.8 s⁻¹ at 25 °C.

The expanded denatured state of equine
cytochrome c in 0.01 M HCl has an α-helical content that
is 20% that of the native state, and although the α-helical
content does not increase during its collapse, the
earliest observed collapsed intermediate does contain
this amount of α helix. Moreover, when the folding is
performed by diluting the random coil of cytochrome c,
which contains no α helix, from 4.4 to 0.4 M guanidinium
chloride, there is still no kinetic burst and all of the
change in fluorescence at times less than 1 ms can be
resolved into two steps with rate constants of 21,000 s⁻¹
and 730 s⁻¹ at 22 °C. The initial collapsed state, however,
again has 20-30% of the α-helical content of the
native state.

These observations raise the question of whether or 
not the polypeptide of a protein is able to collapse 
hydrophobically in an isomerization that involves no for- 
mation of secondary structure. Is there a purely 
hydrophobic collapse? Certainly, all of the condensed 
kinetic intermediates observed during the folding of 
polypeptides, when they are assayed for secondary struc- 
ture, do contain it. Furthermore, the fastest events in 
protein folding involving condensation of the random 
coil to form a globular state usually have rate constants 
of less than 20,000 s⁻¹, often much less. For example, in
the case of the immunoglobulin binding domain of protein
L from Peptococcus magnus, the condensation of the
random coil to a globular state occurs in a first-order
reaction with a rate constant of only 0.12 s⁻¹. Yet there are
measurements indicating that the purely hydrophobic 
collapse of a polypeptide should have a rate constant of 
at least 10° s! at 20 °C,?” and theoretical treatments*™ 
suggest that the rate constant should be 10’s”. 
Furthermore, a purely hydrophobic collapse should have 
no energy of activation, but the observed relaxations 
assigned to the collapses of denatured states do. 

Explanations are required for both the slower than 
expected rate constants observed for these condensa- 
tions and the fact that most if not all of the initial con- 
densed states contain significant secondary structure. 
Even the extremely rapid relaxation of apomyoglobin
with a rate constant of 200,000 s⁻¹ observed by temperature
jump nevertheless seems to involve an intermediate
with significant secondary structure.

Suppose that kinetically, a purely hydrophobic col- 
lapse occurs before the formation of any of the second- 
ary structure characteristic of a molten globule, and that 
the mechanism for folding is 


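$$\mathrm{U} \underset{k_{-1}}{\overset{k_{1}}{\rightleftharpoons}} \mathrm{C} \xrightarrow{\;k_{2}\;} \mathrm{MG} \qquad (13\text{-}25)$$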


where C is the hydrophobically collapsed state and MG is
the subsequent molten globule. If both steps were intrinsically
fast reactions, if $k_2$ were greater than $k_1$, and if $k_{-1}$
were greater than $k_1$, the reactions would be coupled,
and little purely hydrophobically collapsed state would
be observed. It has just been noted, however, that $k_1$, the
rate constant for the hydrophobic collapse, should be much
faster than the observed rate for the overall formation of
the molten globule.

If $k_{-1} \gg k_2$, then the unfolded state and the
hydrophobically collapsed state are in a rapid preequi- 
librium that precedes the step in which the secondary 
structure, which distinguishes the molten globule from 


the hydrophobically collapsed state, is formed. The rate 
equation for the formation of the molten globule from 
the unfolded state by this mechanism is 


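$$[\mathrm{MG}] = [\mathrm{protein}]_{\mathrm{TOT}}\left[1 - \exp\left(-\frac{k_2 K_{\mathrm{cpse}}}{1 + K_{\mathrm{cpse}}}\,t\right)\right] \qquad (13\text{-}26)$$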


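where $[\mathrm{protein}]_{\mathrm{TOT}}$ is the total concentration of protein
and $K_{\mathrm{cpse}}$ is the equilibrium constant for the hydrophobic
collapse:


$$K_{\mathrm{cpse}} = \frac{[\mathrm{C}]}{[\mathrm{U}]} = \frac{k_1}{k_{-1}} \qquad (13\text{-}27)$$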


Equation 13-26 defines a first-order formation of the
molten globule with an observed rate constant


$$k_{\mathrm{obs}} = \frac{k_2 K_{\mathrm{cpse}}}{1 + K_{\mathrm{cpse}}} \qquad (13\text{-}28)$$


Upon initiation of the reaction, the equilibrium between 
the unfolded state and the hydrophobically collapsed 
state would be established immediately, and the molten 
globule would appear in a kinetically first-order reaction. 
If the equilibrium constant between the unfolded state 
and the hydrophobically collapsed state is less than 1, no 
purely hydrophobically collapsed state should be 
observed, and none is. 
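
How well the rapid-preequilibrium expression describes the mechanism of Equation 13-25 can be checked numerically. The sketch below is only an illustration with hypothetical rate constants chosen so that the reverse of the collapse is much faster than the formation of secondary structure. It compares the exact slow relaxation rate of the scheme U ⇌ C → MG, obtained from the quadratic characteristic equation of the two coupled first-order rate equations for U and C, with the approximate observed rate constant given by the product of $k_2$ and the fraction $K_{\mathrm{cpse}}/(1 + K_{\mathrm{cpse}})$ of uncommitted polypeptide that is collapsed.

```python
import math


def exact_relaxation_rates(k1: float, k_minus1: float, k2: float):
    """Exact (fast, slow) relaxation rates for U <=> C -> MG.

    The rates are the roots of r**2 - (k1 + k_minus1 + k2) * r + k1 * k2 = 0.
    """
    total = k1 + k_minus1 + k2
    product = k1 * k2
    root = math.sqrt(total * total - 4.0 * product)
    fast = 0.5 * (total + root)
    slow = product / fast          # avoids cancellation in (total - root) / 2
    return fast, slow


def k_obs_preequilibrium(k1: float, k_minus1: float, k2: float) -> float:
    """Observed rate constant when U and C are in rapid preequilibrium."""
    k_cpse = k1 / k_minus1
    return k2 * k_cpse / (1.0 + k_cpse)


# Hypothetical values (s^-1): collapse is fast, its reversal faster still.
K1, K_MINUS1, K2 = 1.0e5, 1.0e6, 2.0e4

fast, slow = exact_relaxation_rates(K1, K_MINUS1, K2)
approx = k_obs_preequilibrium(K1, K_MINUS1, K2)
print(f"K_cpse               = {K1 / K_MINUS1:.2f}")
print(f"exact fast phase     = {fast:.3e} s^-1 (establishes the preequilibrium)")
print(f"exact slow phase     = {slow:.1f} s^-1 (formation of the molten globule)")
print(f"preequilibrium k_obs = {approx:.1f} s^-1")
```

With these values the collapsed state never exceeds about 9% of the uncommitted polypeptide, and the approximate observed rate constant agrees with the exact slow phase to within a few percent, even though the intrinsic rate constant for collapse is some fifty times larger than the rate at which the molten globule appears.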

It is reasonable that this equilibrium constant 
should be less than 1. Few if any polypeptides should be 
able to bury enough hydrogen-carbon bonds upon their 
hydrophobic collapse to overcome both the unfavorable 
loss of the configurational entropy of the random coil and 
the unfavorable loss of solvation arising from the 
inescapable transfer of donors and acceptors of hydrogen 
bonds from the water into the interior of the collapsed 
state. Certainly the small values for standard free energy 
of folding (Table 13-2) suggest that this must be the case. 
Many of these unbonded donors and acceptors buried 
during the collapse, however, do become occupied within 
the secondary structure of the molten globule when it 
forms. These buried hydrogen bonds within α helices and
β structure of the molten globule, because they are
formed in the absence of water, stabilize it relative to a 
hydrophobically collapsed state lacking any internal 
hydrogen bonds. Consequently, the molten globule with 
its characteristic secondary structure can be the first 
intermediate observed even though the mechanism of 
folding passes obligatorily through a hydrophobically col- 
lapsed state lacking any secondary structure. 

This explanation, however, is inconsistent with the 
observation that the reciprocals of the observed rate con- 


stants for the formation of the molten globular interme- 
diates seem to be linearly related to the viscosity of the 
solvent.” Because it is a state function, the equilibrium 
constant for collapse cannot be affected by the viscosity 
of the solvent. Because neither $K_{\mathrm{cpse}}$ nor $k_2$ should be
affected by viscosity, the rate-limiting step in the forma- 
tion of these intermediates cannot be the formation of sec- 
ondary structure within an already collapsed state. It 
could, however, be the collapse of an expanded state 
already containing sufficient secondary structure to stabilize
the collapsed state, much as the expanded acid-denatured
form of cytochrome c with 20% α helix
collapses upon a jump in pH or as the expanded cold-denatured
form of barstar, which also contains significant
residual secondary structure, collapses upon a jump in
temperature. If this is the mechanism for the collapse of 
an expanded denatured state, the secondary structure that 
will stabilize the molten globule relative to the hydropho- 
bically collapsed state forms first within the random coil 
and is then trapped by the collapse of this structured dena- 
tured state to form the molten globule directly. 

There are indications that, in the unfolded state of 
a protein in solutions of guanidinium ion or urea, 
metastable segments of secondary structure form and 
dissolve continuously even though they are not present 
at significant concentrations.” During the folding of 
bovine acyl-CoA-binding protein at 0.5 M guanidinium 
chloride,” an intermediate forms during a kinetic burst 
in which considerable protection is afforded to the 
exchange of amido protons, but this protection seems to 
result from the formation of a set of relatively stable con- 
formations of the uncollapsed polypeptide that contain 
elements of secondary structure. It is difficult, however, 
to determine just how uncollapsed these conformations 
are. Clusters of secondary structure in either evanescent 
or more stable conformations of the uncollapsed 
random coil could be trapped during its collapse to form 
a molten globule directly. A random coil is an ensemble 
of conformations in rapid equilibrium with each other, 
and subsets of those conformations may contain ele- 
ments of secondary structure waiting to be enclosed. 

In this case, the preequilibrium would be not 
hydrophobic collapse (Equation 13-25) but the forma- 
tion of these fleeting unstable elements of secondary 
structure: 


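$$\mathrm{U} \underset{k'_{-1}}{\overset{k'_{1}}{\rightleftharpoons}} 2^{\circ} \xrightarrow{\;k'_{2}\;} \mathrm{MG} \qquad (13\text{-}29)$$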


where 2° is any subset of the conformations of the 
random coil containing secondary structure extensive 
enough and appropriately located to support the stable 
formation of a molten globular intermediate. The forma- 
tion of this molten globule would be a first-order reaction 
(Equation 13-26). If $k'_{-1} \gg k'_2$, then the observed rate constant
for the mechanism of Equation 13-29 is


$$k_{\mathrm{obs}} = \frac{k'_2 K_{2^\circ}}{1 + K_{2^\circ}} \qquad (13\text{-}30)$$


where $K_{2^\circ}$ is the equilibrium constant for the formation of
the secondary structure within the random coil, the presence
of which is required before collapse can occur. No
secondary structure would be observed in the random
coil because the equilibrium constant for its formation,
$K_{2^\circ}$, would be significantly less than 1. If $K_{2^\circ} \ll 1$, then the
observed rate constant is $k'_2 K_{2^\circ}$. The formation of the
molten globular intermediate would exhibit an apparent
energy of activation that is the sum of the actual standard
free energy of activation for the reaction governed by rate
constant $k'_2$ and the change in standard free energy for
the reaction governed by the equilibrium constant $K_{2^\circ}$ for
the preequilibrium in which the secondary structure is
formed.
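
The additivity of the two free energies can be made explicit. The lines below are a sketch that assumes the rate constant for the collapse follows transition-state theory; the symbols $\Delta G^{\circ\ddagger}_{2}$ (standard free energy of activation for the step governed by $k'_2$), $\Delta G^{\circ}_{2^\circ}$ (change in standard free energy for the preequilibrium), and $\Delta G^{\circ\ddagger}_{\mathrm{app}}$ are introduced here only for illustration. The `align*` environment requires the amsmath package.

```latex
\begin{align*}
k_{\mathrm{obs}} &\approx k'_2 K_{2^\circ} \qquad (K_{2^\circ} \ll 1) \\
k'_2 &= \frac{k_{\mathrm{B}}T}{h}\exp\!\left(-\frac{\Delta G^{\circ\ddagger}_{2}}{RT}\right),
\qquad
K_{2^\circ} = \exp\!\left(-\frac{\Delta G^{\circ}_{2^\circ}}{RT}\right) \\
k_{\mathrm{obs}} &\approx \frac{k_{\mathrm{B}}T}{h}
\exp\!\left(-\frac{\Delta G^{\circ\ddagger}_{2} + \Delta G^{\circ}_{2^\circ}}{RT}\right) \\
\Delta G^{\circ\ddagger}_{\mathrm{app}} &= \Delta G^{\circ\ddagger}_{2} + \Delta G^{\circ}_{2^\circ}
\end{align*}
```

Because $K_{2^\circ}$ is less than 1, $\Delta G^{\circ}_{2^\circ}$ is positive, so the apparent free energy of activation is larger than the intrinsic free energy of activation for the collapse itself.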

The decision between a mechanism in which 
hydrophobic collapse precedes any formation of second- 
ary structure (Equation 13-25) and a mechanism in 
which the formation of sufficient secondary structure to 
stabilize the collapsed state precedes hydrophobic col- 
lapse (Equation 13-29) depends on the interpretation of 
the effects of solutes that increase the viscosity of the sol- 
vent on the observed rate constants for the formation of 
the molten globular intermediates. If the effects of these
solutes are on the stability of intermediates in the
process of folding rather than exclusively on the viscosity,
this decision is ambiguous. It is often impossible
to distinguish these two possibilities experimentally.

The transition between one of the molten globular 
intermediates formed during a kinetic burst and the final 
native state of a protein appears to occur in one or sev- 
eral consecutive kinetic steps. For example, the molten 
globular kinetic intermediates of apomyoglobin, intestinal
fatty acid binding protein, and the B1 domain of
protein G become the respective native states in apparently
single first-order steps with rate constants of 1 s⁻¹
(5°C), pel, and 600 s™ (20°C). The molten globular 
intermediate of type I ribonuclease H becomes the
native state in what appears to be a single first-order step
with a rate constant of 0.6 s⁻¹ at 25 °C when monitored by
molar ellipticity at 220 nm and 292 nm, but an additional
faster step of 2 s⁻¹ is detected by fluorescence (Figure
13-9). This latter result illustrates the fact that some steps 
in the folding of a protein go unregistered by certain 
physical measurements. 

The apparently single steps that occur following the 
rapid formation of an intermediate during a kinetic burst 
and that in turn produce the native state often seem to 
involve only a portion of the entire protein. The folding 
of the rest of the protein must occur either during the 
kinetic burst or during rapid unregistered steps following 
the rate-determining steps and the rate-limiting step that 
produce this slow observed rate constant. For example, 
some site-directed mutations of ribonuclease from 
B. amyloliquefaciens affect the observed rate constant for
the production of the native state from the intermediate
formed during the kinetic burst, while others affect the
stability of that intermediate. The observed rate constant
(20 s⁻¹ at 20 °C) for the principal relaxation
observed in both molar ellipticity and fluorescence following
the kinetic burst during the folding of lysozyme
from bacteriophage T4 is affected significantly by site-directed
mutations in the carboxy-terminal half of the
polypeptide but not by mutations in the amino-terminal
half even though the mutations in both halves affect
the stability of the native state. Presumably the process
being registered as an apparent single step by both cir- 
cular dichroism and fluorescence is the folding of only 
the one half of the protein and not the other. 

It is more common to observe two or more steps 
rather than just one during the transition between a 
kinetic molten globular intermediate and the native state. 
For example, four additional steps can be discerned following
the formation of the first kinetic intermediate
during the folding of β-lactoglobulin; two, during the
folding of human lysozyme; two, during the folding of
cytochrome c; and two, during the folding of micrococcal
nuclease from S. aureus. Again, the number of
phases detected often depends on how many spectral 
properties have been monitored. For example, five addi- 
tional steps in the refolding of dihydrofolate reductase 
from E. coli were discerned following the formation of a 
molten globular intermediate if molar ellipticity at 
220 nm, molar ellipticity at 235 nm, absorbance, and 
intrinsic fluorescence were all monitored.

Some if not most of these multiple steps may each
involve the assembly of a particular portion of the final
secondary structure of the native state.
Intermediates formed after the formation of the initial
molten globule, however, often have all of the secondary
structure of the native state, but some of that secondary
structure is in a less stable state than it will eventually
assume upon complete folding, as if the rigid conformation
of the native structure maintained by the
proper packing of the secondary structures locks in place
only in the final step or one of the final steps in the
process. The last or almost the last property to appear in
the folding of a protein is the spatial arrangement of the
constellation of side chains responsible for its function.

The order in which different elements of secondary 
structure form and lock into place during the transition 
from the initial molten globular intermediate to the final 
native state can be inferred from studies of native-state 
proton exchange. The amido protons of the peptide 
bonds buried in the native state of a protein and defining 
its secondary structure exchange with deuterons in the 
solvent at different rates. When standard free energies for 
the conformational equilibria leading to their exposures 
are followed as a function of the concentration of a 
denaturant, it is observed that they generally coalesce 


into groups as the concentration of denaturant is
increased (Figure 13-14). These groups are groups
of amido protons involved in particular secondary structures
in the native protein. For example, in the crystallographic
molecular model of cytochrome c (Figure 7-9),
Arginine 91 through Lysine 100 (Figure 13-14A) are
within one α helix; Methionine 65 through Asparagine 70
(Figure 13-14B) are within another α helix; and the
ε amido proton of Tryptophan 59 and the α amido protons
of Leucine 64, Lysine 60, Phenylalanine 36, and
Glycine 37 are within a cluster of adjacent hydrogen
bonds (Figure 13-14C). The coalescence of the respective
sets of standard free energies of exposure indicates that
each of these elements of secondary structure opens to
exchange its amido protons cooperatively.

There are indirect observations suggesting that
these openings of the structure, which are occurring all
the time in the native protein, actually occur sequentially.
For example, the cluster of hydrogen bonds
involving the α amido proton of Lysine 60 must open
before the α helix containing Leucine 68, which must
open before the α helix containing Leucine 98 during the
exchange of the amido protons in the latter α helix.
Furthermore, as was the case with the exchange of the
amido proton at Methionine 47 in cysteineless type I
ribonuclease H from E. coli (Figure 13-8), the exchange
of the amido protons of the secondary structure that is
the last to open, namely the α helix containing Leucine
98 in cytochrome c, tracks the global unfolding of the
protein.

All of these observations suggest that the exchange
of amido protons in the native state of a protein reveals, in
reverse, the steps in the formation of secondary structure
and the locking in of that secondary structure as
the elements pack next to each other during the normal
folding of the protein. In other words, during the folding
of cytochrome c, the α helix containing Leucine 98 forms
before that containing Leucine 68, which forms before
the cluster of hydrogen bonds involving the α amido
proton of Lysine 60.

The slowest steps in the folding of a protein from its 
random coil are often the isomerizations of peptide 
bonds to the amino-terminal side of prolines. In the 
crystallographic molecular models of proteins, about 6%
of the peptide bonds on the amino-terminal sides of proline
are cis peptide bonds and the rest are trans peptide
bonds (Equation 6-1). These are geometric isomers
of each other. The peptide bond on the amino-terminal
side of a proline at a particular position in the amino acid
sequence of a particular protein usually will be either cis
in every molecule of the native state or trans in every 
molecule of the native state. In the random coil, however, 
every proline is free to adopt either geometric isomer, 
and the cis and trans isomers slowly come to equilibrium. 
In dipeptides, the equilibrium constants between cis and 
trans isomers of proline vary with pH, but they fall 
between 10 and 1.5 in favor of the trans isomer. The more 


prolines that must be one or the other isomer before the 
native state can be achieved, the more random coils with 
at least one incorrect isomer of proline will be present in 
the solution. 
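
The dependence on the number of prolines can be illustrated with a simple calculation. The sketch below is not taken from the text: it assumes that each proline isomerizes independently of the others and that every proline has the same equilibrium constant between trans and cis isomers, and it uses a hypothetical value of 4 for that constant, which lies within the range of 1.5 to 10 quoted above for dipeptides.

```python
from math import prod  # available in Python 3.8 and later


def isomer_fractions(k_trans_over_cis: float) -> dict:
    """Equilibrium fractions of trans and cis isomers at a single proline."""
    trans = k_trans_over_cis / (1.0 + k_trans_over_cis)
    return {"trans": trans, "cis": 1.0 - trans}


def fraction_all_correct(native_isomers, k_trans_over_cis: float) -> float:
    """Fraction of random coils in which every listed proline already has
    the isomer required by the native state (independent prolines assumed)."""
    fractions = isomer_fractions(k_trans_over_cis)
    return prod(fractions[isomer] for isomer in native_isomers)


K_HYPOTHETICAL = 4.0  # trans/cis equilibrium constant, assumed for illustration

for n in (1, 2, 4, 8):
    correct = fraction_all_correct(["trans"] * n, K_HYPOTHETICAL)
    print(f"{n} prolines required to be trans: {correct:.3f} of the random coils are correct")

# A proline that must be cis in the native state is far more costly.
print("one trans and one cis required:",
      round(fraction_all_correct(["trans", "cis"], K_HYPOTHETICAL), 3))
```

Under these assumptions the fraction of random coils with every proline in the correct isomer falls geometrically with the number of prolines that matter, and a single proline that must be cis reduces that fraction more than several that must be trans.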

When bovine ribonuclease A is added to 5 M guanidinium
chloride at pH 2.3, the unfolding of the polypeptide
is very rapid (<10 s), and the unfolding produces a
random coil, cross-linked by its four cystines. If the
solution containing this random coil is diluted within
15 s to 1.3 M guanidinium chloride, pH 6.4, at 25 °C, all of
the polypeptide (>95%) refolds to the native state,
capable of full enzymatic activity, in an uncomplicated
first-order relaxation with a rate constant of about
10 s⁻¹. This result demonstrates that a random coil that a
moment ago was native ribonuclease can refold rapidly,
and with no obvious complications, back into native 
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folding form and the slowly folding forms of the random 
coil of ribonuclease consistent with them being due 
entirely to isomerization of peptide bonds on the amino- 
terminal sides of prolines in the sequence. The observed 
rate constant for the approach to the equilibrium 
between the rapidly folding form and slowly folding 
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ribonuclease. 

If rapidly unfolded ribonuclease, however, is 
allowed to sit as a random coil in a solution of guani- 
dinium chloride over a period of 10 min, the kinetics of 
refolding are split into several phases, one with the same 
observed rate constant as that of the initially produced 
random coil and several others that are much slower. 
Consequently, a portion of the initially produced 
random coil that all refolds rapidly has isomerized to 
random coils that refold slowly. At equilibrium, the rap- 
idly folding isomer of the random coil accounts for 20% 
of the protein; the slowly folding isomers of the random 
coil, for 80%; and the rate constant for the approach to 
this equilibrium between the rapidly folding isomers and 
the slowly folding isomers is 0.005 s™ at 25 °C.'?! 

Because the zsystem of the amide (Figure 2-3) 
must be broken for conversion to occur between the cis 
and trans geometric isomers of a peptide bond, this con- 
version is a slow process. The rate constants for the 
approach to the equilibrium between the cis and trans 
isomers of a set of dipeptides containing a carboxy-ter- 
minal proline are slow, between 0.002 s and 0.005 s at 
25 °C above pH 3,”®?® rates that are similar to observed 
rate constant for the approach to equilibrium of the rap- 
idly and slowly folding forms of ribonuclease. 

In addition to this similarity of rates, there are sev- 
eral properties of the transitions between the rapidly 
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Figure 13-14: Rates of exchange of particular amido protons along the polypeptide backbone of native equine cytochrome c as a function 
of the concentration of guanidinium chloride.''® Equine cytochrome c was dissolved at p*H 7 in H,O at the noted concentrations of guani- 
dinium chloride. After different intervals, samples were removed, the pH was adjusted to 5 to slow the rates of exchange, and a two-dimen- 
sional nuclear magnetic resonance spectrum was gathered. The amplitudes of each of the peaks in each of the respective spectra were 
tabulated as a function of the time spent at p*H 7 in *H,O before the pH was lowered, and rates of exchange were calculated from the 
decreases in these amplitudes as a function of time. Each exchange was at the EX, limit, and the equilibrium constant for the formation of 
the conformation exposing each amido proton at each concentration of guanidinium chloride was calculated (Equation 12-63). From these 
equilibrium constants, standard free energies for the exposures of each proton, AG°yx, were calculated. These standard free energies of expo- 
sure (kilojoules mole”) are plotted as a function of the concentration (molar) of guanidinium chloride (GdmCl). The standard free energies 
of exposure coalesced into specific groups as the concentration of guanidinium chloride was raised. Separate plots for three of these groups 
are presented: (A) the amido protons in the œ helix containing Arginine 91 (R91) to Lysine 100 (K100); (B) the a helix containing Methionine 
65 (M65) to Asparagine 70 (N70); and (C) a cluster of hydrogen bonds containing the e amido proton of Tryptophan 59 and the o amido pro- 
tons of Leucine 64 (L64), Lysine 60 (K60), Phenylalanine 36 (F36), and Glycine 37 (G37). In each successive panel, the lines for the most exten- 
sively protected amido protons from the previous plots are drawn as a dashed line to identify the discrete groups. Reprinted with permission 
from ref 116. Copyright 1995 American Association for the Advancement of Science. 
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forms of the random coil” and the observed rate con- 
stant for the formation of the native state from the slowly 
folding forms of the random coil of ribonuclease’ are 
both increased by strong acid, as are the rate constants 
for the cis-trans isomerization in dipeptides of proline.
The standard enthalpy of activation for the approach to
the equilibrium between the rapidly folding form and the
slowly folding forms of the random coil is between 75
and 90 kJ mol⁻¹, either at low pH or in 5 M guanidinium
chloride, which compares favorably to the values for
the standard enthalpy of activation (80-90 kJ mol⁻¹) for
the cis-trans isomerization of dipeptides of proline.
Neither the rate of the approach to cis-trans equilibrium
of dipeptides of proline nor the approach to the equilibrium
between the rapidly folding form and the slowly
folding forms of the random coil is affected by the concentration
of guanidinium chloride.
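
The kinetic quantities quoted earlier for ribonuclease (20% rapidly folding form and 80% slowly folding forms at equilibrium, with an observed rate constant of 0.005 s⁻¹ for the approach to that equilibrium) can be separated into individual rate constants if the interconversion is treated as a single reversible first-order step. That treatment is an assumption made here for illustration, because the text describes several slowly folding forms; the function and variable names are hypothetical.

```python
from math import exp

def split_relaxation(k_obs, fraction_fast_at_equilibrium):
    """Split an observed relaxation rate constant into forward and reverse
    first-order rate constants for fast <-> slow, assuming one reversible step.

    For a reversible first-order step, k_obs = k_fs + k_sf, and
    [slow]/[fast] at equilibrium = k_fs / k_sf.
    """
    fraction_slow = 1.0 - fraction_fast_at_equilibrium
    k_fast_to_slow = k_obs * fraction_slow
    k_slow_to_fast = k_obs * fraction_fast_at_equilibrium
    return k_fast_to_slow, k_slow_to_fast

def fraction_fast(t, k_obs, fraction_fast_at_equilibrium, fraction_fast_at_zero=1.0):
    """Fraction of rapidly folding random coil after t seconds in denaturant."""
    f_eq = fraction_fast_at_equilibrium
    return f_eq + (fraction_fast_at_zero - f_eq) * exp(-k_obs * t)

k_fs, k_sf = split_relaxation(0.005, 0.20)  # values quoted in the text
print(f"fast -> slow: {k_fs:.4f} s^-1, slow -> fast: {k_sf:.4f} s^-1")
for t in (10, 600):  # 10 s and 10 min as a random coil
    print(f"t = {t:3d} s: {100 * fraction_fast(t, 0.005, 0.20):.0f}% rapidly folding")
```

In this two-state picture the conversion of the rapidly folding form to the slowly folding forms is four times faster than the reverse, which is why brief exposure to denaturant leaves almost all of the protein able to refold rapidly while exposure for 10 min does not.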

In the crystallographic molecular model of bovine 
ribonuclease A, the peptide bonds on the amino-termi- 
nal sides of Proline 93 and Proline 114 are cis peptide 
bonds. The amount of the peptide bond amino-terminal 
to Proline 93 that is in the cis form in the random coil can 
be monitored by its insensitivity to endopeptidolytic 
cleavage by Xaa-Pro dipeptidase. In 8.5 M urea at
10 °C, 70% of this peptide bond is cis. When the urea is
diluted to 0.3 M, the 30% of this peptide bond that is
trans slowly and completely reverts to the cis isomer with
a rate constant of 0.01 s⁻¹, as the polypeptide folds. Under
these conditions, 30% of the random coil refolds in the
slowest phase and 30% of the activity of the enzyme is
regained in this slowest phase, both with a rate constant
of 0.01 s⁻¹. Proline 93 is preceded by Tyrosine 92, and the
fluorescence from this tyrosine tracks the slow isomerizations
that produce fully native enzyme during the
refolding of the equilibrated random coil after a decrease
in the concentration of guanidinium chloride. It was
presumed that the slow process monitored by the fluo- 
rescence of Tyrosine 92 is the state of isomerization of 
the peptide bond between Tyrosine 92 and Proline 93. 

If both Proline 93 and Proline 114 are mutated, the 
former to an alanine and the latter to a glycine, the sta- 
bility of the native protein is decreased significantly, but 
its random coil is still able to fold to produce a protein 
that is enzymatically active. The folding of this double
mutant, when monitored by molar ellipticity, is a single
first-order reaction with a rate constant of 0.07 s⁻¹ in
0.4 M guanidinium chloride at 10 °C with no evidence for 
any slower phase. 

It has been concluded from all of these observa- 
tions that all of the slow isomerizations of the random 
coil of bovine ribonuclease A that in turn produce the 
slowly folding forms from the rapidly folding form are 
isomerizations of peptide bonds on the amino-terminal 
sides of prolines from the isomer found in the native 
state and that the most disruptive isomerization of the 
random coil is to a trans proline at position 93. When the 
equilibrium mixture of rapidly folding and slowly folding
random coils of ribonuclease is diluted into conditions
favorable to folding, some of the slowly folding random
coils assume nativelike conformations in which the critical
peptide bonds on the amino-terminal sides of prolines
are nevertheless the incorrect isomer. One
intermediate, I₁, is sufficiently folded to trap almost 20
amido protons in stable hydrogen bonds, and another,
I_N, is compactly folded by several criteria, including its
insensitivity to digestion by pepsin. These compact
intermediates, however, differ from the native state and
can be distinguished from it by having the incorrect isomers
at particular prolines.

A similar but more dramatic effect of the slow iso- 
merizations of prolines is observed in the folding of 
ribonuclease T₁ from A. oryzae. In the crystallographic
molecular model of this even shorter protein (104 aa), the
peptide bonds preceding both Proline 39 and Proline 55
are cis. If the native protein is unfolded in 6.0 M guanidinium
chloride, pH 1.6, and the resulting random coil is
diluted after 5 s to 1.0 M guanidinium chloride, pH 5.0, 
80% refolds in a single first-order relaxation with a rate 
constant of 6 s⁻¹. As it sits in 6.0 M guanidinium chloride,
however, the percentage of the fast-folding isomer
decreases to 3% and the approach to this equilibrium has 
a rate constant of around 0.05 s⁻¹. From an analysis of the
kinetics of this loss of the rapidly folding state of the 
random coil in both wild-type protein and protein in 
which Proline 55 had been mutated to asparagine, it 
could be concluded that the random coil with both pro- 
lines in the cis isomer folds to the native state with full 
enzymatic activity at 6 s⁻¹, that the isomerization of cis-Proline
55 to trans-Proline 55 has a rate constant of
0.05 s⁻¹ and a cis-trans equilibrium constant of 0.16, and
that the isomerization of cis-Proline 39 to trans-Proline
39 has a rate constant of 0.02 s⁻¹ and a cis-trans equilibrium
constant of 0.1. At equilibrium, 78% of the random
coils have both prolines in the trans conformation. 
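
The figure of 78% can be checked from the two equilibrium constants, on the assumptions (made here, not stated in the text) that each constant is the ratio [cis]/[trans] and that the two prolines isomerize independently in the random coil:

```latex
\begin{align*}
f_{\text{trans},55} &= \frac{1}{1 + 0.16} \approx 0.862, \\
f_{\text{trans},39} &= \frac{1}{1 + 0.1} \approx 0.909, \\
f_{\text{both trans}} &= f_{\text{trans},55}\, f_{\text{trans},39}
  \approx 0.862 \times 0.909 \approx 0.78 .
\end{align*}
```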

Nevertheless, when this mixture of geometric iso- 
mers of the random coil is diluted to 1 M guanidinium 
chloride, pH 5.0, at least 70% of the random coils collapse 
at a rate of 50 s⁻¹ to molten globules in which the central
β sheet has formed and the α helix of the native state
then forms within these molten globules with a rate constant
of 20 s⁻¹. When the same equilibrium mixture is
diluted to 0.15 M guanidinium chloride, the entire far-ultraviolet
circular dichroic spectrum of the native protein
is regained in a few seconds. The resulting
condensed, molten globular states, however, do not
become native protein until both Proline 55 and Proline
39 have become cis. The isomerizations that produce the
proper cis isomers, and hence the native state, proceed in
these molten globular states with rate constants between
0.01 s⁻¹ and 0.0003 s⁻¹ at 10 °C (Figure 13-15A). The isomerization
of Proline 39 is retarded significantly by the
formation of these partially folded molten globules, and
this retardation is in part responsible for the slowest rate
constant of 0.0003 s⁻¹ for 66% of the protein.


In the laboratory, the foldings of many polypeptides 
have slow phases with rate constants between 0.1 s⁻¹ and
0.002 s⁻¹ at 25 °C that are attributed to proline isomerization.
Many of these attributions have been
validated by demonstrating that when particular prolines
in the polypeptide are eliminated by site-directed
mutation, the slow phases disappear. A problem
with this approach is that if the proline that must be
mutated is critical to the structure of the protein, in particular
if it is cis in the native state, its removal often
destabilizes the protein significantly and alters the kinetics
of folding. For example, the double mutant of
bovine ribonuclease A folds about 100-fold more slowly
than the wild-type polypeptide with both Proline 93 and
Proline 114 in the cis isomer.

In the folding of small proteins that contain only 
one or two domains, most if not all of the steps that have 
rate constants less than 0.1 s⁻¹ result from required isomerizations
of one or more prolines. The rate constants
for the isomerization of proline between cis and trans
isomers in a random coil are in the range from 0.1 s⁻¹ to
0.002 s⁻¹ at 25 °C. The isomerization of trans to cis is
always slower by a factor of 2-10 because of the equilib- 
rium constants, so if the proline is cis in the native state, 
which because of its peculiar geometry is usually the 
more critical for proper folding, any step during folding 
that involves the formation of that cis isomer will be quite 
slow (Figure 13-15). 

A related set of isomerizations proceeds at a more
rapid rate than those for cis-prolines. In the fully equilibrated
random coil, about 0.0015 of the peptide bonds
amino-terminal to amino acids other than proline are in
the cis conformation. Although this is a small fraction,
in a protein with 100 amino acids only 86% of the random
coils will be all trans at equilibrium. If the native state has
a cis peptide bond amino-terminal to an amino acid
other than proline or if a significant percentage of its
positions cannot tolerate cis peptide bonds during its
folding, a fraction of the random coils will be in geometric
isomers incapable of folding to the native state. The
monocationic or monoanionic forms of dipeptides
approach the equilibrium between their cis and trans
isomers with a rate constant of about 1 s⁻¹ at 25 °C, and a
slow phase with a rate constant of 2.5 s⁻¹ at 25 °C involving
5% of the random coils during the folding of α-amylase
inhibitor HOE-467A (nₐₐ = 74) from Streptomyces
tendae has been attributed to random coils in which critical
peptide bonds are in the incompatible cis isomer.
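
The figure of 86% follows from treating each peptide bond as an independent trial. The sketch below assumes, as simplifications not stated in the text, that every peptide bond has the same probability of being cis, that a polypeptide of n amino acids has n − 1 peptide bonds, and that none of them precedes a proline; the function name is illustrative.

```python
def fraction_all_trans(n_amino_acids, p_cis=0.0015):
    """Fraction of random coils in which every peptide bond is trans,
    assuming each of the n - 1 bonds is cis independently with
    probability p_cis (0.0015 is the value quoted in the text)."""
    n_bonds = n_amino_acids - 1
    return (1.0 - p_cis) ** n_bonds

for n in (74, 100, 300):
    print(f"{n} amino acids: {100 * fraction_all_trans(n):.0f}% all trans")
```

For 100 amino acids this reproduces the 86% quoted above, and it shows how quickly the all-trans fraction declines as the polypeptide lengthens.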

Most of the time, the isomerizations of only a few 
prolines, usually to the cis isomer, are required steps in 
the complete folding of a protein. The isomerizations of 
many of the peptide bonds amino-terminal to prolines in 
a protein have no effect on its kinetics of folding, and
the folding of a number of proteins shows no slow phases
that result from proline isomerization. In fact, the
native states of some proteins have two conformations in
slow equilibrium with each other, one in which a proline
is in the cis isomer and the other in which it is in the trans
isomer. Because the slow isomerizations of the prolines
significantly complicate the kinetics of folding,
most of the proteins chosen for detailed studies of fold- 
ing are those that do not have prolines that have to iso- 
merize. 

This choice is appropriate because in the cytoplasm 
and the extracytoplasmic spaces in which all polypep- 
tides normally fold, as opposed to the laboratory, the iso- 
merizations between the cis and trans isomers of the 
peptide bonds to the amino-terminal sides of prolines 
are catalyzed by peptidylprolyl isomerases. The first 
enzyme with this catalytic activity that was purified to 
homogeneity was assayed by its ability to catalyze
the cis-trans isomerization in N-glutaryl-Ala-Ala-Pro-Phe
4-nitrophenylanilide. This particular enzyme is
able to increase the rates of the slowest phases in the
refolding of, among other proteins, ribonuclease A,
the light chain of immunoglobulin, human acylphosphatase,
and type III collagen.

In most of these instances, the increases observed 
in the rate constants for these slowest phases in folding 
were relatively unremarkable (less than a factor of 10) 
even at high concentrations of the peptidylprolyl iso- 
merase. It has been found, however, that there are many 
different isoforms of peptidylprolyl isomerase in a given 
organism; for example, there are at least 25 different pep- 
tidylprolyl isomerases encoded by the human genome, 
and several are often found within the same cell. It
is possible that if the folding protein were matched with
its proper peptidylprolyl isomerase under conditions
resembling those encountered by the folding protein in
the cytoplasm, much more effective rates of catalysis
would be observed, but peptidylprolyl isomerases from
bacteria, fungi, and animals are about as effective when
catalyzing the folding of the same protein. Within an
intact mitochondrion, the increase in the rate of folding
catalyzed by endogenous peptidylprolyl isomerase was
observed to be a factor of only 2-6. It is possible that
the effect of peptidylprolyl isomerases on the rate of fold- 
ing in the cytoplasm and the various extracytoplasmic 
spaces is usually modest. 

Peptidylprolyl isomerases have been isolated from
bacteria, fungi, and plants as well as from animals.
The peptidylprolyl isomerase associated with the ribosome
in E. coli appears to be one of the most effective.

The extent of exposure of the peptide bonds
amino-terminal to proline is an important factor in the
efficiency with which peptidylprolyl isomerases can
function because they must be able to find the bond
before they can isomerize it. For example, peptidylprolyl
isomerase is able to catalyze the faster of the isomerizations
of proline that must occur during the folding of
ribonuclease T₁ to its native state, but not the slowest
(Figure 13-15). This slowest step involves the isomerization
of Proline 39, which has already been slowed considerably
from its rate in the random coil by being
buried in a molten globular intermediate. In the
molten globules formed during the folding of isoform 2
of cytochrome c from S. cerevisiae, peptidylprolyl isomerase
is unable to hasten any of the slow isomerizations
at prolines in the final steps, even though it can if the
equilibrium between molten globule and unfolded forms
is shifted significantly by adding guanidinium chloride.
How this problem of accessibility is solved in the
cytoplasm of a cell is unclear.

Figure 13-15: Slow steps in the folding of ribonuclease T₁ from
A. oryzae due to isomerization of the peptide bonds amino-terminal
to prolines.²⁹⁹ Unfolded ribonuclease T₁ at 0.3 mM in (A) 6.0 M
guanidinium chloride, pH 1.9, or (B) 8 M urea, pH 8.0, was diluted
40-fold with 0.1 M tris(hydroxymethyl)ammonium chloride,
pH 8.0, to initiate folding. At various times, the percentage of the
protein in its native state was determined by monitoring the
decrease in fluorescence (emission at wavelengths greater than
320 nm; excitation at 268 nm) that occurred upon dilution of the
sample to 5.6 M guanidinium chloride, pH 2.0. (A) Refolding in the
absence of prolyl isomerase. (B) Refolding in the presence of
0.7 µM prolyl isomerase. The percentage of the molecules in their
native state is plotted as a function of time (minutes). The curves
are multiexponential fits to the data with first-order rate constants
of (A) 0.01 s⁻¹, 0.005 s⁻¹, 0.002 s⁻¹ (31%), and 0.0003 s⁻¹ (66%) and
(B) 0.1 s⁻¹, 0.04 s⁻¹, 0.5 s⁻¹ (46%), and 0.0013 s⁻¹ (51%). For both fits
an unresolved kinetic burst of 3% was assumed. Reprinted with
permission from ref 299. Copyright 1990 American Chemical
Society.

There are a number of small proteins or small
domains removed from larger proteins that fold in what
kinetically appear to be single steps with the formation of
no intermediates. One example of such a protein is the
cold shock-like protein from T. maritima (Figure
13-1). Even at 0.5 M guanidinium chloride, this
small protein (nₐₐ = 66) folds in what kinetically appears
to be a simple first-order isomerization, and there is no
obvious change in rate-limiting step observed in the plot
of the observed first-order rate constant for folding as a
function of the concentration of guanidinium chloride
(Figure 13-1B). Folding with no kinetic intermediates
has been observed for the competent proline isomer of
the random coil of chymotrypsin inhibitor 2A (nₐₐ = 83)
from Hordeum vulgare, for which the observed first-order
rate constant for folding shows no evidence of a
change in rate-limiting step even in the absence of any
denaturant, as well as for, among others, human
acylphosphatase (nₐₐ = 98), protein S6 from the
30S subunit of ribosomes from Thermus thermophilus
(nₐₐ = 101), the engrailed homeodomain from
Drosophila melanogaster (nₐₐ = 61), and human ubiquitin
(nₐₐ = 76).

It is also possible to follow even faster folding reac- 
tions that exhibit apparently two-state behavior in the 
region of transition down to low concentrations of denat- 
urant by analysis of the line widths of the absorptions or 
analysis of relaxation rates in nuclear magnetic reso- 
nance. In this way, it can be demonstrated for some pro- 
teins and the detached domains of other proteins that 
there is no evidence of a change in rate-limiting step and 
that their folding remains kinetically a one-step reaction 
even in the absence of denaturant. Such demonstrations 
have been made for the amino-terminal domain of 
the repressor from bacteriophage λ (nₐₐ = 80), the
B-domain of protein A from S. aureus (nₐₐ = 58), and
the peripheral subunit binding domain of dihydrolipoyllysine-residue
acetyltransferase of B. stearothermophilus
(nₐₐ = 41).

All of these foldings appear to occur in a kinetically 
single step because there is no change in the rate-limit- 
ing step as denaturant is decreased to zero. In the fold- 
ings in which there is a change in the rate-limiting step 
(Figure 13-10), it is slower steps unaffected or less 
affected by the concentration of denaturant, such as the 
progression of the formation of secondary structure 
within the molten globular intermediate, the final lock- 
ing together of the secondary structures into their native 
packing, and isomerizations of peptide bonds amino- 
terminal to prolines, that become slower than the initial 
condensation of the random coil to the molten globule as 
the concentration of denaturant is lowered. In the fold- 
ings that appear to proceed in a single kinetic step, these 
later steps simply remain faster than the initial conden- 
sation; they still must occur, but they are kinetically silent 
because they occur after the rate-limiting step. 

The observed rate constant for the first-order relaxation
of one of these foldings that appear to proceed in a
single kinetic step is determined by the rate constant of
the rate-limiting step, the rate constants of any rate-determining
steps preceding the rate-limiting step, and
the equilibrium constants of any unfavorable preequilib- 
ria that precede the rate-limiting step (Equations 13-28 
and 13-30). Favorable preequilibria preceding the rate- 
limiting step would require the formation of observable 
intermediates in the reaction. 
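
The way in which an unfavorable preequilibrium enters the observed rate constant can be illustrated with the simplest case, a single rapid preequilibrium followed by the rate-limiting step. The scheme and symbols below (U, I, N, K, and k) are chosen here for illustration and are not necessarily those of Equations 13-28 and 13-30:

```latex
\begin{align*}
\mathrm{U} \;\overset{K}{\rightleftharpoons}\; \mathrm{I}
  \;\xrightarrow{\;k\;}\; \mathrm{N},
\qquad K &= \frac{[\mathrm{I}]}{[\mathrm{U}]}, \\
k_{\text{obs}} &= \frac{K}{1 + K}\,k \;\approx\; K\,k \quad (K \ll 1).
\end{align*}
```

When K is much less than 1, the intermediate I never accumulates and the relaxation appears to be a single first-order step; when K is not small, I is populated to a measurable extent, which is why a favorable preequilibrium would produce an observable intermediate.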

The following facts indicate that, immediately upon 
completion of the rate-limiting step in one of these fold- 
ings that appears to proceed through a single kinetic 
step, a molten globular intermediate containing signifi- 
cant secondary structure has formed from the random 
coil. The observed rate constants of these foldings are 
significantly affected by the concentration of denaturant 
(Figure 13-1B); they increase by factors as large as 10,000 
between a solution containing a concentration of denat- 
urant within the region of transition and one containing 
no denaturant. This fact requires that considerable
accessible surface area be lost either in the transition 
state of the rate-limiting step, during rate-determining 
steps preceding the rate-limiting step, or during preequi- 
libria preceding the rate-limiting step. The apparent 
molar activation volume for one of these foldings is pos- 
itive and even larger than the positive change in molar 
volume between random coil and native state, an
observation indicating that the transition state for the 
rate-limiting step or for rate-determining steps that pre- 
cede it is globular or that an intermediate formed in an 
unfavorable preequilibrium is globular, as is the native 
state, but is somewhat less compact than the native state. 
All or most of the change in standard heat capacity
between random coil and native state has occurred by 
the time the transition state in the rate-limiting step 
appears, an observation indicating that the hydrophobic 
functional groups buried in the native state are buried 
during the rate-limiting step or before it. The effect of pH 
on the observed first-order rate constant for one of these 
foldings indicates that some carboxylates have been 
sequestered from the solvent, but far fewer than those 
sequestered in the native state, by the time the transition 
state of the rate-limiting step has formed. In the case of
the carboxy-terminal domain of protein L9 from the 50S 
subunit of the ribosome from E. coli, the mutation of 
Histidine 134, the histidine in the protein with the lowest 
pKₐ in the native state, caused the most dramatic change
in the dependence of the rate constant for folding on pH, a result
suggesting that this same histidine is also the most
buried by the time the transition state of the rate-limiting
step has formed and that at this point the protein resembles
the native state.

It is the effects of site-directed mutations on the 
observed rate constants of these foldings appearing to 
proceed in a single kinetic step which indicate that sec- 
ondary structure has formed before or during the rate- 
limiting step. For example, when one or the other of the 
two α helices in acylphosphatase is stabilized by mutating
one of its amino acids to alanine, the observed rate
constant for its folding increases if the mutation is in the
second α helix but not if it is in the first. When the two
α helices of transcriptional repressor arc from bacteriophage
P22 were destabilized by mutations of alanines to
glycines, the observed rate constants of folding
decreased significantly, while alanine to glycine mutations
in only one of the α helices in the amino-terminal
domain of the repressor from bacteriophage λ affect the
observed rate constant of its folding. The magnitudes
of the changes in the observed rate constants of folding
relative to the change in the standard free energy of folding
for mutations at 37 positions in chymotrypsin
inhibitor 2A suggest that some elements of secondary
structure have formed by the time the transition state of
the rate-limiting step has been reached but that they are
less stable than they are in the native state. The effects
of site-directed mutation on the folding of the WW
domain from peptidylprolyl isomerase Pin 1, however,
indicate that its folding is most sensitive to changes in
what is an external loop in its crystallographic molecular
model, an observation suggesting that secondary
structure in the native state may not correspond to sec- 
ondary structure in a molten globular intermediate. 

All of the observations discussed so far can be 
explained as effects either on the rate-limiting step or on 
one or more unfavorable preequilibria preceding the 
rate-limiting step. It is the case, however, that the 
observed rate constant for at least one of these foldings 
that appears to proceed in a single kinetic step is linearly 
related to the inverse of the viscosity of the solvent.
This fact requires that the condensation of a conforma- 
tion or set of conformations of the random coil occur 
during the rate-limiting step, not before it. This conclu- 
sion follows from the fact that the viscosity of the solvent 
must affect both contraction and expansion of a 
polypeptide equally and hence an equilibrium constant 
for contraction not at all. 
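
The argument in the last sentence can be written out. If, as assumed here for illustration, the rate constants for contraction and expansion are each inversely proportional to the viscosity η of the solvent, with proportionality constants a_c and a_e (symbols introduced here, not in the text), the viscosity cancels from the equilibrium constant:

```latex
\begin{align*}
k_{\text{contract}} &= \frac{a_{\mathrm{c}}}{\eta}, \qquad
k_{\text{expand}} = \frac{a_{\mathrm{e}}}{\eta}, \\
K_{\text{contract}} &= \frac{k_{\text{contract}}}{k_{\text{expand}}}
  = \frac{a_{\mathrm{c}}/\eta}{a_{\mathrm{e}}/\eta}
  = \frac{a_{\mathrm{c}}}{a_{\mathrm{e}}} .
\end{align*}
```

If the contraction occurred only in a preequilibrium, the viscosity would enter the observed rate constant only through this equilibrium constant and would therefore cancel, whereas a contraction in the rate-limiting step itself gives an observed rate constant proportional to 1/η, which is the behavior reported.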

The observed rate constants for these foldings that 
appear to proceed in a single kinetic step span a wide 
range from 0.2 s⁻¹ (28 °C) to 120,000 s⁻¹ (37 °C). The fact
that these rate constants span such a wide range and the 
fact that nevertheless they all seem to register condensa- 
tions producing molten globular intermediates suggest 
that their rate-limiting steps are not hydrophobic col- 
lapse, for which the observed rate constant would be 
related solely to the length of the polypeptide and would 
not vary so dramatically, but hydrophobic collapse of
a conformation or set of conformations of the random 
coil that are present at low occupancy; the lower that 
occupancy, the slower the observed rate constant 
(Equation 13-30). Presumably, the subset of conforma- 
tions of the random coil that are present at such low 
levels of occupancy are those that contain sufficient sec- 
ondary structure to condense to a stable molten globule. 
Consistent with this presumption is the fact that the 
observed rate constant for at least one of these foldings 
that appears to occur in a single step is significantly
increased by the addition of low concentrations (0.006
mole fraction) of trifluoroethanol or 1,1,1,3,3,3-hexafluoro-2-propanol.
These cosolvents are known to stabilize
secondary structure in unfolded polypeptides.

The apparent single step registered in each of these 
foldings is equivalent to the step producing the molten 
globular intermediates formed during the kinetic burst in 
the foldings of proteins that proceed through multiple 
steps. In the case of the proteins folding in an apparent 
single step, the steps following the molten globular inter- 
mediate are kinetically silent. The advantage of the fold- 
ings in a single step is that attention can be directed 
exclusively on this one step, uncomplicated by the later 
ones; the disadvantage is that the later steps cannot be 
studied at all. 

What seems to occur during folding of a polypep- 
tide can be summarized. The random coil has conforma- 
tions within its ensemble that are evanescently present at 
low concentrations and that contain sufficient secondary 
structure to stabilize a molten globular intermediate. 
One or more of these minor conformations of the 
random coil collapses hydrophobically to form one or 
more molten globules.* Within these molten globules, 
some of the secondary structure of the native state is 
present and the rest forms in a series of steps before or as 
it locks into its proper packing to produce the native 
state. Superimposed on this progression are the slow iso- 
merizations of the peptide bonds amino-terminal to pro- 
lines, each of which has the potential to decrease the 
observed rate constant of any one of these steps dramat- 
ically but only for those isomers of the polypeptide that 
happen to contain an incompatible isomer of proline. If 
isomerizations of peptide bonds amino-terminal to pro- 
lines were not or are not involved, the folding of a 
polypeptide would be or is usually complete within less 
than 10 s at 25 °C, which is remarkably fast for such a
complicated process.

* If the effects of solutes increasing the viscosity of the solution have
been misinterpreted, it is also possible that hydrophobic collapse
is the unfavorable preequilibrium preceding the formation of sufficient
secondary structure to stabilize the molten globule or
molten globules.

Up to this point, for the sake of clarity, the transformations
between each of the major intermediate states
encountered during the folding of a protein have been
presented as if they were uncomplicated first-order reactions
proceeding in single steps, just as the bimolecular
nucleophilic displacement of an iodide in the alkylation
of a thiolate anion (Equation 3-17) is an uncomplicated
second-order reaction proceeding in a single step
through a single transition state. It is certainly the case,
however, that in a process as complicated as protein
folding none of these transformations between the denatured
state, major intermediate states, and the final
native state proceeds in a single step. Consequently, the
transformations only appear to be single steps, and the
observed rate constants are only apparent rate constants.
For example, the kinetic progress of the folding of
cytochrome c′ from Rhodopseudomonas palustris,
although apparently proceeding through four steps
when the absorbance of the heme is followed at 440 nm,
can be fit more exactly by a set of 80 rate constants spanning
a wide range. This observation by
itself is not surprising. Were the kinetic measurements of
a simple chemical reaction that definitely had only four
steps to be fit with an equation derived for 80 steps, the
fit for 80 steps would necessarily be better than the fit for
four steps. Nevertheless, this exercise suggests that there
are more than just four steps involved in the folding of
this cytochrome c′. Careful examination of Figure 13-13
also suggests that a mechanism involving more than two
steps in the range monitored would yield a rate equation
more successful at fitting the actual data.

It has been shown that the apparently first-order
rates of the transformations between apparently discrete
intermediate states in the folding of a protein and the
behavior of those rates as a function of the concentration
of denaturant are more successfully fit by kinetic models
involving a large number of intermediate states in
sequence. It has already been noted that the number
of transitions observed during the folding of a polypeptide
usually increases as more and more physical properties
are monitored during the same process,
so observations based on only one physical property
usually fail to register all of the discrete steps in the
process of folding. It has also been proposed that the
differences observed among the effects of denaturant on
the rates of exchange of different amido protons along
the backbone of a polypeptide (Figures 13-8 and 13-14)
are evidence for the existence of a continuum of
intermediate states in the process of folding. These various
intermediate states are in preequilibrium with each
other prior to a rate-limiting step, represent a set of
rate-determining steps, are in a steady state with each other
so that the observed first-order rate constant is actually a
composite rate constant, or are related to each other in
some combination of these three possibilities. Because it
is so unlikely that any one of these apparently first-order
transformations is actually a simple single step, a
discussion of the transition state associated with the observed
first-order rate constant for any one of these
transformations is meaningless.
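
As an illustration of what a composite rate constant means, consider the simplest hypothetical case of a single intermediate I between a starting state U and a product N. The scheme and the symbols k_1, k_{-1}, k_2, and K_1 are assumptions introduced here for illustration and are not notation taken from the text. Under the two limiting assumptions of preequilibrium and steady state, the observed first-order rate constant in this sketch would be:

```latex
% Hypothetical minimal scheme: U <=> I -> N
\[
\mathrm{U} \underset{k_{-1}}{\overset{k_{1}}{\rightleftharpoons}} \mathrm{I}
\overset{k_{2}}{\longrightarrow} \mathrm{N}
\]
% Rapid preequilibrium (k_{1} + k_{-1} \gg k_{2}), with K_{1} = k_{1}/k_{-1}:
\[
k_{\mathrm{obs}} = \frac{K_{1}\,k_{2}}{1 + K_{1}}
\]
% Steady state in I (k_{-1} + k_{2} \gg k_{1}):
\[
k_{\mathrm{obs}} = \frac{k_{1}\,k_{2}}{k_{-1} + k_{2}}
\]
```

In neither limit does the observed rate constant correspond to a single elementary step, so no single transition state can be assigned to it.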

Another complication is the existence of multiple, 
parallel pathways in the folding of a protein. The most 
obvious example of such a situation results from the iso- 
merization of prolines. Each geometric isomer of the 
random coil at each proline that is required to have a par- 
ticular configuration for proper folding folds along its 
own distinct pathway (Figure 13-15). In the case of 
lysozyme from G. gallus and dihydrofolate reductase
from E. coli, however, there are two and four
parallel pathways, respectively, through which these
foldings progress, none of which are distinguished by
differences in the isomerization of prolines. At one or
more points in the process of folding, some of the mole- 
cules assume one state and the others assume another 
state, and from there on, each of the two populations 
proceeds through a different sequence of steps to form 
the same final native state of the protein. In the case of 
dihydrofolate reductase, the existence of parallel path- 
ways may be related to the fact that there are two differ- 
ent conformations of the native state in equilibrium with 
each other that are both substantially occupied under 
most circumstances.

There are also kinetic dead-ends that can compli- 
cate the process of folding. For example, during the fold- 
ing of equine cytochrome c from its random coil, the 
wrong side chains can form the fifth and sixth ligands to 
the covalently bound heme, and they must dissociate
and the proper side chains must associate before folding
can proceed successfully.

One of the most consequential kinetic dead-ends is 
aggregation of the folding polypeptides. When, in the 
laboratory, the concentration of urea or guanidinium 
chloride is rapidly lowered by dilution, the hydrophobic 
side chains on a random coil are no longer favorably sol- 
vated. They can either promote the desired intramolecu- 
lar hydrophobic collapse, or they can associate with 
hydrophobic side chains on other unsolvated polypep- 
tides to form an undesired intermolecular aggregate, 
which just as successfully removes them from contact 
with the water. When the intermolecular aggregation is 
reversible so that the aggregate is in equilibrium with an 
unaggregated, denatured intermediate produced during 
some step in the process of proper folding, the aggrega- 
tion has the effect of lowering the concentration of that 
intermediate on the pathway to the native state and 
decreasing the rate of folding. In this situation, the 
amount of intermolecular aggregate decreases as the 
total concentration of protein is decreased, and the rate 
of folding increases as the concentration of protein is
decreased. The aggregate at the dead end can also appear
to be an intermediate in the folding process, forming rapidly
and then disappearing as the folding depletes the
solution of the intermediate in equilibrium with the
aggregate.
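
The dependence of the rate of folding on the concentration of protein can be illustrated with a minimal numerical sketch. It assumes, purely for illustration, that the aggregate is a dimer A in rapid equilibrium with the intermediate I (2 I ⇌ A, with association constant K) and that the native state forms only from free intermediate with a first-order rate constant k_f. Neither the dimeric stoichiometry nor any of the numerical values comes from the text.

```python
import math

def free_fraction(total, K):
    """Fraction of the intermediate that is free when 2 I <=> A, K = [A]/[I]^2.

    total is the total concentration in monomer units: total = [I] + 2[A].
    """
    if total == 0 or K == 0:
        return 1.0
    # Positive root of 2*K*[I]^2 + [I] - total = 0.
    free = (-1.0 + math.sqrt(1.0 + 8.0 * K * total)) / (4.0 * K)
    return free / total

def apparent_rate_constant(k_f, total, K):
    """Initial apparent first-order rate constant when only free I folds."""
    return k_f * free_fraction(total, K)

k_f = 1.0    # s^-1, hypothetical
K = 1.0e6    # M^-1, hypothetical
for total in (1e-4, 1e-5, 1e-6, 1e-7):
    k_app = apparent_rate_constant(k_f, total, K)
    print(f"{total:.0e} M  k_app = {k_app:.3f} s^-1")
```

In this sketch the apparent rate constant approaches k_f as the total concentration of protein is lowered, because less of the intermediate is sequestered in the aggregate.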

When the intermolecular aggregation of the folding 
polypeptide is irreversible, however, it competes with 
proper folding, and the final yield of the native state, 
rather than its rate of formation, is decreased accordingly.
Often such irreversible aggregation can be minimized
by lowering the total concentration of
protein, but in many instances the folding protein
passes through an intermediate so prone to aggregation 
that very little if any native state forms at any concentra- 
tion of protein. This catastrophic problem is solved 
within the cell by the chaperones. 
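
The competition between folding and irreversible aggregation can be illustrated with a minimal sketch. It assumes, as a simplification not stated in the text, that an aggregation-prone intermediate U either folds with a first-order rate constant k_fold or is lost irreversibly in a second-order step with rate constant k_agg, so that d[U]/dt = -k_fold[U] - k_agg[U]^2. Integrating this scheme gives a final yield of ln(1 + x)/x, where x = k_agg[U]_0/k_fold. All of the numerical values are hypothetical.

```python
import math

def yield_analytic(u0, k_fold, k_agg):
    """Final fraction native for dU/dt = -k_fold*U - k_agg*U**2."""
    x = k_agg * u0 / k_fold
    if x == 0:
        return 1.0
    return math.log1p(x) / x

def yield_numeric(u0, k_fold, k_agg, dt=1e-3, t_end=40.0):
    """The same quantity by explicit Euler integration, as a check."""
    u, native = u0, 0.0
    for _ in range(int(t_end / dt)):
        folded = k_fold * u * dt          # first-order, productive
        aggregated = k_agg * u * u * dt   # second-order, irreversible loss
        u -= folded + aggregated
        native += folded
    return native / u0

k_fold = 1.0    # s^-1, hypothetical
k_agg = 1.0e5   # M^-1 s^-1, hypothetical
for u0 in (1e-4, 1e-5, 1e-6, 1e-7):
    print(f"[U]0 = {u0:.0e} M  yield = {yield_analytic(u0, k_fold, k_agg):.3f}")
```

Because aggregation is of higher order in protein than folding, the yield in this sketch rises toward 1 as the initial concentration is lowered, while the rate constant for folding itself is unchanged.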

A chaperone is a protein that intercepts and suppresses
the unproductive, nonspecific, irreversible intermolecular
aggregation of a folding polypeptide so that it
can fold intramolecularly to achieve its native state.
There are two species of chaperones that are responsible
for most of this suppression of aggregation and protec- 
tion of proper folding. These two species are represented 
by chaperonin 60 (the product of the groEL gene) and 
heat shock protein 70 (the product of the dnaK gene), 
respectively, from E. coli. Chaperonins 60 have been purified
from fungi, chloroplasts, mitochondria, and
eukaryotic cytoplasm, and heat shock proteins 70 are
also universally distributed. The most obvious distinction
between these two different species of chaperones, other
than their unrelated amino acid sequences, is that
chaperonins 60 are all large oligomers of 14 or 16 subunits
arranged with the respective symmetries of the point
group 722 (D7) or the point group 822 (D8), while heat
shock proteins 70 are monomers or dimers.

The chaperone about which the most is known is 
chaperonin 60 from E. coli. It is a homotetradecamer 
with symmetry of the point group 722 (D7) enclosing a
large central cavity (Figure 13-16) that is divided
into two halves (upper and lower heptamers in Figure
13-16) at its middle by a thick septum formed from the
irregular coalescence of the 14 carboxy-terminal segments,
each 23 amino acids long, which are disordered in
the crystallographic molecular model. Each of the 14
subunits in turn is divided into three domains (arranged 
one on top of the other and consequently difficult to dis- 
tinguish in the view presented in Figure 13-16). The 
apical domains are symmetrically arrayed around 
the central 7-fold rotational axis of symmetry to form the 
entrances to the upper and lower cavities, the equatorial 
domains surround the central septum, and each of the 
intermediate domains connects its apical domain to its 
respective equatorial domain. 

Chaperonin 60 prevents proteins that aggregate 
irreversibly during their folding in its absence from doing 
so in its presence. It accomplishes this task by recognizing
and associating with intermediates in the
pathway of folding that are prone to aggregation. By
itself, chaperonin 60 forms a tight complex with such an
intermediate. If this were the only capability of
chaperonin 60, the formation of this tight complex would
interrupt the folding of a protein and prevent its comple- 
tion. Consequently, the bound intermediate must disso- 
ciate so that folding can proceed. If, however, the bound 
intermediate were to dissociate from chaperonin 60 
unchanged, it would then proceed to aggregate as it was 
about to do just before it was bound. Consequently, the 
structure of the bound intermediate must be altered so 
that it dissociates in a form that is not prone to aggrega- 
tion. The standard free energy necessary to promote the 
dissociation of the bound intermediate and to change its 
structure before it dissociates is provided by the binding 
and hydrolysis of MgATP. 

The structure of the form of the folding protein recognized
and bound by chaperonin 60 has not been
clearly defined. In theory, chaperonin 60 should recognize
only intermediates in the process of folding that are
prone to aggregation and ignore intermediates that are
not, but when it is added in the absence of MgATP to
solutions of some proteins that are in the process of folding,
their folding slows dramatically or ceases, as
if almost all or all of the states of these proteins other
than the native state are recognized and bound. With
other proteins, the rate of their folding is unaffected or
slightly accelerated upon the addition of chaperonin 60,
but these are usually proteins that are not susceptible
to aggregation during their folding.
Chaperonin 60 also forms complexes when added to
solutions of certain native proteins, presumably by
recognizing denatured forms in equilibrium with the
native state and shifting that equilibrium by sequestering
those denatured forms. Site-directed mutants in which
hydrophobic amino acids have been replaced with alanine
or glycine are recognized and bound by chaperonin 60
less successfully during their folding, suggesting
that it is exposed hydrophobic side chains that elicit the
attention of the chaperone.

After it has been bound by chaperonin 60, a folding
protein is usually in a molten globular state in the complex.
This conclusion follows from the fact that the
rates of the exchange of its amido protons are
rapid. With some proteins, unlike what is
observed in the usual molten globular state, the exposure
of all amido protons is increased to the same extent,
a result suggesting that the association of the folding protein
with the chaperone has disrupted all secondary
structure. With other proteins, exposure of amido protons
varies along the sequence, a result suggesting that
some secondary structure is preferentially preserved in
the bound form of the protein. With yet other proteins,
almost no difference in exchange rates between the
bound form of the protein and the native state is
observed, a result suggesting that the bound protein is
similar in its structure to the native state. It has also been
observed that a tridecapeptide that is structureless in
solution is an α helix when bound to chaperonin 60.

Because it is not known which conformation or set 
of conformations of a folding protein is recognized and 
bound by chaperonin 60, it is possible that the structure
of that folding protein once it has been bound is
identical to the structure that it had when it was recog- 
nized and bound, but it is also possible that the structure 
of the protein has been altered significantly during its 
binding to the chaperone to produce the conformation 
or set of conformations that are observed in the complex. 
This alteration in structure caused by the binding itself 
would be at least a portion of the alteration required so 
that a form of the protein that is not prone to aggregation 
can dissociate from the chaperone. 

The folding protein is bound by one or more of the
apical domains that surround the entrances to each of
the two central cavities in chaperonin 60 (Figure
13-16). In some cases, the bound protein protrudes
outward from the cavity; in other cases, it protrudes
into the cavity and fills it. Whether the bound protein
ends up within the cavity or protruding away from
it is in part a function of its size; the larger proteins
protrude outward because they do not fit within. A folding
polypeptide larger than 600 aa is too large to fit in the
cavity.

Figure 13-16: Chaperonin 60 and chaperonin 10 from E. coli.
The α-carbon skeleton of the tetradecamer of chaperonin 60
(upper structure) is drawn from the crystallographic molecular
model of the protein in a complex with 14 molecules of
nucleotide. The tetradecamer is viewed down the 7-fold
rotational axis of symmetry. The seven identical subunits in
the upper heptamer are drawn with line segments of different
widths. Those in the lower heptamer are drawn with gray lines.
In the crystallographic molecular model, the carboxy-terminal
amino acids of each subunit are disordered, unresolved, and,
consequently, absent from the model. They extend in the plane
of the page from the 14 carboxy termini of the crystallographic
molecular model, protruding into the center, and are thought to
coalesce to form the equatorial septum, observed in image
reconstructions from electron micrographs, that divides the
central cavity into upper and lower cavities. Seven 2-fold
rotational axes of symmetry of the point group 722 (D7) are in
the plane of the page and run through the center of the drawing,
so the structures of the upper and lower cavities are identical
to each other. The α-carbon skeleton of chaperonin 10 is drawn
below that of chaperonin 60. It is drawn from the
crystallographic molecular model of the complex between
chaperonin 60 and chaperonin 10. Chaperonin 10, a symmetrical
heptamer of seven identical subunits (n = 97), is a cap that can
sit upon the top of chaperonin 60 with its 7-fold rotational axis
of symmetry coincident with the 7-fold rotational axis of
symmetry of the tetradecamer.

When the apical domains of chaperonin 60 are 
detached genetically and expressed separately, they 
may retain the ability to recognize and bind forms of a
folding protein that are prone to aggregation. A site in
the crystallographic molecular model of a detached
apical domain to which an extended hydrophobic segment
of polypeptide has bound has been tentatively
assigned as the site for binding of the folding protein to
the apical domain.

The next step in the rescue of a folding protein from 
aggregation performed by chaperonin 60 is its dissocia- 
tion from this bound state. This dissociation requires 
the addition of MgATP. When MgATP is added to a com- 
plex of chaperonin 60 and denatured protein, it pro- 
motes the dissociation of that protein, permitting its 
folding to recommence. The dissociation of the
denatured protein from the complex is more rapid than
the hydrolysis of the MgATP and also occurs when
analogues of ATP that cannot be hydrolyzed are used
instead of ATP, so it is the binding of MgATP
and not its hydrolysis that promotes the dissociation.
The binding of MgATP to chaperonin 60 occurs at a site
located in each equatorial domain, while the folding
protein is bound to the apical domain. These two sites
are 3.2 nm apart at their closest approach, so no direct
interaction between them is possible. Consequently,
there must be two global conformations of chaperonin 60
in equilibrium with each other, one of which
binds denatured protein at the apical domain strongly
and MgATP at the equatorial domain weakly and the other
of which binds denatured protein weakly and MgATP
strongly, and the binding of MgATP must shift the
equilibrium between these two conformations in favor of the
one that binds denatured protein weakly.
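
This linkage can be sketched with a two-state model of the Monod-Wyman-Changeux type. The symbols are assumptions introduced for illustration and do not appear in the text: A is the conformation that binds denatured protein strongly, B is the conformation that binds it weakly, L_0 is the ratio [B]/[A] in the absence of MgATP, n is the number of equivalent sites for MgATP, and K_A and K_B are the dissociation constants for MgATP from the two conformations, with K_B < K_A.

```latex
\[
\frac{[\mathrm{B}]_{\mathrm{total}}}{[\mathrm{A}]_{\mathrm{total}}}
= L_{0}\left(\frac{1 + [\mathrm{MgATP}]/K_{\mathrm{B}}}
{1 + [\mathrm{MgATP}]/K_{\mathrm{A}}}\right)^{n}
\]
```

Because K_B < K_A, the ratio rises with the concentration of MgATP toward a limit of L_0 (K_A/K_B)^n at saturation, which is the shift in favor of the conformation that binds denatured protein weakly. Whether the actual behavior of chaperonin 60 is this simple is not established by the sketch.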

In the conformation of chaperonin 60 that is stabilized
by the binding of MgATP, an α helix in the apical
domain that forms part of the site at which the folding
protein is thought to bind is rotated by 102° relative to
its orientation in the conformation of the protein that is
the more stable in the absence of MgATP. This rotation
sequesters a set of hydrophobic side chains and makes 
the site considerably less hydrophobic, perhaps promot- 
ing the dissociation of the folding protein. 

The subsequent slow hydrolysis of the MgATP
(0.06 s⁻¹ at 25 °C) at all seven of the sites in one half of
the protein serves to regenerate the conformation of
the chaperone that can again recognize intermediates
prone to aggregation. During the folding of a protein that
requires significant assistance to avoid aggregation, several
cycles of binding to chaperonin 60 and release are
needed before the native state is achieved. Each
cycle of binding and release requires that the MgATP be
hydrolyzed to regenerate the competent conformation of
the chaperone, so the presence of a folding protein in
need of assistance elicits an ATPase activity from
chaperonin 60.

The dissociation of the folding protein from 
chaperonin 60 that is promoted by the binding of MgATP 
is accelerated by the addition of chaperonin 10.
Chaperonin 10 is a homoheptamer of subunits 97 aa in
length that can sit as a cap upon the seven apical
domains of chaperonin 60 and seal off the respective
cavity from the solution (Figure 13-16). Its addition
is required for the maximum yield of the native state
of some folding proteins prone to aggregation; with other 
proteins that are prone to aggregation, its addition 
increases the rate of their folding. 

When the folding protein is too large to fit in the 
cavity, chaperonin 10 nevertheless still increases the rate 
of folding or the yield of properly folded protein or both 
of these outcomes but does so by binding to the opposite 
end of chaperonin 60 from the end at which the folding 
protein was bound. When the folding protein is small
enough, chaperonin 10 binds at the end to which it is
associated and traps it within the respective cavity of
chaperonin 60, where it is definitely protected from
intermolecular aggregation because it is alone. In the
presence of chaperonin 10, a trapped folding protein,
although it has been dissociated from its binding site on
chaperonin 60 by the binding of MgATP, remains associated
with the complex of chaperonin 60 and chaperonin 10.
Only after the MgATP has been hydrolyzed is the
folding protein released from the complex along with the
chaperonin 10. Because the hydrolysis of the
MgATP is so slow, however, this means that the complex 
remains intact for, on the average, about 15 s, which is a 
long period of time in the folding of a protein and during 
which considerable progress towards the native state can 
be achieved. 
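
The time for which the complex remains intact can be checked against the rate constant for hydrolysis with a short calculation. The sketch assumes that release is limited by a single first-order step with the rate constant of 0.06 s⁻¹ quoted for the hydrolysis of MgATP. That assumption is an idealization introduced here.

```python
import math

k_hydrolysis = 0.06  # s^-1 at 25 °C, rate constant quoted for hydrolysis of MgATP

# For a single first-order step, lifetimes are exponentially distributed.
mean_lifetime = 1.0 / k_hydrolysis                        # s
half_life = math.log(2.0) / k_hydrolysis                  # s
fraction_intact_at_15_s = math.exp(-k_hydrolysis * 15.0)  # dimensionless

print(f"mean lifetime  = {mean_lifetime:.1f} s")
print(f"half-life      = {half_life:.1f} s")
print(f"intact at 15 s = {fraction_intact_at_15_s:.2f}")
```

On this assumption the mean lifetime and the half-life bracket the figure of about 15 s given for the average persistence of the complex.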

The requirement that chaperonin 60 alter the struc- 
ture of the folding protein and release it in a form not 
prone to aggregation seems to be accomplished both 
before and after MgATP is bound. The alterations in the 
structure of the folding protein before MgATP is bound 
have already been described, but it has been noted that 
the addition of MgATP and chaperonin 10 to the complex 
between a folding protein and chaperonin 60 further 
increases the rates of amido proton exchange of the folding
protein, a result suggesting that significant structural
alterations are made to the folding protein after the
binding of MgATP but before it dissociates from its bind- 
ing site on chaperonin 60. 

The chaperones of the species of heat shock pro- 
teins 70, for which heat shock protein 70 from E. coli is 
the paradigm, also rescue folding proteins from aggrega- 
tion by binding them and releasing them in associations 
and dissociations that are coupled to the binding and
hydrolysis of MgATP, but the details of the process are
much less clear. It is the complex between heat shock
protein 70 and MgATP that binds the folding protein in a
rapidly reversible equilibrium, and upon hydrolysis of
the MgATP, the folding protein is locked onto the chaperone.
The binding of the next molecule of MgATP
rapidly releases the bound protein into the solution,
presumably in an altered conformation. In a crystallographic
molecular model of the complex between heat
shock protein 70 and a peptide thought to mimic the
bound folding protein, the peptide is bound in an
unstructured extended conformation. Heat shock protein 40
increases the rate at which the ATPase recycles
heat shock protein 70 among its various conformations,
as does chaperonin 10 with chaperonin 60.
There is, however, no cavity in any of these proteins sim- 
ilar to that in the tetradecamer of chaperonin 60. 

As a polypeptide folds to its native state in the cyto- 
plasm of a cell, the high concentration of reduced glu- 
tathione (3-5), or some other mercaptan with the same 
function, prevents its cysteines from forming adventitious
cystines, and for the same reason, the cysteines of
the native protein remain reduced throughout its life- 
time. Most of the proteins, however, that are excreted 
from a cell into the extracellular spaces contain cystines. 
In a eukaryotic cell, these cystines are formed as these 
soon to be excreted proteins fold within the lumen of the 
endoplasmic reticulum. In the lumen of the endoplasmic 
reticulum, the ratio of oxidized glutathione to reduced 
glutathione is much higher (0.5) than it is in the cytoplasm
(0.02). The native conformation of the protein juxtaposes
the two cysteines that will form a correctly paired
cystine. This juxtaposition increases the equilibrium
constant for the formation of that cystine dramatically,
so the ambient ratio of oxidized to reduced glutathione
in the endoplasmic reticulum is sufficient to form a cystine
from the two cysteines once they have been juxtaposed:

[Equation 13-31]

where C_A and C_B are the two cysteines juxtaposed, GSSG
is oxidized glutathione, and GSH is reduced glutathione.
The fact that one mole of oxidized glutathione yields two 
moles of reduced glutathione pulls the reaction to the 
right. 
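
The role of the stoichiometry can be made explicit with a mass-action expression for the overall reaction, written here as a sketch. P(SH)_2 and P(SS) are labels introduced for the folding protein with the two juxtaposed cysteines reduced and joined as a cystine, respectively, and K_ox is the equilibrium constant for the overall reaction. None of these symbols is taken from the text.

```latex
\[
\mathrm{P(SH)_2} + \mathrm{GSSG} \rightleftharpoons \mathrm{P(SS)} + 2\,\mathrm{GSH},
\qquad
K_{\mathrm{ox}} = \frac{[\mathrm{P(SS)}]\,[\mathrm{GSH}]^{2}}
{[\mathrm{P(SH)_2}]\,[\mathrm{GSSG}]}
\]
\[
\frac{[\mathrm{P(SS)}]}{[\mathrm{P(SH)_2}]}
= K_{\mathrm{ox}}\,\frac{[\mathrm{GSSG}]}{[\mathrm{GSH}]^{2}}
= \frac{K_{\mathrm{ox}}}{[\mathrm{GSH}]}\cdot\frac{[\mathrm{GSSG}]}{[\mathrm{GSH}]}
\]
```

Because the concentration of reduced glutathione enters as its square, at a fixed ratio of oxidized to reduced glutathione the proportion of protein containing the cystine increases as the absolute concentration of reduced glutathione decreases.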

The problem, however, is that the ratio of oxidized 
to reduced glutathione in the endoplasmic reticulum is 
also high enough to produce adventitious cystines within 
intermediates in the pathway of folding that happen to 
juxtapose incorrect cysteines. This problem, which 
would lead to the accumulation of stable, improperly 
folded forms of the protein, is solved by the enzyme protein
disulfide-isomerase, which is present at high
concentration in the lumen of the endoplasmic reticulum.
This enzyme fulfills two roles in the formation of correct
cystines during the folding of a protein. It breaks cystines
to reverse their incorrect formation, and it oxidatively
couples pairs of adjacent cysteines to form cystines.
Consequently, it catalyzes the rapid rearrangement of
cystines in a folding protein until the correct partners are
joined to produce the native state of the protein. In fact,
there are some proteins, for example mammalian pancreatic
human insulin-like growth factor and bovine
ribonuclease A, that cannot fold stably unless all of
their cystines have formed correctly, so their folding is 
required to proceed in tandem with the rearrangement 
of their cystines by protein disulfide-isomerase until the 
combination of the correct tertiary structure and the cor- 
rect pairing of cysteines as cystines is reached. It is both 
the packing of the tertiary structure and the properly 
paired cystines that create their native states. 

Protein disulfide-isomerase (490 aa) contains two 
domains (amino acids 5-100 and 350-440) that are homologous
to each other and to thioredoxin (110 aa). As is
the case with thioredoxin, each domain contains a pair of 
cysteines in the sequence -VEFYAPWCGHCK-. It is these 
pairs of cysteines found in protein disulfide-isomerase 
that are responsible for the catalysis of the rearrangement 
and formation of cystines in a folding protein. Two shorter 
proteins, thiol:disulfide interchange protein DsbA 
(218 aa) and thiol:disulfide interchange protein DsbC 
(216 aa), each with only one pair of cysteines in the 
sequences -LEFFSFFCPHCY- and -TVFTDITCGYCH-, 
respectively, fulfill the roles of protein disulfide-isomerase 
for proteins excreted from bacteria. The former can form 
cystines, but the latter is the enzyme responsible for the 
shuffling of incorrectly formed cystines.

The two domains in each molecule of protein disul- 
fide-isomerase, which each contain an identical pair of 
cysteines, are similar in their catalytic abilities and act 
independently. The second cysteine in each of these
pairs of cysteines is responsible for breaking and rearranging
cystines by disulfide interchange during the folding
of a protein:

[Equation 13-32]

where C_C is a cysteine elsewhere in the folding protein
that contains cysteines C_A and C_B. The central intermediate
in this shuffling is the mixed disulfide between protein
disulfide-isomerase and the folding protein (upper
right complex in Equation 13-32). The cystine that forms
between a pair of cysteines in protein disulfide-isomerase
by disulfide interchange with oxidized glutathione

[Equation 13-33]

is able to convert a pair of juxtaposed cysteines in a folding
protein into a cystine (reverse of the reactions on the
lower right and top of Equation 13-32). Protein
disulfide-isomerase also catalyzes the formation of a mixed
disulfide between glutathione and a cysteine on a folding
protein, which is also a central intermediate in the
formation of cystines by oxidized glutathione (Equation
13-31).

Because protein disulfide-isomerase contains one 
pair of cysteines in each of its two domains, and because 
most folding proteins have enough cysteines to give rise 
to two or more cystines while they are folding, at the 
proper molar ratios protein disulfide-isomerase and a 
folding protein form a precipitate, just as an antigen
and a set of polyclonal immunoglobulins form a precipi- 
tate at equivalence. The existence of this precipitate 
serves to demonstrate the importance of the mixed disul- 
fide in Equation 13-32 in the reactions catalyzed by pro- 
tein disulfide-isomerase. 

In order to participate in any of these reactions, in 
particular the formation of the mixed disulfide between 
it and the folding protein, protein disulfide-isomerase 
must be able to find the cystine or the cysteine on the 
folding protein. This necessity requires both that the fold- 
ing protein be molten enough to expose unpaired cys- 
teines or mispaired cystines on its surface and that the 
reaction between protein disulfide-isomerase and those 
exposed cysteines and cystines be rapid and efficient.
In a role that may be connected to this search for incorrect
cystines, protein disulfide-isomerase also acts as a
chaperone. Once the proper cystines are formed, however,
they can be buried without penalty. It is probably
the burial of the correctly paired cystines within the 
proper native structure that terminates the rearrange- 
ments of the cystines catalyzed by protein disulfide-iso- 
merase. The same problem of accessibility is faced by 
peptidylprolyl isomerase, and it is interesting that protein 
disulfide-isomerase and peptidylprolyl isomerase function
synergistically to catalyze the folding of a protein.

The histidine between the two cysteines in each 
domain of protein disulfide-isomerase, as opposed to a 
proline at the homologous position in thioredoxin, 
causes the cystine in the former protein to have a reduc- 
tion potential 30-40 mV more positive than that in the 
latter. This higher reduction potential causes a cystine
in protein disulfide-isomerase to be more effective at
forming cystines in folding proteins by disulfide interchange
than a cystine in thioredoxin would be. The
homologous cystine in thiol:disulfide interchange protein
DsbA from E. coli is also exceptionally reactive.

In order to perform both the rearrangement of
cystines and their formation most efficiently during its 
catalysis of the folding of a protein (Equation 13-32), pro- 
tein disulfide-isomerase must be poised at a level of 
reduction potential where only a portion of its cysteines 
are cystines, and in the laboratory, the optimal potential 
for the catalysis of the folding of a protein is reached at a 
ratio of oxidized glutathione to reduced glutathione of 
0.2-0.5. This ratio is not significantly different from the
ratio found in the endoplasmic reticulum. 
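
The magnitude of a difference of 30-40 mV in reduction potential, such as that between the cystines of protein disulfide-isomerase and thioredoxin, can be put in thermodynamic terms with a short calculation. The sketch assumes a two-electron couple and a temperature of 25 °C. Both are assumptions introduced here, and the function names are illustrative.

```python
import math

R = 8.314462618    # gas constant, J mol^-1 K^-1
F = 96485.33212    # Faraday constant, C mol^-1
T = 298.15         # K; 25 °C is an assumption
N_ELECTRONS = 2    # electrons per cystine/cysteine couple (assumed)

def equilibrium_factor(delta_e_volts):
    """Factor by which a difference in reduction potential changes the
    equilibrium constant of a two-electron disulfide interchange."""
    return math.exp(N_ELECTRONS * F * delta_e_volts / (R * T))

def free_energy_change_kj(delta_e_volts):
    """Corresponding change in standard free energy, in kJ mol^-1."""
    return -N_ELECTRONS * F * delta_e_volts / 1000.0

for millivolts in (30, 40):
    de = millivolts / 1000.0
    print(f"{millivolts} mV: factor = {equilibrium_factor(de):.1f}, "
          f"dG = {free_energy_change_kj(de):.2f} kJ/mol")
```

On these assumptions, a difference of 30-40 mV corresponds to a difference of roughly 10- to 20-fold in the equilibrium constant for disulfide interchange.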

The proteins the foldings of which have been dis- 
cussed so far have been fairly small and in each case the 
entire protein has folded as a unit to produce the native 
state. The folding of a larger protein is usually compli- 
cated not only by the larger number of prolines it con- 
tains but also by the fact that larger proteins usually 
contain domains, which often fold independently of each 
other. For example, lysozyme, itself not a large protein, 
nevertheless contains two structural domains. On the 
basis of measurements of amido proton exchange, it was 
demonstrated that one of these domains folds more rapidly
than the other, and the completion of the foldings
of the two domains is followed by a step in which they
become properly oriented and associate correctly with
each other to form the native state. The rates of the final
slow steps in the foldings of both aspartate
kinase-homoserine dehydrogenase from E. coli and
D-octopine dehydrogenase from Pecten jacobaeus are
inversely proportional to the viscosity of the solvent and 
are thought to represent the association of independently 
folding domains. The transfer of energy by resonance 
between donors and acceptors positioned at several loca- 
tions on phosphoglycerate kinase from S. cerevisiae has 
been used to follow the changes in the distances between 
these locations during the unfolding of the protein. To
the extent that unfolding is the reverse of folding, the fact 
that the first step in the unfolding of the protein is the dis- 
sociation of its two domains suggests that the last step in 
its folding is the association of these two domains. 

The results of these experiments suggest that, in 
larger proteins, the individual domains fold independ- 
ently as if they were the small unitary proteins that have 
been discussed in detail until there is sufficient structure 
developed for them to recognize each other and associ- 
ate. Following their association, the information devel- 
oped during this association may or may not dictate 
further folding to reach the final native state depending 
on the intimacy and interdependence of their interaction.

Problem 13-5: The figure displays the behavior of the
observed rate constants, k_obs, in units of seconds⁻¹, for
folding and unfolding of human lysozyme. The rate constants
for folding were obtained by rapidly diluting the
unfolded polypeptide from a solution of 4.5 M guanidinium
chloride to the noted final concentration, and
those for unfolding, by rapidly mixing the native protein
with a solution of guanidinium chloride to produce the
noted final concentration.

(A) The rate at which the unfolded protein refolds is
governed by Equation 13-10. Derive a similar
equation describing the change in the concentration
of the folded form ([F]) as a function of time.


(B) Why is the curve for the observed rate constant of 
folding continuous with the curve for the 
observed rate constant of unfolding? 


(C) Which intrinsic rate constant dominates the 
observed rate constant in the unfolding region? 


(D) Which dominates in the folding region? 


(E) Why does the behavior of k_F as a function of the
concentration of guanidinium chloride indicate 
that there is at least one kinetic intermediate in 
the folding reaction? 


(F) Estimate the rate constant for the formation of 
this intermediate in the absence of guanidinium 
chloride. 


(G) Estimate the rate constant for the formation of the
native state from the intermediate state or the
intermediate states in the absence of guanidinium
chloride.


(H) Is the isomerization of peptide bonds amino-ter- 
minal to prolines involved in the folding of human 
lysozyme? How did you decide? 


(I) How does the value of k_U change as the concentration
of guanidinium is increased up to 5 M?


(J) In the transition region in which the equilibrium
constant for folding can be measured, what are
the relative values of the rate constants k_F and k_U?


Problem 13-6: Using the notation of Equations 13-31
through 13-33, write a set of equations that describes the
possible ways that protein disulfide-isomerase could
catalyze the formation of the mixed disulfide between
glutathione and a cysteine on a folding protein.

When oligomeric proteins are dissolved in solutions of
guanidinium chloride, they dissociate into their constitutive
polypeptides that unfold to random coils. When
the guanidinium chloride is removed from the solution,
the random coils refold, and the refolding monomers
reassociate to form the native state. For example, the
inorganic diphosphatase from E. coli is an (α)₆ hexamer
in its native state. When dissolved in 5 M guanidinium
chloride at pH 7, it dissociates into single α polypeptides
(n_aa = 175), as judged by sedimentation equilibrium, that
are random coils, as judged by their sedimentation
coefficient (s₂₀,w = 0.59 S) and intrinsic viscosity ([η] = 22 cm³
g⁻¹). When the guanidinium chloride is removed by dialysis,
80-90% of the enzymatic activity slowly returns, and
the protein that results is an (α)₆ hexamer indistinguishable
in sedimentation coefficient, optical rotatory dispersion,
or ultraviolet absorption spectrum from the
original native enzyme. The native protein contains no
cystines, but it does contain cysteines. Renaturation is 
successful only if an external mercaptan, which mimics 
the reduced glutathione normally present in the cyto- 
plasm, is added to prevent adventitious intermolecular 
and intramolecular formation of cystine, a reaction that 
interferes with proper refolding. 

The steps in the assembly of an oligomeric protein 
can be followed by quantitative cross-linking.
Phosphoglycerate mutase from S. cerevisiae is an (α)₄ tetramer.
When it is dissolved in 4 M guanidinium chloride, it
dissociates into random coils of the α polypeptide as judged
by circular dichroism (Figure 13-17A). When the solution
is diluted 40-fold to 0.1 M guanidinium chloride,
greater than 80% of the molar ellipticity of the native state 
is regained in less than 30 s (Figure 13-17B). At this point 
greater than 80% of the protein is still monomeric. The 
appearance of dimers and tetramers as a function of time 
could be followed by quantitative cross-linking to cata- 


[Figure 13-17, panels A and B: horizontal axes are wavelength (nm) and time (min).]


Figure 13-17: Assembly of yeast phosphoglycerate mutase follow- 
ing dilution from 4 to 0.1 M guanidinium chloride.” (A) Far-ultra- 
violet circular dichroic spectra of native enzyme (@), native enzyme 
in 0.1 M guanidinium chloride (A), and enzyme in 4 M guanidinium
chloride (©); all measurements were made at a concentration
of protein of 1.7 mg mL⁻¹. Molar ellipticity ([θ]) in units of
degree centimeter² (decimole of peptide bonds)⁻¹ is presented as a
function of wavelength (nanometers). (B) Regain of molar elliptic- 
ity at 225 nm upon dilution from 4 to 0.1 M guanidinium chloride. 
Ellipticity is presented as a function of time (minutes) in relative 
units where 0% is the molar ellipticity of fully unfolded protein and 
100% is the molar ellipticity of the native protein (see panel A). 
(C-E) Assembly of the oligomer. At the noted times after initiation 
of folding by dilution from 4 to 0.1 M guanidinium chloride, sam- 
ples were removed and cross-linked quantitatively with 1% glu- 
taraldehyde for 2 min, and the complexes between the resulting 
covalent oligomers and dodecyl sulfate were submitted to elec- 
trophoresis on gels of polyacrylamide in the presence of dodecyl 
sulfate. The amounts of monomer (el, dimer (A), and tetramer (0) 
were assessed by scanning the stained gels for absorbance. The rel- 
ative amounts of monomer, dimer, and tetramer (as a percentage 
of the sum of the three amounts) are plotted as a function of the 
time (hours) between dilution of the guanidinium chloride and the 
addition of the glutaraldehyde. The final concentrations of protein 
were (C) 11, (D) 21, and (E) 37 µg mL⁻¹. The solid curves were drawn
in all three panels with integrated rate equations based on 
Equation 13-34 with k, = 6.25 x 10° Miel, k,=6.0x 10° s", and 
ky = 2.75 x 10? M” s. The temperature for all of these experiments 
was 20 °C, and they were run at pH 7.5. Reprinted with permission 
from ref 445. Copyright 1983 Journal of Biological Chemistry. 


logue the species present at each point (Figure 
13-17C-E). No trimers were observed, as would be
expected. From an examination of the progress of the 
reaction, it could be concluded that the dimer was the ini- 
tial oligomer, which, as it built up in concentration, dimer- 
ized to produce tetramer. Although the circular dichroism 
of the sample changed insignificantly as the reaction pro- 
gressed (Figure 13-17B), the intrinsic fluorescence of the 
protein increased in concert with the oligomerization. 
Both the rate of the oligomerization (Figure 13-17C-E) 
and the rate of the increase in fluorescence (except for a 
small immediate increase of 20% that was invariant) were 
dependent on the absolute concentration of the protein. 

The kinetics of both the oligomerization of phos- 
phoglycerate mutase and the increase in fluorescence 
could be accounted for quantitatively by the mechanism of
Equation 13-34, wherein all unfolded polypeptides, α_U, have
folded in the first 30 s; the folded monomer, α_F, regains 20%
of the fluorescence of the native state; and both the dimer and the
tetramer have the full fluorescence of the native state.
That the folded monomer and reassembled dimer can be
digested with trypsin while the reassembled tetramer
cannot suggests that the polypeptides in the monomer
and the dimer are loosely folded and regain their fully
compact native state only following tetramerization.
That both the monomer and the dimer possess some
enzymatic activity suggests that they are properly
folded. The tetramer is produced in quantitative yield
with full enzymatic activity at these concentrations of
protein (<50 µg mL⁻¹). In this oligomerization, the
association of two dimers to form the tetramer is the
rate-limiting step in the reaction (Equation 13-34), but in
the oligomerization of the tetramerization domain (n_aa =
30) of human cellular tumor antigen p53, it is the
association of two monomers to form a dimer that is the
rate-limiting step, and the association of the two dimers to
form the tetramer is so rapid that it is kinetically silent.
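
The dependence of such an assembly on the absolute concentration of protein can be explored numerically. The following is a minimal sketch, assuming that folded monomers dimerize reversibly and that two dimers then associate irreversibly to form the tetramer. The function name, the rate constants, and the concentrations are all hypothetical and chosen only for illustration; they are not the published values.

```python
def simulate_tetramer_assembly(m0, k1, k_minus1, k2, t_end, dt=1.0):
    """Integrate 2 M <=> D (k1, k_minus1) and 2 D -> T (k2) by fourth-order Runge-Kutta.

    m0 is the total molar concentration of polypeptide, all initially folded
    monomer. Returns the molar concentrations (M, D, T) at time t_end (seconds).
    """
    def deriv(state):
        m, d, _ = state
        v1 = k1 * m * m - k_minus1 * d   # net rate of formation of dimer from monomer
        v2 = k2 * d * d                  # rate of formation of tetramer from dimer
        return (-2.0 * v1, v1 - 2.0 * v2, v2)

    def shifted(state, slope, h):
        return tuple(s + h * ds for s, ds in zip(state, slope))

    state = (m0, 0.0, 0.0)
    for _ in range(int(round(t_end / dt))):
        s1 = deriv(state)
        s2 = deriv(shifted(state, s1, dt / 2))
        s3 = deriv(shifted(state, s2, dt / 2))
        s4 = deriv(shifted(state, s3, dt))
        state = tuple(
            y + dt / 6 * (a + 2 * b + 2 * c + e)
            for y, a, b, c, e in zip(state, s1, s2, s3, s4)
        )
    return state


# Hypothetical rate constants (M^-1 s^-1, s^-1, M^-1 s^-1), for illustration only.
K1, K_MINUS1, K2 = 6.0e3, 6.0e-3, 3.0e4

for total in (0.5e-6, 1.0e-6, 2.0e-6):   # hypothetical total concentrations (M)
    m, d, t = simulate_tetramer_assembly(total, K1, K_MINUS1, K2, t_end=1800)
    print(f"{total:.1e} M: monomer {m / total:.2f}, "
          f"dimer {2 * d / total:.2f}, tetramer {4 * t / total:.2f}")
```

Because both association steps are bimolecular, the fraction of polypeptide found in tetramer at a fixed time rises with the total concentration, which is the qualitative behavior described for panels C-E of Figure 13-17.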

The assembly of a dimer from its dissociated 
random coils is an even simpler reaction. Porcine
mitochondrial malate dehydrogenase is an α₂ dimer that can
be reversibly unfolded in several different ways. After
random coils, α_U, of the α polypeptide are transferred to
a solution at neutral pH, coincident with the dilution of
the denaturant, the reappearance of enzymatic activity
shows the same time course regardless of the mode of the
original denaturation (Figure 13-18). The time course
displays two phases, a lag followed by an increase. The 
increase in activity has a second-order dependence on 
the concentration of protein. The lag is unaffected by the 
concentration of protein and is a first-order process. The 
results can be explained quantitatively with the following 
mechanism: 


$$2\alpha_{\mathrm{U}} \xrightarrow{k_{\mathrm{F}}} 2\alpha_{\mathrm{F}} \xrightarrow{k_2} (\alpha_{\mathrm{F}})_2 \qquad (13\text{-}35)$$


if only the dimer, (α_F)₂, and not the folded monomer,
α_F, is enzymatically active. At pH 7.6 and 20 °C,

Figure 13-18: Reassembly and reactivation of porcine
mitochondrial malate dehydrogenase at a concentration of 60 nM and at
pH 7.6 and 20 °C after dilution from various denaturants. The
protein was unfolded at pH 2.3 alone (@), in 6 M guanidinium chlo- 
ride (m), or in 6 M urea (A). After it was fully unfolded and dissoci- 
ated into its separate polypeptides in each instance, it was diluted 
to initiate refolding and reassembly. Samples were removed at the 
noted times and assayed for enzymatic activity; the enzymatic 
activity is presented in relative units with 0% being the immediately 
observed enzymatic activity (<4% of the final) and 100% being the 
enzymatic activity after full reactivation (>24 h). The curve is for the 
integrated rate equation for the mechanism of Equation 13-35 with 
k_F = 6.5 × 10⁻⁴ s⁻¹ and k₂ = 3 × 10⁴ M⁻¹ s⁻¹. Reprinted with permission
from ref 448. Copyright 1979 American Chemical Society. 


k_F = 0.0006 s⁻¹ and k₂ = 30,000 M⁻¹ s⁻¹. The value of k_F is
too small to be the rate constant for the rapid refolding of
a polypeptide with the correct proline isomers to form a
structure with no domains. The rate is probably slow
because isomerizations of peptide bonds amino-terminal
to prolines are required or because the two structural
domains of the monomer observed in the crystallographic
molecular model of the protein associate
slowly or because both of these problems must be overcome
before the monomer has regained sufficient native
structure to recognize another monomer and dimerize
with it.
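
The lag followed by a concentration-dependent increase can be reproduced with a short numerical integration. The sketch below assumes a first-order folding step followed by an irreversible second-order dimerization, with only the dimer counted as active, and it assumes the convention that the rate of formation of dimer is k_2[F]². The function name is hypothetical; the concentration (60 nM) and the approximate rate constants are those quoted for Figure 13-18.

```python
import math


def reactivation_curve(c0, k_f, k_2, times, dt=1.0):
    """Fraction of polypeptide present as dimer at each time in `times` (s, ascending).

    Scheme: U -> F (first order, k_f); 2 F -> D (second order, k_2).
    c0 is the total molar concentration of polypeptide, initially all unfolded.
    """
    def deriv(f, t):
        u = c0 * math.exp(-k_f * t)          # unfolded monomer decays exponentially
        return k_f * u - 2.0 * k_2 * f * f   # F is formed by folding, consumed in pairs

    f, t, out = 0.0, 0.0, []
    for target in times:
        while t < target - 1e-9:
            h = min(dt, target - t)
            a = deriv(f, t)
            b = deriv(f + h / 2 * a, t + h / 2)
            c = deriv(f + h / 2 * b, t + h / 2)
            e = deriv(f + h * c, t + h)
            f += h / 6 * (a + 2 * b + 2 * c + e)
            t += h
        u = c0 * math.exp(-k_f * t)
        out.append((c0 - u - f) / c0)        # whatever is neither U nor F is in dimer
    return out


minutes = [5, 25, 60, 180, 600]
curve = reactivation_curve(60e-9, 6.5e-4, 3.0e4, [60 * m for m in minutes])
for m, frac in zip(minutes, curve):
    print(f"{m:4d} min: {100 * frac:5.1f}% of polypeptide in dimer")
```

Very little dimer appears in the first few minutes because folded monomer must accumulate before the bimolecular step can proceed at an appreciable rate, which is the origin of the lag.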

In the case of the reassembly of the dimer of aspar- 
tate transaminase from E. coli, there are two consecutive, 
slow, first-order, unimolecular steps which produce a 
molten globular monomer that is enzymatically inactive 
but that has sufficient native structure to dimerize. This 
monomer displays a circular dichroic spectrum in the far 
ultraviolet similar to that of the native protein. Its dimer- 
ization and the formation of the final enzymatically 
active native state are rapid, and because these later
steps follow the slow rate-limiting formation of the
molten globular monomer, they are kinetically silent.
In the folding and assembly of the dimer of steroid
Δ-isomerase from Pseudomonas testosteroni, however, all
three steps, the unimolecular formation of an enzymatically
inactive monomeric intermediate (60 s⁻¹ at 25 °C),
its bimolecular association (60,000 M⁻¹ s⁻¹ at 25 °C), and
the formation of the final enzymatically active native
structure (0.017 s⁻¹ at 25 °C), could be resolved kinetically.
These experiments demonstrate that a
monomeric intermediate does not have to assume its 
fully native state before it oligomerizes and that, in such 
a situation, further isomerizations can then occur 
within each subunit following oligomerization to pro- 
duce the final native state. The final step in the reactiva- 
tion has a rate constant suggesting that the isomerization 
of a peptide bond amino-terminal to a proline is 
involved. 

Transcriptional repressor Arc from bacteriophage
P22 is also an α₂ dimer but of much smaller subunits
(n_aa = 53). At low concentrations, its rate of folding and
assembly is second-order in the concentration of protein,
can be fit by a bimolecular rate equation, shows no
evidence of any intermediates, and is complete in less
than a second. Because the reaction is second-order, the
rate-limiting step is dimerization (Equation 13-35) under
these conditions. As the concentration of protein is
increased above 20 µM, however, there is a change in the
rate-limiting step as the rate of dimerization becomes so
fast that a preceding unimolecular step (Equation 13-35),
the rate constant of which does not depend on the
concentration of protein, becomes rate-limiting. Because
the change in fluorescence with time shows no evidence
for the formation of any intermediate when the dimerization
is the rate-limiting step, the unimolecular step
is probably an unfavorable preequilibrium producing
an unstable monomeric state that is competent to dimerize.
This unstable monomer is then captured and stabilized
by the favorable dimerization.
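
How a change in the rate-limiting step appears as a change in kinetic order can be illustrated with a steady-state calculation. The sketch below assumes an unfavorable preequilibrium between the unfolded polypeptide and a dimerization-competent monomer, followed by irreversible dimerization. The function names and all three rate constants are hypothetical, chosen only so that the crossover falls in the micromolar range.

```python
import math


def dimerization_flux(u, k_f, k_u, k_2):
    """Steady-state rate (M s^-1) at which monomers enter dimers.

    Scheme: U <=> M* (k_f forward, k_u reverse), 2 M* -> D (k_2).
    The steady-state concentration m of M* solves k_f*u = k_u*m + 2*k_2*m**2.
    """
    m = 2.0 * k_f * u / (k_u + math.sqrt(k_u * k_u + 8.0 * k_2 * k_f * u))
    return 2.0 * k_2 * m * m


def apparent_order(u, k_f, k_u, k_2, step=1.01):
    """Local slope of log(flux) against log(concentration of U)."""
    hi = dimerization_flux(u * step, k_f, k_u, k_2)
    lo = dimerization_flux(u / step, k_f, k_u, k_2)
    return math.log(hi / lo) / (2.0 * math.log(step))


# Hypothetical constants: K = k_f/k_u = 0.1, so the competent monomer is unstable.
K_F, K_U, K_2 = 100.0, 1000.0, 2.5e8

for conc in (1e-8, 1e-6, 2e-5, 1e-3, 1e-2):
    print(f"[U] = {conc:.0e} M: apparent order {apparent_order(conc, K_F, K_U, K_2):.2f}")
```

At low concentration the competent monomer equilibrates with the unfolded polypeptide and the flux is second-order; at high concentration every competent monomer is captured as soon as it forms, and the flux approaches k_f[U], which is first-order.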

The assembly of a trimer (Figure 9-11) is somewhat 
more complicated than that of either a tetramer or a 
dimer because the addition of the third subunit to the 
dimer is quite different from the initial combination of 
two monomers to form the dimer. The catalytic subunit 
of aspartate carbamoyltransferase (Figure 9-37) is an
α₃ trimer. The assembly of trimers of the catalytic subunit
from random coils of the α polypeptide is a first-order
process with no evident intermediates and a rate con- 
stant of 2 x 10“ el at 0 °C.” It seems that again a slow 
isomerization of the partially folded, monomeric
α polypeptide is the rate-limiting step in the assembly
from random coils. To circumvent the barrier presented
by this isomerization to the kinetic observation of
intermediates in the process, native α₃ trimer was dissociated
into globular rather than unfolded α monomers with
thiocyanate ion (S=C=N⁻), which is a milder denaturant
than either urea or guanidinium ion. It is an anion
that salts in protein as does urea or guanidinium but not
so vigorously. The enzymatically inactive α monomers
that result retain most of the circular dichroic ellipticity
and ultraviolet absorption of the native α₃ trimers and
have a frictional ratio f/f₀ of 1.27. These globular
α monomers assemble readily to form α₃ trimers after
the dilution of the thiocyanate.

When the assembly was followed by quantitative
cross-linking, α monomers turned directly into
α₃ trimers with no evidence for the formation of any
α₂ dimers (<3%). The appearance of α₃ trimers was coincident
with the return of enzymatic activity. Both of these
processes, however, were strictly second-order in the
concentration of α monomer:


(13-36) 


A mechanism consistent with both of these results is 


$$3\alpha \xrightarrow{k_1} \alpha_2 + \alpha \xrightarrow{k_2} \alpha_3 \qquad (13\text{-}37)$$


where k₂ >> k₁. When the third monomer adds to the
dimer, two interfaces form simultaneously, and this reaction
could have a much lower standard free energy of
activation than the formation of the dimer itself. Because
the second step in Equation 13-37 is so much faster than
the first, no α₂ dimer accumulates. The first step in
Equation 13-37, however, a bimolecular reaction, is the
rate-limiting step.
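
That such a mechanism yields second-order kinetics without detectable dimer can be checked numerically. The following is a minimal sketch, assuming both steps are irreversible; the function name, the rate constants (with k2 a thousandfold larger than k1), and the concentration are hypothetical.

```python
def simulate_trimer_assembly(m0, k1, k2, t_end, dt=0.1):
    """Integrate 2 A -> A2 (k1) and A2 + A -> A3 (k2) by fourth-order Runge-Kutta.

    m0 is the total molar concentration of monomer at time zero.
    Returns (monomer, dimer, trimer, peak_dimer) in molar units at t_end (s).
    """
    def deriv(state):
        a, d, _ = state
        v1 = k1 * a * a      # formation of dimer from two monomers
        v2 = k2 * a * d      # addition of the third monomer to the dimer
        return (-2.0 * v1 - v2, v1 - v2, v2)

    def shifted(state, slope, h):
        return tuple(s + h * ds for s, ds in zip(state, slope))

    state, peak = (m0, 0.0, 0.0), 0.0
    for _ in range(int(round(t_end / dt))):
        s1 = deriv(state)
        s2 = deriv(shifted(state, s1, dt / 2))
        s3 = deriv(shifted(state, s2, dt / 2))
        s4 = deriv(shifted(state, s3, dt))
        state = tuple(
            y + dt / 6 * (p + 2 * q + 2 * r + s)
            for y, p, q, r, s in zip(state, s1, s2, s3, s4)
        )
        peak = max(peak, state[1])
    return state + (peak,)


M0, K1, K2 = 1.0e-6, 1.0e3, 1.0e6   # hypothetical values (M, M^-1 s^-1, M^-1 s^-1)
a, d, t, peak = simulate_trimer_assembly(M0, K1, K2, t_end=1000)
print(f"monomer remaining: {a / M0:.3f}")
print(f"second-order prediction 1/(1 + 3*k1*m0*t): {1 / (1 + 3 * K1 * M0 * 1000):.3f}")
print(f"largest fraction of polypeptide ever in dimer: {2 * peak / M0:.4f}")
```

With the dimer at a low steady-state concentration, each dimer formed consumes a third monomer almost immediately, so the monomer disappears as d[α]/dt = -3k₁[α]², which is strictly second-order.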

When homooligomeric proteins are assembled 
from random coils, the observations are consistent with 
the first step in the process being the folding of the 
random coil to a globular structure. In many instances,
this structure is loosely folded. For example, it may be
sensitive to endopeptidolytic cleavage. This globular 
monomer either combines directly with other globular 
monomers to form the oligomer or it undergoes one or 
more isomerizations before it is competent to assemble. 
These isomerizations may be isomerizations of peptide 
bonds amino-terminal to critical prolines or rearrange- 
ments of domains, but these possibilities have not been 
validated. The competent monomers then assemble in 
simple, reasonable bimolecular steps to form the enzy- 
matically active oligomer. When the reactions can be 
observed, the rate constants measured for these bimole- 
cular steps are between 10° and 10°M"™ sl at 
25 °C, “855 several orders of magnitude below diffu- 
sion-controlled rates for the collision of molecules of this 
size. Therefore, they proceed with significant standard 
free energies of activation. 

Whether or not enzymatic activity is displayed by 
the various compact intermediates in this process seems 
to be a property of the individual protein. Both
α monomer and α₂ dimer of phosphoglycerate mutase
have enzymatic activity. Fumarate hydratase, an
(α)₄ tetramer, can be denatured to random coils and
reassembled to an α₂ dimer. The α₂ dimer is enzymatically
inactive until it assembles to the (α)₄ tetramer.
Porphobilinogen synthase from Pisum sativum, an
[(α₂)₂]₂ octamer, can be disassembled to α₂ dimers by
dilution. Only the octamer and the (α₂)₂ tetramer are
enzymatically active. The single active site of HIV-1
retropepsin from human immunodeficiency virus type 1
is formed from both subunits of the α₂ dimer, so it is not
surprising that the α monomer has no enzymatic activity.
When fructose-bisphosphate aldolase, an
(α)₄ tetramer, is denatured to random coils that are then
transferred to a solution at pH 5.5, the random coils fold
to form α monomers that have the sedimentation coefficient
of a globular protein of their length and the circular
dichroic spectra and ultraviolet spectra of the native protein.
Their enzymatic activity cannot be determined
because these α monomers oligomerize too rapidly to
(α)₄ tetramers when mixed with substrates, but they
must bind those substrates for their assembly to be
affected by them.

The assembly of heterooligomers constructed 
from several copies of each of two or more different 
polypeptides is somewhat more complex than that of 
homooligomers. When the assembly of a heterooligomer 
is studied, the reactants employed are the globular, 
homooligomeric subunits, such as catalytic subunits and 
regulatory subunits of aspartate carbamoyltransferase 
(Figure 9-37). It is generally assumed, in the absence of 
any evidence, that under physiological circumstances 
the homooligomeric subunits assemble first and then 
combine to form the heterooligomer. Only those pro- 
teins formed from two or more polypeptides translated 
from different messenger RNAs are of interest. Proteins 
containing different polypeptides arising from the
posttranslational cleavage of identical larger polypeptides
fold and assemble as simple homooligomers before the 
posttranslational modification occurs. 

Alkanal monooxygenase (FMN-linked) from Vibrio
harveyi is an αβ heterodimer in which the α subunit
(n_aa = 355) and the β subunit (n_aa = 324) are homologous
and superposable, but neither of these subunits will form
a homodimer. When they are expressed separately, each
is a globular monomer containing about 50-60% of the
α helix of a monomer in the native heterodimer but no
discernible tertiary structure as judged from their
nuclear magnetic resonance spectra. These properties
are consistent with these two monomers being molten
globules. Nevertheless, when they are mixed together,
these α monomers and β monomers heterodimerize and
form native alkanal monooxygenase. Either the two
molten globular forms dimerize and then assume their
native states while in the dimer, or the fully native states
of the α monomer and the β monomer are present in the
separate solutions of each at undetectably low equilibrium
concentrations, and it is these forms that dimerize
while the dimerization pulls the equilibria in the direction
of the native states.

Tryptophan synthase from E. coli is another simple
example of the assembly of a heterooligomer. This protein
is an α₂β₂ heterotetramer. When it is dissociated
into its components, the products are α monomers and
β₂ dimers, and both can be obtained in globular, folded
states. When α monomer is mixed with excess β₂ dimer,
the major product is the complex αβ₂. It forms in a reaction
the kinetics of which are consistent with the mechanism


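$$\alpha + \beta_2 \underset{k_{-a}}{\overset{k_a}{\rightleftharpoons}} \alpha\beta_2 \underset{k_{-b}}{\overset{k_b}{\rightleftharpoons}} \alpha\beta_2^{*} \qquad (13\text{-}38)$$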


where the complex αβ₂* is an isomerized form of the initial
intermediate αβ₂. The rate constants k_a, k_-a, k_b, and k_-b
for this process at 25 °C are1x10°M's',3s',6s”, and 
0.001 s⁻¹, respectively. When excess α monomer is then
added to the αβ₂* complex, the next step in the assembly
has kinetic behavior consistent with the mechanism


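$$\alpha + \alpha\beta_2^{*} \underset{k_{-a}'}{\overset{k_a'}{\rightleftharpoons}} \alpha_2\beta_2^{*} \underset{k_{-b}'}{\overset{k_b'}{\rightleftharpoons}} \alpha_2\beta_2^{**} \qquad (13\text{-}39)$$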


and the rate constants k_a′, k_-a′, k_b′, and k_-b′ for this process
at 25 °C are 1.6 x IM sl, 26s", 16s", and 0.002 s”, 
respectively. 

Each time an interface forms between an
α monomer and one of the two monomers in the
β₂ dimer of tryptophan synthase, an isomerization of
the structure of either the participating β monomer or
the conjoined α monomer, or both, occurs, producing the
asterisked conformer. The isomerizations producing
the conformers αβ₂* or α₂β₂** are too rapid to be
isomerizations of prolines. They presumably represent
rearrangements of the structures after the association of the α
and β subunits and are similar to the changes that permit
the tetramer of phosphoglycerate mutase to resist
endopeptidolytic degradation or that permit enzymatically
inactive subunits to regain full enzymatic activity
after oligomers such as malate dehydrogenase, aspartate
transaminase, and fumarate hydratase reach the native
stoichiometry. The equilibrium constant (K_is = k_b/k_-b) for
the isomerization following the addition of the first
α monomer (K_is = 6000) is essentially the same as that following the
addition of the second α monomer (K_is′ = 8000), indicating
that the same local adjustments are occurring after
each α monomer adds in turn.
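
The two equilibrium constants follow directly from the quoted forward and reverse rate constants for the isomerizations (6 s⁻¹ and 0.001 s⁻¹; 16 s⁻¹ and 0.002 s⁻¹). The short calculation below reproduces them and, as an added illustration, converts their ratio into a difference in standard free energy at 25 °C; the function and variable names are hypothetical.

```python
import math

R = 8.314       # gas constant, J K^-1 mol^-1
T = 298.15      # 25 degrees C in kelvin


def isomerization_constant(k_forward, k_reverse):
    """Equilibrium constant of a unimolecular isomerization from its rate constants."""
    return k_forward / k_reverse


K_first = isomerization_constant(6.0, 0.001)     # after the first monomer adds
K_second = isomerization_constant(16.0, 0.002)   # after the second monomer adds

# Difference in standard free energy change between the two isomerizations.
ddG = -R * T * math.log(K_second / K_first)      # J mol^-1

print(f"K_is (first)  = {K_first:.0f}")
print(f"K_is (second) = {K_second:.0f}")
print(f"difference in standard free energy = {ddG / 1000:.2f} kJ/mol")
```

The two constants differ by a factor of only 4/3, which corresponds to less than 1 kJ mol⁻¹, so treating the two isomerizations as energetically equivalent is reasonable.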

Aspartate carbamoyltransferase is constructed
from two catalytic C subunits that are α₃ trimers and
three regulatory R subunits that are β₂ dimers. From its
crystallographic molecular model (Figure 9-37), it is clear
that only certain steps are possible in its assembly from
separated C subunits and R subunits (Figure 13-19).
The intermediates that appear during the assembly of
the intact (α₃)₂(β₂)₃ heterododecamer (C₂R₃ heteropentamer)
have been followed by quenching the assembly of
radioactive catalytic or regulatory subunits and their
unradioactive complements with large excesses of
unradioactive catalytic or succinylated catalytic subunits,
respectively. The specific radioactivity of the various
mosaic oligomers, which, because of the excess negative

Figure 13-19: Intermediates in the assembly of aspartate
carbamoyltransferase. Catalytic (C) or regulatory (R) subunits that
had been made radioactive by iodinating their tyrosines with I 
(Reaction 10-33) were mixed with excess unradioactive R or 
C subunits, respectively, to initiate assembly. At various times, the 
assembly reaction was quenched with either excess C subunit or 
excess succinylated C subunit to cap off partially formed com- 
plexes and scavenge all unreacted R subunit. From examining the 
specific radioactivity of complexes separated by electrophoresis, 
an estimate of the relative concentrations of all of the intermedi- 
ates in the process of assembly at each time point could be made. 
The changes in these relative concentrations with time were used 
to formulate the assembly diagram displayed in the figure. Four of 
the steps in this process are rapidly reversible: C + R ⇌ CR, CR + R
⇌ CR₂, CR₂ + R ⇌ CR₃, and CR + C ⇌ C₂R. In contrast, processes
forming the complexes C₂R₂ and C₂R₃ are essentially irreversible
because these complexes are so stable. Reprinted with permission 
from ref 463. Copyright 1980 Journal of Biological Chemistry. 


charge (Equation 10-27) on the succinylated C subunits, 
can be separated from each other by electrophoresis, 
permits the concentrations of the various intermediates 
at the time the reaction was quenched to be calculated. 

When a limiting concentration of C subunit is
mixed with various excesses of R subunit, equilibrium
mixtures of the intermediates CR, CR₂, and CR₃ are
formed. Subsequent addition of excess C subunit causes
CR₃ to be trapped as CR₃C, the intact native protein, and
CR₂ to be trapped as CR₂C. When excess C subunit is
mixed with a limiting concentration of R subunit, the
only two products other than unreacted C subunit are
CR₃C and CR₂C, with the former in the majority. The
complex CR₂C can be isolated as a stable protein. When
it is combined with R subunit, it produces CR₃C in a clean
bimolecular reaction. In these experiments, most of
the intermediates in the general scheme (Figure 13-19)
have been directly observed, and rate constants and
equilibrium constants for their interconversion have
been established. Most of the steps in the scheme
seem to occur simultaneously, and different pathways
become more or less important as concentrations of the
subunits are changed.

The pyruvate dehydrogenase complex of E. coli is
composed of three different polypeptide chains, α, β, and
γ. The protein can be resolved into these three independent
components. These are the dihydrolipoyllysine-residue
acetyltransferase core, the pyruvate
dehydrogenase (acetyl-transferring) subunits, and the
dihydrolipoyl dehydrogenase subunits. The
dihydrolipoyllysine-residue acetyltransferase core is an
octahedral α₂₄ oligomer (Figure 9-23), pyruvate
dehydrogenase (acetyl-transferring) is a β₂ dimer, and
dihydrolipoyl dehydrogenase is a γ₂ dimer. No association
can be detected between the β₂ dimers of pyruvate
dehydrogenase (acetyl-transferring) and the γ₂ dimers of
dihydrolipoyl dehydrogenase. Therefore, the α₂₄ oligomer
of the dihydrolipoyllysine-residue acetyltransferase
serves as the point of attachment of the other components.

Unlike the closely related dihydrolipoyllysine-residue
succinyltransferase from the 2-oxoglutarate
dehydrogenase complex, which can associate with only
six β₂ dimers of oxoglutarate dehydrogenase
(succinyl-transferring) at saturation because of a steric effect, the
empty α₂₄ oligomer of the dihydrolipoyllysine-residue
acetyltransferase from E. coli can associate with up to 24
β₂ dimers of pyruvate dehydrogenase (acetyl-transferring).
Presumably, in the saturated complex, one of
the two faces on each of the 24 β₂ dimers occupies one of
the 24 equivalent faces of the octahedral α₂₄ oligomer
with no steric hindrance. The empty α₂₄ oligomer of the
dihydrolipoyllysine-residue acetyltransferase can also
associate with as many as 20 γ₂ dimers of dihydrolipoyl
dehydrogenase in the absence of pyruvate dehydrogenase
(acetyl-transferring).

When both β₂ dimers of pyruvate dehydrogenase
(acetyl-transferring) and γ₂ dimers of dihydrolipoyl
dehydrogenase are added together to the dihydrolipoyllysine-residue
acetyltransferase core, substoichiometric
amounts of each are bound, presumably because of
steric crowding. Certainly the native protein, which has
an average of about 12 γ₂ dimers of dihydrolipoyl
dehydrogenase and an average of somewhat less than 24
β₂ dimers of pyruvate dehydrogenase (acetyl-transferring),
appears to be a crowded structure (Figure
13-20). When a preformed complex containing an
average of 12 γ₂ dimers of dihydrolipoyl dehydrogenase
for each α₂₄ oligomer of dihydrolipoyllysine-residue
acetyltransferase is mixed with increasing amounts of
pyruvate dehydrogenase (acetyl-transferring), about 22
β₂ dimers of pyruvate dehydrogenase (acetyl-transferring)
bind to the α₂₄ oligomers at saturation, and the
overall enzymatic activity increases in direct proportion
to the number bound. All of these results suggest that
β₂ dimers of pyruvate dehydrogenase (acetyl-transferring)
and γ₂ dimers of dihydrolipoyl dehydrogenase add
at random to the respective faces on the α₂₄ oligomer of
dihydrolipoyllysine-residue acetyltransferase, at least
under the circumstances of these experiments, until
there is no more room left around the core. What is not
clear is whether the dimers of dihydrolipoyl dehydrogenase
and pyruvate dehydrogenase (acetyl-transferring)
add at random to the core during normal assembly
within the cell until no more can fit or whether there is some
ordered sequence that determines the final stoichiometry.

The 30S subunit of a ribosome from E. coli (Figure 
11-5) is composed of a single strand of 16S ribosomal 
RNA (nuc = 1541) and 21 polypeptides that, when


Figure 13-20: Electron micrographs of (A) the pyruvate
dehydrogenase complex from E. coli and (B) the core of
dihydrolipoyllysine-residue acetyltransferase from the same protein. Both
specimens were adsorbed onto a thin, supported layer of
amorphous carbon on an electron microscopic grid and negatively
stained with sodium methylphosphotungstate. Magnification is
300,000×. The complete complex was purified directly from a
homogenate of the bacteria; the acetyltransferase core was pre- 
pared from the complete complex by stripping away dihydrolipoyl 
dehydrogenase and pyruvate dehydrogenase (acetyl-transferring). 
Reprinted with permission from ref 470. Copyright 1971 Cold 
Spring Harbor Laboratory.

folded and assembled, constitute its 21 subunits. When
the ribosomal RNA and the separated individual 
polypeptides are mixed together, they spontaneously 
reassemble in high yield to form 30S subunits that are 
fully competent to participate in protein biosynthesis.
The assembly of the intact 30S subunit from its
components (Figure 13-21) proceeds through an explicit
sequence of steps beginning with the binding of a few of 
the subunits to the 16S ribosomal RNA itself. As the 
assembly progresses, the binding to the 16S ribosomal 
RNA of the subunits earlier in the sequence of events or 
the binding of polypeptides to complexes between the 
16S ribosomal RNA and other polypeptides creates sites 
to which subunits later in the sequence of events can 
attach (Figure 13-21). If a polypeptide is added to the 
mixture before all of the subunits that must precede it 
have been incorporated, it will not bind to the partially 
assembled 30S subunit. An assembly map, necessarily of 
greater complexity but describing a similar hierarchically 
ordered process, has been drawn for the assembly of the

Figure 13-21: Assembly diagram for the 30S subunit of the
ribosome from E. coli. The sequence of events was determined by
mixing, in various combinations, the 21 purified polypeptides with
the purified 16S RNA and assaying for formation of a complex or
complexes among the components. For example, only
polypeptides S15, S17, S4, and S8 would bind alone to 16S ribosomal RNA.
Polypeptide S20 will form a complex with 16S RNA only when sub- 
units S4 and S8 have been incorporated. Polypeptide S13 binds to 
16S RNA only when subunits S4, S8, and S20 have been incorpo- 
rated, and so forth. Upon binding each polypeptide becomes a sub- 
unit of the assembling 30S ribosomal subunit. Reprinted with 
permission from ref 473. Copyright 1974 Journal of Biological 
Chemistry. 


50S ribosomal subunit of E. coli from 23S ribosomal RNA, 
5S ribosomal RNA, and 31 polypeptides.’ 
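The hierarchical order of assembly can be sketched as a lookup of prerequisites. The following Python fragment is only an illustration under stated assumptions: it encodes just the three examples given in the caption of Figure 13-21 (S15, S17, S4, and S8 bind alone; S20 requires S4 and S8; S13 requires S4, S8, and S20), not the complete assembly map, and the names PREREQUISITES, can_bind, and assemble are hypothetical names chosen for this sketch.

```python
# Prerequisites taken only from the examples in the caption of Figure 13-21.
# The complete assembly map has many more dependencies that are not encoded here.
PREREQUISITES = {
    "S4": set(),
    "S8": set(),
    "S15": set(),
    "S17": set(),
    "S20": {"S4", "S8"},
    "S13": {"S4", "S8", "S20"},
}


def can_bind(polypeptide, incorporated):
    """Return True if every subunit that must precede `polypeptide` is already bound."""
    return PREREQUISITES[polypeptide] <= set(incorporated)


def assemble(order_of_addition):
    """Offer polypeptides one at a time; return the (bound, rejected) lists."""
    bound, rejected = [], []
    for polypeptide in order_of_addition:
        if can_bind(polypeptide, bound):
            bound.append(polypeptide)
        else:
            # Added before its predecessors: it does not bind.
            rejected.append(polypeptide)
    return bound, rejected


if __name__ == "__main__":
    print(assemble(["S4", "S8", "S20", "S13"]))
    print(assemble(["S13", "S4", "S8", "S20"]))
```

In this sketch a polypeptide offered too early is simply rejected, which mirrors the observation that a polypeptide added before its predecessors does not bind to the partially assembled particle.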

From the crystallographic molecular model of the
30S subunit, some inferences can be drawn to explain
the order in which its subunits are incorporated into the 
assembling particle (Figure 13-21). None of the polypep- 
tides except S4, S8, S17, and S15 can add until other sub- 
units have been incorporated. Subunits S6 and S18 form 
an intimate complex in one corner of the complete 
30S subunit, and subunits S10, S14, and S3 form an inti- 
mate complex in another corner. These close associa- 
tions explain the interdependences between the 
additions of the polypeptides of these subunits during 
assembly. Most of the subunits of the intact 30S subunit, 
however, have little if any contact with each other in the 
final particle. 

The last polypeptides to add to the assembling particle,
S3, S10, S14, S11, S18, S5, S12, and S9 (Figure 13-21),
all form contacts with at least one and as many as five 
double helices of RNA also contacted by the subunit or 
subunits that must precede them onto the particle. These 
relationships suggest that the preceding subunit or sub- 
units control the orientations of these double helices so 
that the site for the binding of the following subunit 
among these double helices is either created or stabilized. 

The earlier subunits to add to the assembling parti- 
cle, however, subunits S19, S13, S7, and S20, do not share 
either direct contacts or contacts with double helices in 
common with the subunits that must precede them. This 
fact suggests that global conformational changes of the 
16S ribosomal RNA are effected by the earliest polypeptides
to add, namely those of subunits S4, S8, S15, and S17,
to create the distant sites for the polypeptides of subunits 
S19, S13, S7, and S20. 

Not only does the structure of the 16S ribosomal 
RNA seem to adjust upon the association of the individ- 
ual subunits but also the conformations of the separated 
subunits seem to adjust, sometimes dramatically, upon 
their association. One polypeptide that seems to be 
almost structureless before it associates with the 16S 
ribosomal RNA is polypeptide S4 (naa = 203). The nuclear 
magnetic resonance spectrum of polypeptide S4 under 
the conditions in which assembly takes place is almost 
indistinguishable from its spectrum in 8 M urea, which is 
the spectrum of the sum of the amino acids from which 
it is composed. This result indicates that, when alone
in solution, polypeptide S4 cannot assume a unique
native state. The circular dichroic spectrum and frictional
ratio (f/f0 = 1.7), however, are not those of a fully
random coil (f/f0 = 2.4 for naa = 245), and an explanation
of these results and those from nuclear magnetic
resonance spectroscopy would be that the polypeptide in
solution is rapidly passing through an array of loosely
folded conformations, none of which is unique. When it
is bound to the 16S ribosomal RNA, the subunit S4
assumes a defined structure with seven α helices and
four strands of β structure, but it is flattened and spread
over the surface of the 30S subunit. As it associates
with the 16S ribosomal RNA, its final structure could 
easily be dictated solely by the noncovalent interactions 
in which it participates as it spreads over the surface of 
the folded polynucleotide. 

Some ribosomal polypeptides seem to enter the 
assembling 30S subunit as folded globular proteins. For 
example, polypeptide S17 (naa = 83) under the conditions 
of assembly has both the frictional ratio (f/f0 = 1.24; calculated
from its sedimentation coefficient) and the
intrinsic viscosity ([η] = 4.2 cm³ g⁻¹) of a globular protein
and the molar mass, as determined by sedimentation
equilibrium, of a monomer. Other polypeptides, however,
seem to be elongated but compact proteins under
the conditions of assembly. For example, polypeptides
S3, S5, S6, and S7 have frictional ratios f/f0 = 1.4-1.6 that
are too large to be those of globular proteins. Like
subunit S4, subunit S3 is flattened against the surface of 
the assembled, intact 30S subunit. Subunit S5 also 
assumes a flattened, elongated conformation, but sub- 
units S6 and S7 are both globular in the final structure 
and must become so as they associate with the assembling
particle.

Problem 13-7: The dissociation constant for the first
step in the assembly of tryptophan synthase (Equation
13-38) is

(A) Why is the 2 present in the numerator?
The dissociation constant for the second step (Equation 
13-39) is 
[a] Lo." 


2[ a8," ] 


d2 


(B) Why is the 2 present in the denominator? 


(C) Calculate the equilibrium constants Ka and Ka at 
25 °C. Recall that at equilibrium 


kel@]}[B2] = kolap]

and


k,[aß,] = ky[@ß,*]

There are two classes into which helical polymeric proteins
can be divided when the question of assembly is
considered: those that assemble irreversibly and those 
that assemble reversibly. Those that assemble irre- 
versibly polymerize initially by the noncovalent assem- 
bly of monomeric subunits, but the polymers are often 
strengthened secondarily by the formation of naturally 
occurring covalent cross-links between adjacent sub- 
units (Figure 3-19). Those that assemble reversibly 
require that reversibility for their biological role, and 
their assembly is not an approach to equilibrium but the 
result of a steady state. 

Examples of helical polymeric proteins that assem- 
ble irreversibly are collagen (Figure 9-34), intermediate 
filaments (Figure 9-35), thick filaments, and fibrin. 

The thick fibers of fibrin that form clots of blood are 
readily observed in scanning electron micrographs. They 
are lateral aggregates of thinner protofibrils of fibrin; 
these protofibrils of fibrin are assembled irreversibly 
from a protomeric unit known as fibrinogen (Figure 
13-22A), which is a freely soluble (αβγ)₂ heterohexamer.
Each molecule of fibrinogen is constructed from
two αβγ heterotrimers arrayed about a 2-fold rotational
axis of symmetry. Each of the two αβγ heterotrimers contains
a long segment of rope (111 amino acids) in which
the three strands, one from each of the α, β, and
γ polypeptides, are wound around each other in a triple
α-helical coiled coil (Figure 6-28). At the two ends of each
rope are globular domains known as the terminal
domain and the central domain, respectively. A terminal
domain is composed of the carboxy-terminal domains of
the β subunit and the γ subunit; the central domain is
composed of the folded amino-terminal regions of all
three polypeptides. Two αβγ heterotrimers are associated
at their central domains around the 2-fold rotational
axis of symmetry to produce the molecule of fibrinogen
with two terminal domains.

Fibrinogen does not assemble into a protofibril 
until four short peptides, two fibrinopeptides A and two 
fibrinopeptides B, are removed by the endopeptidase 
thrombin from the amino-terminal ends of the two 
α polypeptides and the two β polypeptides to produce
fibrin monomers. The fibrinopeptides have sequences 
that vary extensively among species (Problem 7-5) and 
seem to satisfy only the requirement that they be polar 
and structureless. Carboxy-terminal to the arginines at 
which the endopeptidolytic cleavages occur that remove 
the four fibrinopeptides, the amino acid sequences of the 
α and β polypeptides become highly conserved among

Figure 13-22: Assembly of fibrin from fibrinogen. (A) Skeletal
drawing in stereo of the polypeptide backbones of the subunits in
the crystallographic molecular model of fibrinogen from
G. gallus. The molecule is an (αβγ)₂ heterohexamer with a rotational
axis of symmetry normal to the plane of the page passing
through the center of the central domain. Features noted on only
one side of the molecule are reproduced symmetrically on the
other. The amino termini of the six polypeptides α, α′, β, β′, γ, and
γ′ are all in the central domain, but the amino-terminal 26, 62, and
3 aa are missing from the maps of electron density of the α, β, and
γ polypeptides, respectively, because these segments are disordered
in the crystals. Two symmetrically displayed, identical rings
of cystines (S-S rings) each connect Cysteine α45 to Cysteine γ23,
Cysteine γ19 to Cysteine β80, and Cysteine β76 to Cysteine α49.
One of these rings is on each side of the central domain, and 
together they mark its boundaries. At these rings, the two symmetrically
displayed, three-stranded α-helical coiled coils commence
in opposite directions, and each proceeds for 16 nm (111 aa) until 
terminating in another ring of cystines. In the middle of each coiled 
coil, there are intentional disruptions that make each susceptible 
to the endopeptidolytic cleavages that dissolve the fibrin clot. At 
each of the peripheral rings of cystines, each of the two terminal 
domains commences. The globularly and homologously folded 
carboxy-terminal 262 aa of a β subunit and 270 aa of a γ subunit
together, side by side, form each terminal domain. The structures
of these two halves of each of the peripheral domains are superposable.
The carboxy-terminal 16 amino acids of the γ polypeptide,
which contain the sites of intermolecular cross-linking, are
disordered in the crystals. The carboxy-termini of each of these
polypeptides in the crystallographic molecular model (βCOO⁻ and
γCOO⁻, respectively) are indicated. The carboxy-terminal 557 aa of
each α polypeptide emerges from the respective peripheral ring of
cystines, proceeds towards the center of the molecule along the
α-helical coiled coil, and then disappears in disorder beyond
Glutamate 218. There are oligosaccharides (CHO) at Asparagines
γ52 and Asparagines β363. The binding sites for the tetrapeptides
GPRP and GHRP that were cocrystallized with the fibrinogen are 
indicated. These are mimics of the amino termini from within an 
α polypeptide (GPRIL-) and a β polypeptide (AHRPL-) produced by
thrombin cleavage. (B) Schematic drawing of the initial events in 
the polymerization of fibrin monomers to form a protofibril. 
Removal of the fibrinopeptides exposes new amino termini on the 
central domain (the four tails) that can then interact with comple- 
mentary sites on terminal domains of other molecules. An addi- 
tional set of contacts (end-to-end) comes into play upon addition 
of the third molecule. There are orthogonal 2-fold axes of symme- 
try (designated by arrow and sharpened ellipse) that alternate 
along the protofibril (polymer). Reprinted with permission from ref 
479. Copyright 1984 Annual Reviews Inc. (C) Electron micrographs 
of intermediates in the polymerization of fibrin monomers. 
Thrombin was added to a solution of bovine fibrinogen (0.3 mg
mL⁻¹) to initiate polymerization. At short times after nucleation of
polymerization (about 10 min), the macroscopic clot was removed 
and samples of the clear solution that remained were placed on 
hydrophilic films of carbon supported by networks of formvar. The 
adsorbed complexes of fibrin monomers were negatively stained 
with 1.0% uranyl acetate. Magnification 290,000×. The complexes
of fibrin monomers presented in the gallery are (a) a dimer, (b) two 
dimers, (c) a tetramer, (d) a pentamer, (e) a hexamer, and (f) a hep- 
tamer. Reprinted with permission from ref 480. Copyright 1981 
Academic Press. 


species. The α polypeptide of a mammalian fibrin
monomer has the amino-terminal sequence GPRAlk-,
where Alk is an amino acid with an alkyl side chain (valine,
leucine, or isoleucine); and the β polypeptide of a fibrin monomer
has the amino-terminal sequence GHRP-. A synthetic
peptide of the sequence GPRP can inhibit completely the
polymerization of fibrin monomers to fibrin polymer.
It does so by competing with the amino termini of the
α polypeptides of those fibrin monomers, which are in
the central domain, for a binding site on the globular,
carboxy-terminal domain of a γ subunit, which
together with the homologous globular, carboxy-terminal
domain of a β subunit forms one of the two terminal
domains of the fibrin monomer (Figure 13-22A).

These facts suggest that the protofibril is assembled 
by the noncovalent binding of the central domain and 
terminal domain of one molecule of the fibrin monomer 
to the terminal domain and central domain, respectively, 
of another fibrin monomer to form a rotationally symmetric,
doubly bonded dimer (Figure 13-22B). The
dimer would be elongated to the polymer by adding
other fibrin monomers, dimers, or oligomers through
steps each creating identical interfaces (Figure 13-22B).
Each of the consecutive, individual noncovalent interactions
holding the helical polymer together is between the
amino terminus of an α polypeptide on a central domain
exposed by the cleavage produced by thrombin and the
binding site for its amino-terminal sequence, GPR-, on
one of the terminal domains of another fibrin monomer.
Electron micrographs of intermediates in the polymerization
of fibrin monomers are consistent with this structural
proposal (Figure 13-22C).

The binding site for the GPR- has been located in a 
crystallographic molecular model of the terminal 
domain. It is a hole on the surface of the globular carboxy-terminal
domain of the γ subunit into which the
amino-terminal sequence GPR- inserts with the glycine
at the bottom of the hole. The amino-terminal 26 amino
acids of the α polypeptide, amino-terminal to the symmetric
interchain cystine between Cysteine α28 and
Cysteine α′28, are missing from the crystallographic
molecular models of both a detached central domain
and a complete molecule of fibrinogen. Consequently,
the amino-terminal 11 aa of an α polypeptide in a fibrin
monomer are probably structureless, and as a result,
they are able to reach out to the hole on the peripheral
domain. The cystine between Cysteine α28 and Cysteine
α′28 itself forms a loop that juts out from the central
domain. Both of these structural characteristics make it 
easier for the GPR- to find its site on the terminal domain 
of another fibrin monomer and suggest that the connec- 
tion it forms between a central domain of the one fibrin 
monomer and the terminal domain of the other is a flex- 
ible tether. 

A fibrin monomer is a knotted segment of rope with 
a 2-fold rotational axis of symmetry centered on the knot. 
It assembles into a protofibril, which is a helical cable. An 
infinite helix defined by a smooth curve is a geometric 
structure with 2-fold rotational axes of symmetry inter- 
secting every one of its points, so the creation of an infinite
helical polymer from a monomer with a molecular
2-fold axis of symmetry produces a structure with a 2-fold
rotational axis of symmetry at each molecular 2-fold
rotational axis of symmetry. When this polymer is finite, 
the two ends are necessarily identical in structure. 

The protofibril that forms as the assembly proceeds 
has a thickness, as determined by light scattering, of
about two fibrin monomers (Figure 13-22B,C). When
fibrin monomers are created instantly by adding a large
excess of thrombin to a solution of fibrinogen, the initial
rate of formation of these protofibrils as monitored by
light scattering is bimolecular in the concentration of
fibrin monomers (Figure 13-23A) and shows no evidence
of a lag. The time required for half of the final light
scattering to be established is inversely proportional to
the initial concentration of fibrin monomers (Figure
13-23B). Because light scattering is not proportional to
bulk concentration of polymer, this is simply the time
required for a particular fraction α of the polymer to form.

It can be shown that such behavior is that
expected from a polymerization in which end-to-end
connections among monomers, dimers, and oligomers

Figure 13-23: Dependence of the initial rate for the polymerization
of fibrin (A) and the time required to reach half of the total
increase in light scattering (B) on the initial concentration of fibrin
monomer. Solutions of fibrinogen (0.05-0.35 mg mL⁻¹ final concentration)
were mixed rapidly (<0.5 s) with solutions of thrombin
at concentrations high enough to remove the fibrinopeptides from 
the molecules of fibrinogen immediately (<0.5 s). The polymeriza- 
tion of the fibrin was then followed by monitoring the increase in 
light scattering of the solution at 633 nm as a function of time (sec- 
onds). The initial rate of increase in light scattering was determined 
for each trace as well as the time (2-25 s) required for the light scat- 
tering to reach half its final value. The logarithm of the initial rate 
and the logarithm of the time required to reach half of the final 
value of light scattering, log t1/2, are presented, respectively, as functions
of the logarithm of the concentration (molar) of fibrinogen.
The concentration of protein (micromolar) is also noted at the top 
of each graph. The polymerization was performed at 23 °C in 0.5 M 
NaCl. The high ionic strength prevented side-to-side aggregation of 
the polymers so that only formation and elongation of protofibrils 
were occurring during the measurements. The solid lines are linear 
least-squares fits to the data; the dashed lines are lines of slope 2 
and -1, respectively. Adapted with permission from ref 488. 
Copyright 1979 Journal of Biological Chemistry. 


form at random with no initiation required and in which 
each connection has the same rate constant of formation 
regardless of the lengths of the two participants, includ- 
ing unconnected monomers. It is possible that, during the 
polymerization of fibrin monomers, an interface forms 
between the two adjacent terminal domains of two dif- 
ferent fibrinogen molecules in a protofibril, which cannot 
form in the initial dimer (Figure 13-22B). Either this inter- 
face is much slower to form than the interface formed by 
the insertion of the amino terminus of an α subunit into
the hole on a γ subunit or the difference in rate between
the formation of a dimer and formation of a longer 
protofibril is irrelevant because almost all interfaces form 
between oligomers and between oligomers and 
monomers. 

If the assumption is made that the degree of poly- 
merization is equal to the molar concentration of con- 
nected interfaces between monomers, [interfaces], and it 
is realized that the molar concentration of amino termini 
of α polypeptides on central domains is always equal to
the molar concentration of holes on terminal domains,
then

$$\frac{d[\text{interfaces}]}{dt} = k[\text{faces}]^{2} \qquad (13\text{-}40)$$



where [faces] is the total molar concentration at any 
time, t, of open faces, each of which is an amino termi- 
nus and a hole on the same monomer and each of which 
is located at the open end of a fibrin monomer, dimer, or 
oligomer (Figure 13-22B), and each open end has both 
an amino terminus and a hole. The initial rate of polymer 
formation will be second-order in the concentration of 
fibrin monomer because $[\text{faces}]_0 = 2[\text{monomer}]_0$, where
$[\text{faces}]_0$ and $[\text{monomer}]_0$ are the initial concentrations of
faces and fibrin monomers, respectively.
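The second-order dependence of the initial rate can be checked numerically. The following Python sketch evaluates Equation 13-40 at t = 0 for several initial concentrations and computes the slope of the logarithm of the initial rate against the logarithm of the concentration, which should be 2 as in Figure 13-23A. The rate constant and the concentrations are hypothetical values chosen only for illustration, and the function names are not from the text.

```python
import math

K_HYPOTHETICAL = 1.0e5  # M^-1 s^-1; illustrative value, not a measured constant


def initial_rate(monomer_0, k=K_HYPOTHETICAL):
    """Initial rate of interface formation, k*[faces]0^2, with [faces]0 = 2*[monomer]0."""
    faces_0 = 2.0 * monomer_0
    return k * faces_0 ** 2


def log_log_slope(xs, ys):
    """Least-squares slope of log10(y) against log10(x)."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    numerator = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    denominator = sum((a - mean_x) ** 2 for a in lx)
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical initial concentrations of fibrin monomer (molar).
    concentrations = [0.15e-6, 0.30e-6, 0.60e-6, 1.0e-6]
    rates = [initial_rate(c) for c in concentrations]
    print(log_log_slope(concentrations, rates))
```

Doubling the initial concentration of monomer quadruples the initial rate in this model, whatever value is assumed for k.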

Because one interface is formed from two faces, one 
from each monomer (Figure 13-22B) 

$$2\,\frac{d[\text{interfaces}]}{dt} = -\frac{d[\text{faces}]}{dt} \qquad (13\text{-}41)$$


Upon combination of Equations 13-40 and 13-41 and 
rearrangement 


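$$-\frac{d[\text{faces}]}{[\text{faces}]^{2}} = 2k\,dt \qquad (13\text{-}42)$$

Upon integration between t = 0 and t

$$\frac{1}{[\text{faces}]} - \frac{1}{[\text{faces}]_0} = 2kt \qquad (13\text{-}43)$$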


Choose any time, $t_\alpha$, at which a particular fraction $\alpha$ of
the polymer has formed. Because that fraction of the
faces has disappeared, $[\text{faces}] = (1-\alpha)[\text{faces}]_0$. When this
equality is inserted into Equation 13-43

$$\frac{1}{[\text{faces}]_0} = 2\left(\frac{1-\alpha}{\alpha}\right) k\,t_\alpha \qquad (13\text{-}44)$$


It follows that the time at which any particular fraction,
α, of the polymer has formed will be inversely proportional
to the initial concentration of monomer, as was
observed. The kinetic mechanism just described, based 
on the assumption that the combination of two faces is 
independent of whether they are faces on fibrin 
monomers, dimers, or oligomers, is completely consis- 
tent with the proposed molecular mechanism (Figure 
13-22B). 
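The integrated rate law lends itself to a short numerical check. The Python sketch below solves Equation 13-43 for the concentration of open faces, rearranges Equation 13-44 for the time at which a fraction α has reacted, and compares the closed form with a crude stepwise integration of d[faces]/dt = -2k[faces]². The rate constant and concentrations are hypothetical, and the function names are chosen only for this illustration.

```python
def faces_remaining(t, faces_0, k):
    """Equation 13-43 solved for [faces]: 1/[faces] - 1/[faces]0 = 2kt."""
    return faces_0 / (1.0 + 2.0 * k * faces_0 * t)


def time_to_fraction(alpha, faces_0, k):
    """Equation 13-44 solved for the time at which a fraction alpha of the faces is gone."""
    return alpha / ((1.0 - alpha) * 2.0 * k * faces_0)


def integrate_euler(faces_0, k, t_end, steps=100_000):
    """Stepwise (Euler) integration of d[faces]/dt = -2k[faces]^2 as an independent check."""
    dt = t_end / steps
    faces = faces_0
    for _ in range(steps):
        faces -= 2.0 * k * faces * faces * dt
    return faces


if __name__ == "__main__":
    k = 1.0e5          # M^-1 s^-1, hypothetical
    faces_0 = 1.0e-6   # M, hypothetical
    t_half = time_to_fraction(0.5, faces_0, k)
    print(t_half, faces_remaining(t_half, faces_0, k), integrate_euler(faces_0, k, t_half))
    # Doubling the initial concentration halves the time to the same fraction.
    print(time_to_fraction(0.5, 2.0 * faces_0, k))
```

The product of the time to any chosen fraction and the initial concentration is constant in this model, which is the inverse proportionality observed in Figure 13-23B.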

The final step in the irreversible polymerization of 
fibrin is the covalent cross-linking of the fibrin 
monomers among themselves by the enzyme protein-glutamine
γ-glutamyltransferase. This enzyme catalyzes
replacement of the ammonia in a glutamine on one
monomer with the ε-amine of a lysine on another
monomer:

$$\mathrm{R{-}CO{-}NH_2} + \mathrm{H_3N^{+}{-}R'} \longrightarrow \mathrm{R{-}CO{-}NH{-}R'} + \mathrm{NH_4^{+}} \qquad (13\text{-}45)$$


Two symmetrically disposed pairs of lysines and glutamines
near the carboxy termini of two γ subunits are
cross-linked in this way to produce a covalent dimer of
γ polypeptides with the following structure:

where each valine is the carboxy terminus of a γ polypeptide.
The covalently symmetric juxtaposition of these two
sequences from the γ polypeptides of different fibrin
monomers in the protofibril produced by the protein-glutamine
γ-glutamyltransferase is consistent with a
structural model in which there is a 2-fold rotational axis
of symmetry at this location in the protofibril. Amide
cross-links are also formed among α polypeptides juxtaposed
during the lateral association of the protofibrils.
The pairing in these cross-links reflects the side-by-side
orientation of the α polypeptides in different protofibrils
in the final fiber of fibrin. 

After it has polymerized and been covalently cross- 
linked, fibrin cannot depolymerize. The fibrin clot is 
eliminated when necessary by the endopeptidolytic 
cleavage of the three polypeptides within the α-helical
coiled coils (Figure 13-22A) by plasmin. Collagen, also
because it is covalently cross-linked, cannot depolymerize.
It too is eliminated, when necessary, by endopepti-
dolytic digestion. These two proteins are examples of 
helical polymeric proteins that are assembled irre- 
versibly. Examples of helical polymeric proteins that 
assemble reversibly are actin (Figure 9-1B) and micro- 
tubules. These helical polymeric proteins are continu- 
ously assembled and disassembled during the life of a 
cell. 

Microtubules are hollow cylinders of indefinite 
length. The overall radius of a microtubule is about 
12 nm, and it has a hollow center with a radius of about 
6 nm. When viewed in the electron microscope, by neg- 
ative staining, microtubules are tubular bundles of 10-16 
indistinguishable rows of protein (Figure 13-24A).
Each row is parallel to all of the others and parallel to the
axis of the microtubule for microtubules of 13 rows but
skewed somewhat relative to the axis for microtubules
with 10, 11, 12, 14, 15, or 16 rows. At the end of a microtubule
these rows can be frayed into individual
threads. From this observation it can be concluded
that the interfaces forming the rows are stronger than the
interfaces between them. Each thread of protein forming
one of these rows is a protofilament.

Each of the protofilaments in an intact microtubule 
is a string of globular protein subunits. Each of the sub- 
units is pointed in the same direction. The top of one of 
these subunits is joined to the bottom of the subunit 
above it in the same protofilament by an interface. Each 
subunit is related to the one above it by a screw axis of 
symmetry coincident with the axis of the microtubule. 
Because each of the protofilaments in a microtubule of 
13 rows is parallel to the axis of the tubule, the angle 
relating two consecutive subunits by the screw axis of 
symmetry is zero in this type of microtubule. In the other 
types of microtubules in which the protofilaments are 
skewed relative to the central axis, the angles of the screw 
axes are somewhat less or somewhat more than zero. In 
each case, however, the translation along the screw axis 
of symmetry that superposes a subunit in a protofila- 
ment upon the one above it is 4.0 nm. 

The primordial microtubule was a polymer com- 
posed of identical monomeric subunits each related to 
its neighbor above by the screw axis of symmetry defined 
by the strong interfaces creating a protofilament. At 
some point, the gene encoding the subunit duplicated 
and the two resulting isoforms of the common ancestral 
subunit began to evolve separately. The results of this 
evolution are that along a protofilament, the two isoforms,
the α subunit and the β subunit, alternate; that
the interface between a β subunit and the α subunit
above it is stronger than the interface between a β subunit
and the α subunit below it; and that as a result,
when a microtubule dissociates, it dissociates into
αβ heterodimers. Because the individual identical
subunits in the primordial microtubule were related to
each other by a screw axis of symmetry with a rise of
4.0 nm and an angle of rotation equal to about zero, the
two subunits in the αβ heterodimer are now related to
each other by a screw axis of pseudosymmetry with the
same rise and a rotation of almost zero.

The protomer from which a microtubule is now
formed is this αβ heterodimer. It is composed of an
α subunit (naa = 450) and a β subunit (naa = 445), the
amino acid sequences of which are homologous to each
other [41% identity with 1.1 gaps (100 aa)⁻¹]. Their
native structures are also homologous and superposable
and indistinguishable at low resolution. In the
αβ heterodimer, they sit one on top of the other, each
pointed in the same direction. This αβ heterodimer
is tubulin, the monomer from which a microtubule is
formed by its polymerization. The αβ heterodimer of
tubulin in free solution will be referred to as monomeric
tubulin; αβ heterodimers of tubulin within a microtubule
will be referred to as protomeric tubulin.

Upon the microtubule there is a helical surface lattice.
This lattice was originally defined by image reconstruction
of electron micrographic images of intact
microtubules formed from 13 rows (Figure 13-24A).
The reciprocal lattice of the Fourier transform of digitized
images from electron micrographs was assigned to

Figure 13-24: Helical surface lattice of αβ heterodimers of tubulin producing a microtubule. Trichonympha agilis, flagellated protists from
the gut of the termite Zootermopsis angusticollis, were mixed with 1% phosphotungstate, pH 7, and applied to a film of amorphous carbon 
supported by a network of collodion. The excess negative stain was drained, the preparation was dried, and the negatively stained specimens 
were examined in the electron microscope. (A) Although the flagella of the Trichonympha, over most of their length, are composed of pairs 
of microtubules in which two microtubules are fused to each other, at the distal end of a flagellum these pairs dwindle to single microtubules. 
The electron micrograph is of a group of these individual, unpaired microtubules. Regions marked F, H, and A were chosen for image 
enhancement. The optical density of each of these images was digitized with an optical densitometer, and the Fourier transform of the dig- 
itized image was calculated. From examination of this calculated Fourier transform and from optical diffraction patterns of the images them- 
selves, the reflections arising from the helical array of protomers were identified and indexed. These reflections defined the helical lattice in 
which the αβ heterodimers of tubulin are arrayed in a microtubule. (B) A schematic diagram of that lattice is presented. The αβ dimers of
tubulin are aligned in 13 protofilaments parallel to the axis of the cylinder (see panel A). The cylinder on which the lattice is arrayed was cut
along one of the lattice lines between two protofilaments on the side of the microtubule opposite to the seam and parallel to its axis, and the
cylindrical surface was flattened onto the page. Individual subunits, if no distinction is made between α and β, lie on a triply threaded,
left-handed screw (n = -3) and a decuply threaded, right-handed screw (n = 10). The two different unit cells arrayed along the resulting helices,
respectively, are indicated by parallelograms. αβ Heterodimers lie on a pentuply threaded right-handed screw (n = 5) and an octuply threaded
left-handed screw (n = -8). Each unit cell along each of the resulting helices is identical and contains the equivalent of two αβ heterodimers.
The dimensions of each of these helices and of the parallel protofilaments are noted in nanometers. The horizontal dashed lines indicate the
points of fusion for the flattened array when it is rolled into the cylinder. The heterodimers are in register in the adjacent protofilaments
except at the seam. The locations of the binding sites for GTP on the β subunits are indicated by rectangles. The sites for the nonexchangeable
GTP on the α subunits sit between the two subunits in the heterodimer. The end of the microtubule displaying only β subunits is the
plus end; that displaying only α subunits is the minus end. Reprinted with permission from ref 491. Copyright 1974 Biochemical Society.


that of the Fourier transform of a triply threaded, left- 
handed (n = -3) screw the three helices of which are 
spaced at 4.0-nm intervals along the axis (Figure 13-24B). 
Because the array is built upon the 13 parallel protofilaments,
the three left-handed helices (n = -3) of subunits
with spacing of 4.0 nm create 10 right-handed helices
(n = 10) with spacing of 4.0 nm, and together these sets of
helices form a (-3,10) lattice. These sets of three and ten
parallel, contiguous helices are helices of unit cells each
of which contains the equivalent of one individual subunit,
where α subunits are indistinguishable from β subunits,
but because α subunits and β subunits actually are
different from each other, there are two different but
indistinguishable unit cells arrayed translationally along
each of these sets of helices. A set of (8.0 nm)⁻¹ reflections
also appears in the Fourier transform of the images, and 
these reflections result from an octuply threaded, left- 
handed (n = -8) screw and the resulting pentuply 
threaded screw (n = 5) with spacing of 8.0 nm between 
the threads, a (-8,5) lattice. In these sets of eight and five, 
the helices are helices of identical, translationally arrayed 
unit cells, each containing the equivalent of two 
αβ heterodimers.
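The arithmetic relating the two families of helices in each lattice can be made explicit. In the Python sketch below, which is an illustration rather than a description of the image analysis itself, the number of right-handed helices is obtained by subtracting the number of left-handed helices from the number of protofilaments, and the axial offset between subunits on neighboring protofilaments is estimated on the assumption that the rise of one full turn is shared equally among the protofilaments. The function names are hypothetical.

```python
def complementary_starts(n_protofilaments, n_left_handed):
    """Number of right-handed helices that accompany a left-handed family.

    n_left_handed is given with its sign (for example -3), as in the text.
    """
    return n_protofilaments - abs(n_left_handed)


def axial_stagger_nm(n_protofilaments, n_starts, spacing_nm):
    """Axial offset between neighboring protofilaments along one family of helices.

    One full turn along an n-start helix rises |n| * spacing; the sketch assumes
    that this rise is divided equally among the protofilaments.
    """
    return abs(n_starts) * spacing_nm / n_protofilaments


if __name__ == "__main__":
    print(complementary_starts(13, -3))   # lattice of individual subunits
    print(complementary_starts(13, -8))   # lattice of heterodimers
    print(round(axial_stagger_nm(13, -3, 4.0), 3))
```

For 13 protofilaments the sketch returns 10 and 5, the right-handed partners of the (-3,10) and (-8,5) lattices, and an offset of about 0.92 nm between laterally adjacent subunits along the triply threaded helices.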

Because an α subunit is different from a β subunit
and because they sit one on top of the other in the
αβ heterodimer, the two ends of a microtubule are different,
and a microtubule is polar. This has been verified
by the observation that growth at one end of a microtubule
during assembly is slower than growth at the
other end. In the arrangement depicted in Figure
13-24B, one end would display only α subunits; and the
other end, only β subunits.

The most peculiar structural feature of an extant 
microtubule, which was not a feature of the primordial 
microtubule, is its seam (Figure 13-24B). In all micro- 
tubules except those with 10 and 16 protofilaments, the 
protofilaments are in register, α subunit next to α subunit
and β subunit next to β subunit, except at one junction,
at which they are out of register. This junction is a
seam of discontinuity that runs parallel or almost parallel
to the axis of the microtubule along its surface. At
the seam, there is a discontinuity in the (-8,5) lattice of
αβ heterodimers but not in the (-3,10) lattice of indistinguishable
subunits.

When a solution of purified monomeric tubulin is 
brought to the proper conditions of temperature and pH 
and is mixed with the proper substrates, the tubulin 
spontaneously polymerizes to form microtubules. The 
process can be divided into two phases, nucleation and 
elongation.” Nucleation is the sequence of events that 
leads to the formation of a large enough oligomer of 
tubulin to act as an origin from which a microtubule can 
then elongate by the consecutive, repetitive addition of 
more tubulin. Elongation is the addition of tubulin to 
one end or the other of a microtubule in such a way that 
each successive addition at that end is formally equiva- 
lent regardless of the length of the microtubule. Until 
such a stage is reached, the steps in the assembly of a
microtubule are steps in the process of nucleation. 

Under all experimental conditions, spontaneous 
nucleation of microtubules in an originally monodis- 
perse solution of tubulin is a complicated process that 
involves a number of intermediates of peculiar structure.
It also depends on the twelfth power of the
concentration of tubulin. This is not surprising
because the steps between monodisperse tubulin in free 
solution and an oligomer of tubulin large enough to offer 
the end of a cylinder of 10-16 rows equivalent to the end 
of the cylinder in an established microtubule are not easy 
to accomplish. Furthermore, even though it occurs with 
even the most highly purified preparations of tubulin, 
spontaneous nucleation seems to involve a minor com- 
ponent contaminating the preparation of purified tubu- 
lin.’ At concentrations of tubulin high enough to 
support spontaneous nucleation, it proceeds slowly and 
continuously, independent of any decreases in the con- 
centration of free tubulin caused by its incorporation 
into elongating microtubules. All of these properties 
make it difficult to separate cleanly the kinetics of spon- 
taneous nucleation from those of elongation. 

These complexities of spontaneous nucleation in 
the laboratory, however, may be irrelevant to the poly- 
merization of tubulin within a cell, the process for which 
the protein has evolved. In a living cell almost all of the 
microtubules originate in only one region of the cyto- 
plasm.” In cells containing centrosomes, it is the peri- 
centriolar material that serves as the origin. The 
pericentriolar material is a diffuse structure surrounding 
the centriole that lies in the center of the centrosome. In 
cells lacking centrosomes, the point of origin is associ- 
ated with structures resembling centrosomes. Within the 
pericentriolar material, it is rings of a different isoform of 
tubulin, γ tubulin, that serve as the nuclei upon which
αβ tubulin polymerizes. The rings of γ tubulin embedded
in purified centrosomes are able to initiate the formation
of microtubules readily and do so at lower
concentrations of free tubulin than are necessary for
spontaneous nucleation. From these observations,
it becomes clear that spontaneous nucleation in the
absence of centrosomes is an adventitious process for
which αβ tubulin was probably not designed.

Elongation of microtubules in the absence of the 
complications of spontaneous nucleation can also be 
accomplished by adding seeds, which are short, uniform 
fragments of preformed microtubules, to solutions con- 
taining high enough concentrations of unpolymerized 
monomeric tubulin to support the elongation of those 
seeds.*” Seeds have already passed through the steps of 
spontaneous nucleation. The fragments of preformed 
microtubules used as seeds can be stabilized by cross-linking
with ethylene glycol bis(succinimidylsuccinate)
or by preparing the seeds with guanylyl
5′-(β,γ-methylenediphosphonate). Solutions of tubulin
of high purity are reasonably unsusceptible to spontaneous
initiation but readily support the elongation of
seeds in a reaction that assumes its maximum rate 
immediately after the seeds are added to the solution. 

Consider a polymerization in which monomers, such
as monomeric tubulin, are successively adding to only one
end of a polymer, such as a microtubule elongating at only
one of its ends from a centrosome, by the reaction

$$\text{polymer}_n + \text{monomer} \underset{k_{-n}}{\overset{k_n}{\rightleftharpoons}} \text{polymer}_{n+1} \tag{13-46}$$

where $\text{polymer}_n$ is a polymer composed of $n$ protomeric
units and $\text{polymer}_{n+1}$ is a polymer composed of $(n + 1)$
protomeric units. When the reaction has come to equilibrium,
the dissociation constant, $K_{dn,\text{poly}}$, of a monomer
from the end of a polymer $n + 1$ units in length is

$$K_{dn,\text{poly}} = \frac{[\text{polymer}_n]_{\text{eq}}\,[\text{monomer}]_{\text{eq}}}{[\text{polymer}_{n+1}]_{\text{eq}}} = \frac{k_{-n}}{k_n} \tag{13-47}$$

When all values of $n$ are large enough so that only elongation
is being considered, $K_{dn,\text{poly}}$, $k_n$, and $k_{-n}$, respectively,
have the same mean values for all values of $n$. The
ends of all the polymers are indistinguishable because 
each is elongating from the same one of its two ends. In 
the case of microtubules elongating from a centrosome, 
the other end is not elongating because it is anchored in 
the centrosome. 

Two measures of the solution of a polymer can be 
separately defined. The bulk concentration of polymer, 
$[\text{polymer}]_{\text{bulk}}$, or bulk concentration of microtubules,
$[\text{microtubule}]_{\text{bulk}}$, is equal to the molar concentration of
protomers, or of protomeric tubulin, that are incorpo- 
rated into polymers. The bulk concentration is directly 
proportional to the total length of polymer in the solu- 
tion. In the case of microtubules the bulk concentration 
is usually determined by light scattering, which is linearly 
related to the total length of microtubule present. 
Normally, a significant fraction of the light is scattered by 
these solutions, and the absorbance, which is decreased 
by the amount of light scattered, is measured rather than 
the amplitude of the scattered light. The number concentration
of polymer, $[\text{polymer}]_{\text{number}}$, or number concentration
of microtubules, $[\text{microtubule}]_{\text{number}}$, is equal to the
molar concentration of individual, intact molecules of
polymer, or microtubules, regardless of their individual
lengths. The number concentration of microtubules is
measured by quantitative electron microscopy.
Individual microtubules in a field on an electron micro- 
graph are counted, and their density is related to their 
density in the original solution. The number concentra- 
tion of a polymer elongating at only one end is equal to 
the molar concentration of that end, [end]. 

Assume still that elongation is occurring at only one 
end of a polymer and that 


$$\frac{d[\text{polymer}]_{\text{bulk}}}{dt} = k_n[\text{end}][\text{monomer}] - k_{-n}[\text{end}] \tag{13-48}$$


If the reaction has been initiated by adding origins of 
nucleation, such as centrosomes, to a solution of 
monomer, and if no spontaneous nucleation occurs, the 
molar concentration of ends at which elongation is pro- 
ceeding must remain constant, and the initial rate for the 
formation of bulk polymer is defined by the relationship 


$$\left(\frac{d[\text{polymer}]_{\text{bulk}}}{dt}\right)_0 = [\text{end}]_0\left(k_n[\text{monomer}]_0 - k_{-n}\right) \tag{13-49}$$


where the subscripts indicate initial quantities. It has 
been shown, in agreement with this equation, that the 
initial rate at which the bulk concentration of micro- 
tubules increases is directly proportional to the molar 
concentration of seeds added to a series of reactions 
(Figure 13-25).

The initial rate of formation of bulk polymer at only
one end should also be directly proportional to the term
$(k_n[\text{monomer}]_0 - k_{-n})$, and a plot of initial rate against initial
concentration of monomer should be linear and pass
through zero when $k_n[\text{monomer}]_0 = k_{-n}$. Immediately
after addition of seeds, if the seeds were simply fragments
of polymer equivalent to the polymer about to be
formed by elongation, the bulk concentration of polymer,
$[\text{polymer}]_{\text{bulk}}$, should increase when $k_n[\text{monomer}]_0 > k_{-n}$ and
decrease when $k_n[\text{monomer}]_0 < k_{-n}$. When the bulk concentration
of polymer is decreasing rather than increasing,
the individual molecules of the polymer forming the
seeds would be depolymerizing by shedding monomer
from their ends and decreasing in length. The critical
concentration is the initial concentration of monomer at
which neither net elongation nor net depolymerization
occurs. If only one molecule of polymer were being
observed, its initial rate of elongation (in monomers
second⁻¹) at only one of its ends should be equal to
$(k_n[\text{monomer}]_0 - k_{-n})$. When $k_n[\text{monomer}]_0 > k_{-n}$, that
molecule of polymer should elongate; and when
$k_n[\text{monomer}]_0 < k_{-n}$, it should depolymerize.
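
The sign rule for elongation and depolymerization at a single end can be checked numerically. The following Python sketch evaluates the net initial rate, $k_n[\text{monomer}]_0 - k_{-n}$, and the critical concentration, $k_{-n}/k_n$. The function names and the two rate constants are hypothetical and chosen only for illustration; they are not measurements from the text.

```python
def initial_rate_per_end(k_n, k_minus_n, monomer0):
    """Net initial rate at one end (monomers per second): k_n*[monomer]_0 - k_-n."""
    return k_n * monomer0 - k_minus_n


def critical_concentration(k_n, k_minus_n):
    """Monomer concentration (M) at which the net rate is zero: k_-n / k_n."""
    return k_minus_n / k_n


# Hypothetical constants, for illustration only (not values from the text).
K_N = 4.0e6        # association rate constant, M^-1 s^-1
K_MINUS_N = 8.0    # dissociation rate constant, s^-1

if __name__ == "__main__":
    c_crit = critical_concentration(K_N, K_MINUS_N)
    print(f"critical concentration = {c_crit * 1e6:.1f} uM")
    for monomer0 in (1.0e-6, 5.0e-6):
        rate = initial_rate_per_end(K_N, K_MINUS_N, monomer0)
        fate = "elongates" if rate > 0 else "depolymerizes"
        print(f"[monomer]_0 = {monomer0 * 1e6:.1f} uM: {rate:+.1f} s^-1, {fate}")
```

Below the critical concentration the rate is negative and the seed shortens; above it the rate is positive and the seed lengthens.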

If monomer adds to a population of elongating 
polymers until the concentration of monomer free in 
solution and the concentration of monomer within poly- 
mers reach equilibrium with each other, the total bulk 
concentration of polymer will be linearly related to the 
total concentration of monomer, and the line relating 
these two variables will intersect the abscissa at the value 
of the critical concentration (Figure 13-26). This
behavior necessarily follows from the facts that the
amount of monomer incorporated into polymer when
equilibrium is reached is equal to $[\text{monomer}]_{\text{total}} - [\text{monomer}]_{\text{eq}}$;
that $[\text{monomer}]_{\text{eq}}$, the critical concentration,
is the same for each point; and that when
$[\text{monomer}]_{\text{eq}}$ equals the critical concentration, no
increase in $[\text{polymer}]_{\text{bulk}}$ can occur.

Figure 13-25: Initial rate of tubulin polymerization as a function
of the initial concentration of fragmented whole microtubules. A
preparation of microtubules (70 μM) that had been polymerized
separately by spontaneous nucleation was sheared by passing it
through a 22-gauge needle to produce fragments about 1 μm in
length referred to as seeds. The seeds were immediately added to a
solution of unpolymerized monomeric tubulin at a concentration
high enough (18 μM) to elongate the seeds. The solution was
0.1 mM MgCl₂ and 0.5 mM GTP, pH 6.9. The elongation of the fragments
was followed by an increase in absorbance at 320 nm due to
light scattering. The initial rate of increase in absorbance (A₃₂₀
second⁻¹) is plotted against the initial number concentration
(nanomolar) of sheared microtubules added to initiate elongation.
Note that the initial molar concentration of monomeric tubulin is
always more than 2000 times that of the seeds. Adapted with permission
from ref 505. Copyright 1977 Academic Press.
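
The linear relation between bulk polymer at equilibrium and total monomer, with its intercept at the critical concentration, can be sketched in a few lines of Python. This is a minimal illustration that assumes ideal behavior (all monomer in excess of the critical concentration ends up in polymer); the function name is hypothetical, and the value of 3.4 μM is the critical concentration quoted in the caption of Figure 13-26.

```python
def bulk_polymer_at_equilibrium(total_monomer, critical_conc):
    """Bulk polymer concentration at equilibrium, in the same units as the inputs.

    Below the critical concentration no polymer forms; above it, everything
    in excess of the critical concentration is assumed to be polymer.
    """
    return max(0.0, total_monomer - critical_conc)


CRITICAL_CONC_UM = 3.4  # micromolar, from the caption of Figure 13-26

if __name__ == "__main__":
    for total_um in (2.0, 3.4, 5.0, 10.0):
        polymer_um = bulk_polymer_at_equilibrium(total_um, CRITICAL_CONC_UM)
        print(f"total = {total_um:5.1f} uM -> polymer = {polymer_um:4.1f} uM")
```

Plotting the output against total monomer gives a line of slope 1 that meets the abscissa at the critical concentration.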

Seeds were added to a series of solutions containing 
increasing concentrations of unpolymerized, monodis- 
perse monomeric tubulin at concentrations below those 
at which significant spontaneous nucleation occurs.” At 
various times after the addition, samples were taken 
from each mixture. They were quenched by quantitative 
cross-linking with glutaraldehyde, sedimented onto a 
specimen grid, and examined in the electron micro- 
scope.“ The seeds that were used were fragments of 
axonemes, which are naturally occurring, rigid bundles 
of microtubules the polarity of which can be determined 
visually. The two ends of a fragment of an axoneme are 
arbitrarily termed the plus end and the minus end.

Figure 13-26: Demonstration of the critical concentration for the
polymerization of the monomer of cell division protein FtsZ from
E. coli, a bacterial homologue of tubulin. Solutions containing
the noted total concentrations of cell division protein FtsZ,
which had been stripped of all nucleotides, were prepared in
50 mM KCl and 10 mM MgCl₂, pH 6.5 at 55 °C. After equilibrium
was reached, polymer was collected by centrifugation, and its bulk
concentration was determined by direct analysis of the amount of
protein in the pellet. The bulk concentration (micromolar) of
sedimentable polymer ([polymerized FtsZ]) is presented as a function
of the total concentration (micromolar) of monomer in the
solution. The observed critical concentration is 3.4 μM.

The plus end and minus end were originally defined in terms
of the appearance of an axoneme in an electron micrograph.
It is now known that the end of a microtubule at
the plus end of an axoneme displays only β subunits
and the end at the minus end displays only α subunits
(Figure 13-24B). Elongation was found to proceed
from both ends of the axonemes. The initial rate of 
increase in the length of the microtubules projecting 
from each end of the axoneme could be measured and 
plotted against the initial concentration of tubulin 
(Figure 13-27). The initial rate of elongation is linearly
related to the initial concentration of monomeric tubulin,
$[\text{monomer}]_0$, as predicted by Equation 13-49, but $k_{-n}$
is too small to be estimated accurately.*

* At rates of elongation around 10 μm min⁻¹, a rate beyond those
observed in Figure 13-27, there is a change in the rate-limiting step
of elongation, and the rate becomes independent of the concentration
of monomeric tubulin, presumably because it is limited
by the rate of some step that must occur between the addition of
the last monomer of tubulin and the addition of the next monomer
of tubulin to the growing end.

When elongation occurs because seeds are added
to a solution of monomer above the critical concentration,
the free concentration of monomer should decrease
as polymer is formed until equilibrium is reached and
further elongation ceases. If elongation were occurring
from only one end of the polymer, from Equation 13-48
it follows that

$$\frac{d[\text{polymer}]_{\text{bulk}}}{dt} = -\frac{d[\text{monomer}]}{dt} = [\text{end}]\left(k_n[\text{monomer}] - k_{-n}\right) \tag{13-50}$$

When this relationship is rearranged and integrated
between $t = 0$ and $t$,

$$\ln\left(k_n[\text{monomer}] - k_{-n}\right) = \ln\left(k_n[\text{monomer}]_0 - k_{-n}\right) - k_n[\text{end}]\,t \tag{13-51}$$

When $t = \infty$ and $[\text{monomer}] = [\text{monomer}]_{\text{eq}}$,
$k_n[\text{monomer}]_{\text{eq}} = k_{-n}$. Therefore, the concentration of
monomer at equilibrium should be equal to the critical
concentration at which no elongation occurs when seeds
are added to a solution of monomer.
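
Solving the integrated rate law (Equation 13-51) for the free monomer gives an exponential approach to the critical concentration, $[\text{monomer}] = c_{\text{crit}} + ([\text{monomer}]_0 - c_{\text{crit}})\,e^{-k_n[\text{end}]t}$ with $c_{\text{crit}} = k_{-n}/k_n$. The Python sketch below evaluates this expression. It assumes elongation at one end only and a constant concentration of ends; the function name and all numerical values are hypothetical.

```python
import math


def free_monomer(t, monomer0, k_n, k_minus_n, ends):
    """Free monomer (M) at time t (s) for elongation at one end only.

    Rearranged from ln(k_n*m - k_-n) = ln(k_n*m0 - k_-n) - k_n*[end]*t,
    assuming the concentration of ends stays constant.
    """
    c_crit = k_minus_n / k_n
    return c_crit + (monomer0 - c_crit) * math.exp(-k_n * ends * t)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    k_n, k_minus_n = 4.0e6, 8.0      # M^-1 s^-1 and s^-1
    ends, monomer0 = 1.0e-9, 1.0e-5  # M
    for t in (0.0, 250.0, 1000.0, 5000.0):
        m = free_monomer(t, monomer0, k_n, k_minus_n, ends)
        print(f"t = {t:6.0f} s: [monomer] = {m * 1e6:.2f} uM")
```

At long times the free monomer levels off at $k_{-n}/k_n$, which is the critical concentration.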

But a microtubule has two ends, and both are elon- 
gating in the experiment. Consider any linear polymer 
such as a microtubule in which an arrangement of pro- 
tomers, such as the 13 protomers of tubulin across the 
microtubule that are labeled α and β in Figure 13-24B,
repeats precisely along the polymer to create its struc- 
ture. Remove a complete set of these protomers, for 
example the 13 protomers of tubulin at the minus end, 
from one end of the polymer, and add them to the other 
end in such a way that the newly added protomers dupli- 
cate the arrangement that was at that end before. For 
example, add one of these monomers of tubulin to each 
of the 13 protofilaments at the plus end. The structure of 
the altered polymer is identical to that of the initial poly- 
mer because of the linear repeat with which it is created. 
Because free energy is a state function, the standard free 
energy change for this reaction must be zero. Because 
this is true for whatever structure is present at the two 
ends initially, the mean value for the dissociation constants,
$K_{dn}$, for tubulin at either end of a microtubule
must be the same. Because $K_{dn} = k_{-n}/k_n = [\text{tubulin}]_{\text{eq}}$
(Equation 13-47), the concentration of monomeric tubulin
in equilibrium with either end, the critical concentration,
is the same. If only mass action were governing the
reaction, a microtubule on average could not elongate 
preferentially at one end while it is depolymerizing at the 
other. The two ends, however, can, and do, have differ- 
ent rate constants of elongation (Figure 13-27) and 
depolymerization. 

One of the ingredients that is essential for the elongation
of microtubules is GTP. Either GTP or GDP is
bound by monomeric tubulin at one site on each of the
two homologous subunits in the αβ heterodimer. In
monomeric tubulin, the site for binding GTP on the surface
of the α subunit is enclosed within the interface of the
αβ heterodimer. Consequently, because of the screw axis


[Graph for Figure 13-27; vertical axis: microtubular growth (μm min⁻¹).]

Figure 13-27: Rate of elongation (upper quadrant; ○, ■, □, ●;
micrometers minute⁻¹) and rate of depolymerization (lower quadrant;
●, ○; -micrometers minute⁻¹) of microtubules from the plus
end (●, ■) and the minus end (○, □) of fragments of axonemes.
Fragments of axonemes from Tetrahymena pyriformis were added 
to final number concentrations of 10’mL to solutions of 
monomeric tubulin at the noted concentrations (micromolar). 
Each of these solutions was 1 mM MgCl₂, 1 mM EDTA, and 1 mM
GTP, pH 6.8. At a series of time points, samples were withdrawn 
from these solutions and rapidly fixed with glutaraldehyde, and the 
products were sedimented onto electron microscopic grids or glass 
coverslips. The elongated axonemes were examined, following 
negative staining, in the electron microscope or, following staining 
with fluorescent anti-tubulin immunoglobulin G, in a fluorescence 
microscope (inset). The boundary between the original axoneme 
(white bar in inset) and the newly elongated microtubules was 
clearly defined because the newly elongated tubules splayed from 
the rigid cylinder of the bundle of microtubules making up the 
axoneme. Rates of elongation were calculated from direct meas- 
urements of the length (micrometers) of the newly elongated 
microtubules elongating from the ends of the axonemes as a function
of time (minutes). The circles (○, ●) in the upper quadrant of
the graph are measurements made by electron microscopy, and
the squares (□, ■) are measurements made by immunofluorescence.
At concentrations of monomeric tubulin greater than 3 μM,
microtubules would elongate from axonemes, and the rate of elon- 
gation was linearly related to the concentration of initial unpoly- 
merized monomeric tubulin. At initial concentrations of 
monomeric tubulin below 3 μM, no elongation would occur. If
microtubules, however, were grown on axonemes to 20 μm on the
plus end and 7.5 μm on the minus end at a high monomeric tubulin
concentration and the concentration of monomeric tubulin was
then dropped to one of the noted concentrations less than 3 μM,
the microtubules would begin to depolymerize at the noted rates 
(circles with minus values of rates in the lower quadrant of the 
graph), which were independent of the concentration of unpoly- 
merized tubulin. Reprinted with permission from Nature, ref 514. 
Copyright 1984 Macmillan Magazines Limited. 


of pseudosymmetry, the site for binding GTP on the surface
of the β subunit is on the opposite side of the β subunit
from that interface, but at a position (indicated by the
rectangles in Figure 13-24B) that becomes enclosed within
the interface that is formed in a protofilament when a
β subunit enters the elongating tubule at the minus end.
At the positive end of a microtubule, each β subunit has a
site for binding GTP that is displayed at its open end and
that is enclosed in the interface formed when monomeric
tubulin associates with it during elongation at that end. A
molecule of GTP bound to the α subunit of monomeric
tubulin, which is within the interface that is stable and
does not dissociate, remains unhydrolyzed, does not
exchange with GTP in solution, and is an inert feature of
the protein. It is the descendant of the GTP molecule
that was at a dissociable interface in the primordial microtubule
but now serves only a structural role uninvolved in
the dynamics of elongation and depolymerization of an
extant microtubule. Consequently, only the GTP or GDP
and inorganic phosphate bound to the site on the β subunit
is relevant to the roles of tubulin in the dynamic behavior
of a microtubule.

At the open site on the β subunit in monomeric
tubulin, the bound GTP exchanges readily with GTP or 
GDP in the solution, and molecules of bound GTP are 
slowly hydrolyzed to GDP and inorganic phosphate at 
that site but only after the incorporation of this site into 
a microtubule coincident with the addition of the 
monomeric GTP-tubulin to an elongating microtubule.
At 37 °C, the rate constant for this hydrolysis within a
microtubule is 0.06 s⁻¹, and the rate constant for release of the
inorganic phosphate into the solution is 0.02 s⁻¹. The
molecules of GDP that are formed by this hydrolysis, 
however, remain trapped at the site while the protomeric 
GDP-tubulin is within the microtubule.” This pecu- 
liar feature of elongation causes an elongating micro- 
tubule to have a region at its end in which all the tubulin 
has GTP bound to it because it has recently been added. 
Beyond this cap of protomeric GTP-tubulin, however, 
the frequency of protomeric GDP-tubulin increases until 
the main body of the microtubule is reached, which is 
entirely formed from protomeric GDP-tubulin. In the 
cytoplasm of a cell, in which the concentration of GTP is 
much greater than that of GDP, when a protomer of 
GDP-tubulin leaves a microtubule, the GDP rapidly dis- 
sociates from it and is replaced by GTP from the solution. 

Although there is one measurement giving a critical
concentration for GDP-tubulin of 3 μM, which is contradicted
by measurements from a similar preparation of
tubulin giving a critical concentration of greater than
30 μM, it is generally believed that “the critical concentration
for assembly of Tu-GDP is apparently very high,
effectively infinite for practical purposes.” The critical
concentration for (GTP)-tubulin, however, is immeasurably
small. This conclusion follows from the fact that the
intercepts with the horizontal axis in Figure 13-27 are
indistinguishable from zero and the fact that the measured
critical concentration for tubulin to which guanylyl
5′-(β,γ-methylenediphosphonate) has been bound,
which is an analogue of GTP that is very slowly
hydrolyzed, is less than 0.2 μM. Unfortunately,
although the difference in critical concentrations 
between GDP-tubulin and GTP-tubulin is probably large, 
no direct measurement of that difference is available. 

The rate constants for elongation of a microtubule 
can be distinguished separately by the following convention.
Those for elongation at the positive end can be
identified with a plus sign (+); those for elongation at the
negative end, with a minus sign (-); those for GTP-tubulin,
with the letter T; and those for GDP-tubulin, with the
letter D. From the results in Figure 13-27, $k_{nT+} = 3.8 \times 10^{6}$ M⁻¹ s⁻¹
and $k_{nT-} = 1.2 \times 10^{6}$ M⁻¹ s⁻¹ at 37 °C. Both $k_{-nT+}$
and $k_{-nT-}$ are too small (< 1 s⁻¹) to be measured accurately.
Under most circumstances in which elongation
occurs in the presence of GTP ([monomer] > 4 μM), it is
governed by $k_{nT+}$ and $k_{nT-}$. Even though the dissociation
constants for GTP-tubulin to both ends of the micro- 
tubule must be the same, the rate constants for associa- 
tion and dissociation are not the same because the 
microtubule is a polar structure. It happens that a micro- 
tubule elongates about 3-fold more rapidly from its plus 
end than from its minus end. 

When microtubules depolymerize, for example, 
upon dilution, the cap of protomeric GTP-tubulin near 
the end that was formed from recently added monomeric 
GTP-tubulin is rapidly lost, and depolymerization then 
proceeds by the dissociation of monomers of GDP-tubulin.
If the dilution has been great enough, ends cannot be
recapped and only $k_{-nD+}$ and $k_{-nD-}$ govern the rates of
depolymerization. The values for these observed rate
constants for depolymerization upon dilution (Figure
13-27), under the same conditions in which $k_{nT+}$ and
$k_{nT-}$ were measured, are $k_{-nD+} = 340$ s⁻¹ and $k_{-nD-} = 210$ s⁻¹
at 37 °C.
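
To relate these rate constants to the lengths plotted in Figure 13-27, the number of heterodimers gained or lost per second can be converted into micrometers per minute. The Python sketch below is a rough illustration that assumes 13 protofilaments and one αβ heterodimer every 8.0 nm along each protofilament, so that 1 μm of microtubule contains 1625 heterodimers. The function names are hypothetical, the dissociation of GTP-tubulin is neglected because it is less than 1 s⁻¹, and the association rate constants are taken to be of the order of 10⁶ M⁻¹ s⁻¹.

```python
PROTOFILAMENTS = 13        # protofilaments in the microtubule lattice
DIMER_REPEAT_NM = 8.0      # assumed axial spacing of heterodimers, nm
DIMERS_PER_UM = PROTOFILAMENTS * 1000.0 / DIMER_REPEAT_NM  # 1625 per micrometer


def um_per_min(dimers_per_second):
    """Convert heterodimers added or lost per second into micrometers per minute."""
    return dimers_per_second / DIMERS_PER_UM * 60.0


def growth_um_per_min(k_on, monomer):
    """Elongation rate for GTP-tubulin, neglecting the slow dissociation (< 1 s^-1)."""
    return um_per_min(k_on * monomer)


if __name__ == "__main__":
    monomer = 10.0e-6  # 10 uM free GTP-tubulin
    print(f"plus end growth:     {growth_um_per_min(3.8e6, monomer):.2f} um/min")
    print(f"minus end growth:    {growth_um_per_min(1.2e6, monomer):.2f} um/min")
    print(f"plus end shrinkage:  {um_per_min(340.0):.2f} um/min")
    print(f"minus end shrinkage: {um_per_min(210.0):.2f} um/min")
```

Under these assumptions an uncapped end shortens several times faster than a capped end lengthens at 10 μM tubulin.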

This arrangement of reactions—elongation by 
GTP-tubulin, hydrolysis of the GTP within the micro- 
tubule, and the consequent dissociation of mainly 
GDP-tubulin upon depolymerization—creates a peculiar 
steady state when the concentration of bulk polymer has 
reached its maximum level. At all times and at random, 
regions of protomeric GDP-tubulin are overtaking the 
now slowly elongating ends of individual microtubules 
and switching those microtubules from ones that are 
elongating to ones that are catastrophically depolymerizing
(Figure 13-28). The monomeric tubulin released
during depolymerization picks up new molecules of GTP
and reenters a microtubule that happens by chance still
to be outdistancing its protomeric GDP-tubulin.

If the concentration of monomeric GTP-tubulin is 
less than a certain apparent critical concentration, the 
boundary of protomeric GDP-tubulin, whose rate of 
propagation is independent of the concentration of 
monomeric GTP-tubulin, will move along the micro- 
tubule faster than the rate of its elongation, which is 
dependent on the concentration of monomeric 
GTP-tubulin, and the microtubules will be switched 
(Figure 13-28) and begin to depolymerize with the rate 
constants $k_{-nD+}$ and $k_{-nD-}$. At concentrations of tubulin in
excess of this apparent critical concentration, elongation
will be faster than the rate at which the boundary of
protomeric GDP-tubulin can move, and the microtubules
will elongate with rate constants $k_{nT+}$ and $k_{nT-}$ as if they

Figure 13-28: Schematic model describing the role of GTP in the
elongation of microtubules. The dark circles represent protomers
of tubulin to which GTP is bound, and the open circles represent
protomers of tubulin on which the GTP has hydrolyzed to
GDP. As long as the tubule is elongating at a significant rate, the 
end of the microtubule is occupied by protomeric GTP-tubulin, 
and the end has a low critical concentration (Figure 13-27). In this 
state, it will not depolymerize catastrophically, and this is the grow- 
ing phase. If elongation slows down because the concentration of 
monomeric GTP-tubulin decreases, at a certain point the spreading 
boundary of GTP hydrolysis will reach the end of the microtubule. 
The end of the microtubule will then be occupied mostly by pro- 
tomeric GDP-tubulin and the end will then have a much higher crit- 
ical concentration. It has passed through a phase transition into 
the shrinking phase and will rapidly depolymerize. Reprinted with 
permission from Nature, ref 514. Copyright 1984 Macmillan 
Magazines Limited. 


contained only GTP-tubulin. At the apparent critical con- 
centration, the rate of depolymerization from uncapped 
ends will be equal in magnitude to the rate of elongation 
from capped ends. This apparent critical concentration 
is not the reflection of an equilibrium process, as is the 
actual critical concentration (Figure 13-26), but the 
result of a complex combination of rate constants pro- 
ducing a steady state, and it should not be confused with 
a real critical concentration. The steady state is main- 
tained by the continuous hydrolysis of GTP, and as long 
as GTP is available, equilibrium cannot be reached. 
When all the GTP has been hydrolyzed, all of the micro- 
tubules disappear because the critical concentration for 
GDP-tubulin is so large. 

At concentrations of 10 μM free monomeric tubulin,
which is the apparent critical concentration of
GTP-tubulin at steady state in the presence of excess
concentrations of GTP rather than at equilibrium, the
rates of depolymerization from uncapped ends contain- 
ing protomeric GDP-tubulin are at least 10-fold greater 
than the rates of elongation from ends capped with pro- 
tomeric GTP-tubulin. As a result, a microtubule that is 
depolymerizing from an uncapped end will rapidly and 
catastrophically disappear even if it is elongating at its 
other end. Catastrophic depolymerization of microtubules
following dilution proceeds as a zero-order reaction
(Figures 13-27 and 13-29) because the rate of
depolymerization is defined by the equation 

Figure 13-29: Kinetics of the depolymerization of microtubules.
Tubulin (35 μM) was polymerized at 30 °C in 0.1 mM MgCl₂ and
0.5 mM GTP at pH 6.9 to steady state (30 min). When the temperature
of a solution of microtubules is dropped, the microtubules
depolymerize. When the temperature of this solution was brought 
to 5 °C, the depolymerization could be followed by the decrease in 
the absorbance at 320 nm. After a lag coinciding with the time nec- 
essary to lower the temperature, the depolymerization followed 
zero-order kinetics (dashed line) until the number of microtubules 
in the solution began to decrease. Adapted with permission from 
ref 505. Copyright 1977 Academic Press. 


$$-\frac{d[\text{polymer}]_{\text{bulk}}}{dt} = k_{-nD+}[+\text{end}] + k_{-nD-}[-\text{end}] \tag{13-52}$$


As long as [end] remains constant, the reaction remains 
zero-order. As [end] begins to decrease, when more and 
more microtubules cease to exist, the rate of depolymer- 
ization decreases.°” From electron micrographs of sam- 
ples removed at various times from a population of 
depolymerizing microtubules, the decrease in the 
observed rate constant of depolymerization at the longer 
times could be quantitatively correlated to the decrease 
in the number concentration of microtubules and hence 
the molar concentration of ends.°”? 
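
The zero-order phase of depolymerization can be illustrated with a short Python sketch. It assumes that the molar concentrations of plus ends and minus ends stay constant, which holds only until microtubules begin to disappear, so it describes the linear portion of the curve and not the later slowing. The function name and the concentrations used in the example are hypothetical; the default rate constants are the values of 340 s⁻¹ and 210 s⁻¹ quoted in the text for depolymerization upon dilution.

```python
def bulk_polymer_during_depolymerization(t, polymer0, plus_ends, minus_ends,
                                         k_off_plus=340.0, k_off_minus=210.0):
    """Bulk polymer (M) remaining after t seconds of zero-order depolymerization.

    The rate, k_off_plus*[+end] + k_off_minus*[-end], does not depend on the
    amount of polymer left. The concentrations of ends are assumed constant,
    and the result is clipped at zero.
    """
    rate = k_off_plus * plus_ends + k_off_minus * minus_ends  # M s^-1
    return max(0.0, polymer0 - rate * t)


if __name__ == "__main__":
    # Hypothetical solution: 30 uM bulk polymer, 1 nM of each kind of end.
    for t in (0.0, 10.0, 20.0, 40.0):
        p = bulk_polymer_during_depolymerization(t, 30.0e-6, 1.0e-9, 1.0e-9)
        print(f"t = {t:4.0f} s: bulk polymer = {p * 1e6:.1f} uM")
```

Equal increments of time remove equal amounts of polymer, which is the signature of a zero-order reaction.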

There are a number of observations that support this 
description of the polymerization of tubulin. 
Microtubules assembled from tubulin to which guanylyl
5′-(β,γ-methylenediphosphonate), a nonhydrolyzable
analogue of GTP, has been bound, rather than GTP itself,
exhibit only elongation and no rapid catastrophic
depolymerization. If the concentration of GTP in the solution
is suddenly dropped so that GTP dissociates from the pro- 
tomeric tubulin at the end of a microtubule, the rate at 
which ends switch to catastrophic depolymerization 
increases 50-fold. When microtubules are elongating in a
mixture of GTP-tubulin and GDP-tubulin, and the
concentration of GDP in the solution is increased 100-fold,
so that the GTP bound to protomers at the ends of
elongating tubules exchanges for GDP, the rate at which ends
switch to catastrophic depolymerization increases 10-fold.
When a uniform population of microtubules elongating
rapidly at high concentrations of monomeric
GTP-tubulin is diluted to a concentration slightly below
the apparent critical concentration, the bulk concentration
of microtubules, $[\text{microtubule}]_{\text{bulk}}$, immediately begins
to decrease. This decrease, however, is entirely due to a 
decrease in the number concentration of microtubules 
rather than the mean length of the remaining micro- 
tubules. The microtubules that remain are still slowly 
elongating." Those that have disappeared have lost the 
race with the advancing boundary of protomeric 
GDP-tubulin at one of their ends. When microtubules 
grown from seeds reach a steady-state bulk concentra- 
tion, this concentration is maintained by a decrease in the 
number of microtubules and an elongation of those that 
remain.’'* Those that are still in the race are staying ahead 
at the expense of the losers. 

Colchicine and podophyllotoxin are inhibitors of 
tubulin elongation. They bind to free αβ heterodimers of
tubulin, which then have a higher affinity for an elongat- 
ing end of a microtubule. Once a few of the toxin-tubulin 
complexes have entered an elongating end, however, it is 
no longer able either to elongate further or to depoly- 
merize catastrophically when the wave of GDP-tubulin 
reaches it,” because it is capped by those complexes 
of tubulin and colchicine or podophyllotoxin. When 
podophyllotoxin is added to microtubules grown to 
steady state from seeds, the bulk concentration of microtubule,
[microtubule], decreases in two phases. One
phase has the rate constant (k_„n+ + kK»p_) of normal 
depolymerization, and the other phase is much slower. 
The rapid phase is the depolymerization from the ends 
that lose the race with the boundary of GDP-tubulin and 
begin to depolymerize before they can be capped by 
complexes of tubulin and podophyllotoxin. The slow 
phase is the slow depolymerization of microtubules that 
have become capped at both ends by podophyllotoxin 
before either end can begin to depolymerize. If fresh 
tubulin is added with the podophyllotoxin, the magni- 
tude of the rapid phase is decreased as expected. 

The elongation of a microtubule is a race between 
addition of monomeric GTP-tubulin at the end and the 
spread of the boundary of protomeric GDP-tubulin along 
the body of the microtubule (Figure 13-28). The race is 
lost when the GDP-tubulin reaches an end that cannot 
outdistance it, and the penalty for losing is catastrophic 
depolymerization. The observed rate constant for
hydrolysis of GTP within an elongating microtubule is
slow (0.06 s⁻¹), so there is little chance that it will switch
from elongation to catastrophic depolymerization while 
it is elongating rapidly; and when it has reached a goal, 
the structure with which it associates at that goal caps its 
end. 
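
A rough sense of how slow a rate constant of 0.06 s⁻¹ is can be had from the following minimal sketch, which treats the hydrolysis of GTP on a protomer as a simple first-order process. That treatment and the function names are assumptions made for illustration; only the value of 0.06 s⁻¹ is taken from the text.

```python
import math

K_HYDROLYSIS = 0.06  # s^-1, observed rate constant quoted in the text


def half_life(k):
    """Half-life (s) of a first-order process with rate constant k (s^-1)."""
    return math.log(2) / k


def fraction_unhydrolyzed(k, t):
    """Fraction of bound GTP still unhydrolyzed after t seconds.

    Assumes simple first-order decay, which is an illustrative assumption.
    """
    return math.exp(-k * t)


if __name__ == "__main__":
    print(f"half-life of bound GTP: {half_life(K_HYDROLYSIS):.1f} s")
    for t in (1, 5, 10, 30):
        remaining = fraction_unhydrolyzed(K_HYDROLYSIS, t)
        print(f"after {t:>2} s: {remaining:.2f} of the GTP remains")
```

Under this assumption the half-life of bound GTP is about 12 s, so a protomer added to a rapidly elongating end would usually be followed by many further additions before its GTP is hydrolyzed.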

The purpose of this elaborate device seems to be the 
elimination of microtubules that have failed to find a goal 
and the maintenance of the origin of the network of microtubules
in the cell. In a microtubule that has not been
able to find a goal and have its elongating end capped in 
recognition of its success, the boundary of GDP-tubulin 
will eventually catch up and the uncapped microtubule 
that has failed in its search will catastrophically depoly- 
merize. Also, when a microtubule breaks into two pieces 
for any reason, the break will almost always occur in the 
region where protomeric GDP-tubulin is located. The 
broken tubule will catastrophically depolymerize from its 
broken ends, and the fragment attached to the centrosome 
and the other fragment will disappear. This prevents 
broken pieces from initiating microtubules unattached to 
centrosomes. The centrosome has a finite capacity to ini- 
tiate microtubules, but as sites become empty after the 
catastrophic depolymerizations of the failures, new micro- 
tubules, the elongating ends of which are again outracing 
their destruction and which are in their turn searching for 
success, are initiated at those empty sites. As the centro- 
some can initiate microtubules at concentrations lower 
than those at which they initiate spontaneously, almost all 
microtubules end up originating at the centrosome. 

The schematic drawings of Figure 13-24B and Figure 
13-28 are somewhat misleading. The actual structure of the 
elongating end is a flattened sheet of protofilaments that 
has not yet been rolled up and joined at the seam of its 
microtubule. The rolling up of the sheet and zipping of the 
microtubule along the seam lags behind the elongation of 
the protofilaments forming the unrolled, flattened sheet at 
the very end.

The assembly of actin into thin filaments (Figure 
9-1B) is similar to that of tubulin into microtubules. Thin 
filaments of actin are polar. As with a microtubule, a thin 
filament elongates at both ends, but it elongates at one 
end about 7 times more rapidly than at the other.
The end that elongates more rapidly is the plus end or the 
barbed end of the thin filament when it is decorated with 
fragments of myosin. This end is the anchored end of a 
thin filament within the cell, toward which a thick fila- 
ment of myosin usually slides. 

The elongation of actin has all of the characteristic 
features of the elongation of tubulin, albeit with distinct 
rate constants, but the nucleotide that controls the polymerization
is ATP rather than GTP. The ATP binds to
the actin, the ATP-actin complex is incorporated into the 
thin filament, and the ATP is then hydrolyzed to ADP 
slowly enough that the hydrolysis of the ATP significantly 
lags behind the elongation of the filament. The elongation
of actin filaments displays a critical concentration
for monomeric actin. The ADP-actin complex is
unable to elongate actin filaments at concentrations at
which ATP-actin can, because the critical concentration
of ADP-actin is much higher than that of
ATP-actin. Actin has been crystallized with ATP
bound and with ADP bound, and the two crystallographic
molecular models have significantly different conformations,
an observation that explains their different critical
concentrations. The rate of depolymerization of filaments
of actin with ADP-actin at their ends is about 10-fold
greater than the rate of depolymerization of filaments
of actin with ATP-actin at their ends, so catastrophic
depolymerization occurs upon exposure of an
end uncapped by ATP-actin.
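
The connection between the rate of depolymerization and the critical concentration can be illustrated with the simplest model of elongation, in which protomers add to an end with a second-order rate constant and leave it with a first-order rate constant. The sketch below assumes that model. The numerical rate constants are hypothetical, are not measured values, and were chosen only so that the rate constant for dissociation of ADP-actin is 10-fold that of ATP-actin; the assumption that both complexes associate with the same rate constant is also made only for illustration.

```python
def critical_concentration(k_on, k_off):
    """Concentration (M) at which association and dissociation at an end balance."""
    return k_off / k_on


def net_elongation_rate(conc, k_on, k_off):
    """Net protomers added per second at one end; negative means depolymerization."""
    return k_on * conc - k_off


# Hypothetical values for illustration only (not measured constants).
K_ON = 1.0e7       # M^-1 s^-1, assumed to be the same for both complexes
K_OFF_ATP = 1.0    # s^-1, end occupied by ATP-actin
K_OFF_ADP = 10.0   # s^-1, end occupied by ADP-actin (10-fold greater)

if __name__ == "__main__":
    conc = 0.5e-6  # M, hypothetical concentration of monomeric actin
    for label, k_off in (("ATP-actin", K_OFF_ATP), ("ADP-actin", K_OFF_ADP)):
        cc = critical_concentration(K_ON, k_off)
        rate = net_elongation_rate(conc, K_ON, k_off)
        print(f"{label}: critical concentration {cc:.1e} M, net rate {rate:+.1f} s^-1")
```

In this simple model a 10-fold greater rate constant for dissociation raises the critical concentration 10-fold, so a concentration of monomer that supports elongation of an end occupied by ATP-actin can lie below the critical concentration of an end occupied by ADP-actin.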

The nucleation of the polymerization of actin fila- 
ments in the cell is accomplished by particular proteins 
or complexes of proteins. One complex of proteins 
responsible for nucleation of actin filaments is the 
Arp2/3 complex, which contains seven different subunits,
each present in a single copy, two of which are
homologues of actin and may serve as the actual sites of
nucleation. There are also a number of individual,
monomeric proteins that are able to initiate the polymerization
of actin, such as villin, fragmin, gelsolin,
and F-actin capping protein. They do so by
forming complexes with two or three, one, two, or two or 
three molecules of actin, respectively, and it is these 
complexes that nucleate polymerization. Consequently, 
the nuclei required to initiate polymerization of actin are 
much simpler than those required to initiate polymeriza- 
tion of tubulin. Purified preparations of actin, like puri- 
fied preparations of tubulin, can spontaneously nucleate 
polymerization when nucleotide is added, but at least 
one of the contaminating proteins responsible for this 
self-nucleation is a covalent dimer of actin produced 
adventitiously during the purification.

Reversibly assembled helical polymers such as 
microtubules and filaments of actin interact with an elab- 
orate set of stabilizing and destabilizing proteins that con- 
trol, sculpt, and employ the basic polymers according to 
the needs of the cell. In humans, there are more than 15 
microtubule-associated proteins that have been identi- 
fied by their ability to copolymerize with tubulin. The 
kinetochores of chromosomes bind to the elongating ends 
of microtubules in such a way that the ends can still elon- 
gate and then depolymerize, in the process pushing and 
then pulling the kinetochores and their attached chromosomes
away from and then toward the centrosome.
In this process, it is the significant, favorable free energy 
of elongation resulting from the fact that the concentra- 
tion of monomeric GTP-tubulin is well above its critical 
concentration that provides the free energy to drag the 
kinetochore along with the elongating end away from the 
centrosome, and it is the much more favorable free energy
of depolymerization resulting from the fact that the concentration
of monomeric GDP-tubulin is even more distant
from its critical concentration that provides the free
energy to pull the kinetochore even more vigorously
toward the centrosome. Dynein is a protein that slides
along the outer surfaces of microtubules toward the centrosome
while hydrolyzing MgATP, carrying structures to
which it in turn is attached in that direction. Kinesins
also slide along microtubules while hydrolyzing
MgATP.
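
The dependence of these free energies on the distance from the critical concentration can be made concrete with the standard relationship for adding one protomer to the end of a long polymer, ΔG = RT ln(c_crit/c), in which c is the concentration of free monomer and c_crit is its critical concentration. The sketch below assumes that relationship and a temperature of 310 K; the function names and the concentrations in the example are hypothetical and are not taken from any measurement.

```python
import math

R = 8.314  # gas constant, J K^-1 mol^-1


def dg_elongation(conc, critical_conc, temp_k=310.0):
    """Free energy change (J mol^-1) for adding one protomer to an end.

    Negative (favorable) when the monomer is above its critical concentration.
    """
    return R * temp_k * math.log(critical_conc / conc)


def dg_depolymerization(conc, critical_conc, temp_k=310.0):
    """Free energy change (J mol^-1) for releasing one protomer from an end.

    Negative (favorable) when the monomer is below its critical concentration.
    """
    return -dg_elongation(conc, critical_conc, temp_k)


if __name__ == "__main__":
    # Hypothetical case: monomer 10-fold above its critical concentration.
    print(f"elongation: {dg_elongation(10e-6, 1e-6) / 1000:.1f} kJ/mol")
    # Hypothetical case: monomer 1000-fold below its critical concentration.
    print(f"depolymerization: {dg_depolymerization(1e-7, 1e-4) / 1000:.1f} kJ/mol")
```

The further the concentration of monomer lies from the critical concentration, in either direction, the larger the magnitude of the free energy available, which is the sense in which depolymerization of GDP-tubulin can pull more vigorously than elongation with GTP-tubulin can push.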

Thin filaments of actin are often sculpted to precisely
regulated lengths and shapes for certain purposes. There
are a number of proteins responsible for this function.
For example, F-actin capping protein is a widely
distributed αβ heterodimer that binds to elongating
barbed ends of actin filaments and stably caps them by
preventing them from either elongating further or catastrophically
depolymerizing. Its wide distribution
suggests that it is the major capping protein in animal cells.
F-Actin capping protein is located in the Z line of skeletal
muscle, a structure in which are embedded the barbed
plus ends of the actin filaments that are organized in regular
arrays of precise length found in this tissue. The length
of these actin filaments in skeletal muscle is defined by the
length of a single molecule of the long fibrous protein nebulin,
which acts as a molecular ruler that binds
tightly along the length of the actin filaments. It is the
Z line upon which the actin filament pulls as the thick
filament of myosin slides along it in that direction. Capped
plus ends of actin filaments are also embedded in structures
at the cellular membrane organized around the protein
vinculin. The minus ends of the actin filaments
in the regular arrays in cardiac muscle are capped by tropomodulin.

Thick filaments of myosin are the oligomeric pro- 
teins that slide along thin filaments of actin and pull 
upon them while hydrolyzing MgATP. Thick filaments 
are helical polymeric proteins noncovalently assembled 
from a monomer known as myosin. Myosin is composed 
from two identical α polypeptides (n_aa = 1940) and several
shorter polypeptides (n_aa = 150-200). The carboxy-terminal
1100 aa of the two α polypeptides are entwined
around each other to form a two-stranded, α-helical
coiled coil (Figure 6-29) 150 nm in length that has two
globular, detachable domains known as heads, each of
which is formed from the amino-terminal 800 aa of one
of the α polypeptides, at one of its ends (Figure
13-30A). The shorter polypeptides are incorporated
into these globular heads.

The individual coiled coils of the myosin molecules
are segments of rope that are assembled into a helical
cable from which the myosin heads protrude (Figure
13-30B). The segments of rope add to the elongating
cable at each of its two ends with opposite orientation. In
each direction along the cable, the molecules of myosin
add so that the empty carboxy-terminal ends of their segments
of rope point toward the middle of the cable and
the amino-terminal ends of the segments of rope to
which the myosin heads are attached point away from
the middle (Figure 13-30C). The absence of myosin
heads where the segments of rope pointing in opposite
directions overlap in the middle of the cable creates a
bare zone 150 nm in length. Because of this pattern of
assembly, thick filaments (Figure 13-30B), unlike thin
filaments, which have two distinct ends, have two ends
that are identical but of opposite orientation. This necessarily
produces a 2-fold rotational axis of pseudosymmetry
normal to the axis of the thick filament in the center
of the bare zone (Figure 13-30C).

Figure 13-30: Structures of myosin and thick filaments. (A) Gallery of electron micrographs of myosin molecules. Myosin from thick filaments
in rabbit skeletal muscle, which had been disassembled at high ionic strength (0.5 M KCl), was purified by precipitation at low ionic
strength, ammonium sulfate precipitation, and anion-exchange chromatography. A solution of purified myosin at 50 µg mL⁻¹ in 0.6 M ammonium
formate was sprayed onto the surface of freshly cleaved mica, and the water and the volatile salt of ammonium formate were evaporated
from the surface. The adsorbed molecules of myosin were coated with platinum as the mica was rotated in a beam of platinum vapor.
The film of platinum was removed from the mica, transferred to a grid, and viewed in an electron microscope. Magnification 120000×.
Reprinted with permission from ref 575. Copyright 1978 Academic Press. (B) Thick filaments from muscle of Placopecten magellanicus.
Strips of muscle were chopped finely and homogenized in a solution containing MgATP to dissociate thick and thin filaments. A drop of this
homogenate was placed on a carbon film and negatively stained with 3% uranyl acetate. Thick filaments were located in the specimen in the
electron microscope and photographs were taken. The white bars indicate the bare zones on the thick filaments. Magnification 60000×.
Adapted with permission from ref 577. Copyright 1983 Academic Press. (C) Diagrammatic representation of the way in which molecules of
myosin are assembled to form a thick filament. Each continuous set of line segments is an individual molecule of myosin; the two globular
heads (panel A) are represented by a W and the tail by a line. All of the tails point toward the center of the filament, and at the center the
orientation reverses. Because the pairs of heads are directed distally, the bare zone (the smooth central portion of each of the thick filaments
indicated by white bars in panel B) has no heads protruding from it. The structure assembled in this way has a 2-fold rotational axis of
pseudosymmetry at its center. Reprinted with permission from ref 578. Copyright 1969 American Association for the Advancement of Science.
(D) Helical surface lattice of globular myosin heads upon a thick filament. The optical density of electron micrographs, such as those in
panel B, was digitized, and the Fourier transform of the digitized optical density was calculated. Discrete reflections arising from longitudinal
spacings of 14.5 and 29.0 nm along the thick filament and helical spacings of 48.0 nm (panel B) were observed in the Fourier transform.
Reflections on the reciprocal helical lattice in the Fourier transform were selected and used to calculate a three-dimensional distribution of
electron scattering density. The image presented is of the front four helical strands of the seven-start right-handed helical lattice on one thick
filament. This helical pattern produces the strong reflections of a 48.0 nm helical repeat. The globular heads are arranged in circular disks
14.5 nm in height and seven heads in circumference that are stacked helically to produce the lattice. The lines drawn in the figure indicate
the probable orientation of two thin filaments of actin relative to the surface lattice of the thick filament myosin. Reprinted with permission
from ref 577. Copyright 1983 Academic Press.

Upon the surface of the thick filament distal to the 
bare zone, the myosin heads are arranged in a helical sur- 
face lattice, reflecting the underlying helical symmetry of 
the cable (Figure 13-30D). This helical surface lattice is 
right-handed and septuply threaded, and myosin heads 
protrude from each of the seven constituent helices at 
intervals that are vertically in register to create horizontal
rings, or crowns, each spaced at 14.4-nm intervals.
The seven globular protrusions around each
crown are each single myosin heads because each crown
accounts for the total molecular mass of about 3.5 molecules
of myosin. Because seven is an odd number,
no two adjacent globular heads in the same crown can be 
from the same molecule of myosin. The pair of heads 
from the same molecule of myosin must be consecutive 
to each other within the same helix. Therefore, along each 
of the seven threads of the helical lattice, the heads must 
alternate in the pattern lower head of myosin i, upper 
head of myosin i, lower head of myosin i+1, upper head 
of myosin i+1, and so forth. If upper heads are in register 
across the seven helices and lower heads are in register, 
crowns of lower heads and crowns of upper heads would 
alternate along the thick filament. Such an alternating 
pattern has been observed.

Thick filaments of myosin assemble spontaneously 
from monomers of myosin but do not become helical
polymeric proteins of indefinite length such as fibrin,
tubulin, or actin. Their final length is between 1000 and
3000 nm when they are polymerized under experimental
situations or between 1600 and 2000 nm when they are
polymerized within a contractile tissue such as skeletal
muscle (Figure 13-30B).

Myosin from Acanthamoeba castellanii sponta- 
neously assembles into minifilaments that are bipolar 
thick filaments composed of only eight myosin 
monomers. During the assembly of these minifilaments, 
monomers first form antiparallel dimers, antiparallel 
dimers form antiparallel tetramers, and tetramers then 
form octamers. The formation of such an antiparallel
dimer may also be the initial step in the formation of the
larger types of thick filaments.

When monomeric actin is induced to polymerize in 
the presence of thick filaments of myosin, thin filaments 
of actin form around each thick filament of myosin.
The thin filaments are positioned around the thick fila- 
ments at seven evenly spaced intervals. Presumably, the 
assembly of this 7-fold array is dictated by the underlying 
seven helices of the myosin heads. The pitch of the seven
thin filaments of this actin, however, is much steeper
than that of the seven primary helices on the surface of 
the thick filament, the assembled thin filaments of actin 
are almost parallel to the axis of the thick filament, and 
the pitch of the septuply threaded helical array of actin 
filaments is left-handed instead of right-handed. This 
alignment could be explained if the thin filaments of 
actin were in contact only with every other crown or 
every fourth crown along the thick filament and stepped 
up one helix for each contact (Figure 13-30D). 

The measured rise for each subunit in a thin fila- 
ment of actin is 2.8 nm and the measured rotation for 
each subunit is 166°. For a thin filament to span two 
crowns and step up one helix (Figure 13-30D) would 
require 29.7 nm, the distance covered by 11 subunits of 
actin if the rise for each subunit in a thin filament were 
actually 2.7 nm rather than 2.8 nm. The eleventh pro- 
tomer further along a thin filament would be pointed in 
exactly the same direction toward the thick filament as a 
subunit of actin already attached to a myosin head in that 
thick filament if the rotation for each subunit of actin in 
a thin filament were 164° instead of 166°. A similarly suc- 
cessful but not so remarkable fit of the dimensions can 
be made if the thin filament of actin makes contact only 
with every fourth crown and steps up one helix. The 
coincidences between the dimensions of the thin fila- 
ment and the thick filament are reminders that the more 
primordial of the two served as the template for the evo- 
lution of the other. 
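
The arithmetic behind this fit can be checked with the short sketch below. The span of 29.7 nm, the 11 subunits, and the measured rise and rotation are the values quoted in the text; the function names, and the assumption that the eleventh subunit returns to register after five full turns, are introduced here only for illustration.

```python
MEASURED_RISE_NM = 2.8         # measured rise for each subunit of actin
MEASURED_ROTATION_DEG = 166.0  # measured rotation for each subunit of actin
SPAN_NM = 29.7                 # distance quoted for two crowns plus one helix step
N_SUBUNITS = 11


def rise_needed(span_nm, n_subunits):
    """Rise per subunit (nm) needed for n subunits to cover the span."""
    return span_nm / n_subunits


def rotation_needed(n_subunits, full_turns):
    """Rotation per subunit (deg) for n subunits to complete whole turns."""
    return 360.0 * full_turns / n_subunits


def residual_rotation(rotation_deg, n_subunits):
    """Angle (deg, in [-180, 180)) between the first subunit and the n-th one on."""
    total = rotation_deg * n_subunits
    return (total + 180.0) % 360.0 - 180.0


if __name__ == "__main__":
    print(f"rise needed: {rise_needed(SPAN_NM, N_SUBUNITS):.2f} nm "
          f"(measured {MEASURED_RISE_NM} nm)")
    # Five full turns is an assumption chosen because 11 x 164 deg is close to 1800 deg.
    print(f"rotation needed: {rotation_needed(N_SUBUNITS, 5):.1f} deg "
          f"(measured {MEASURED_ROTATION_DEG} deg)")
    for rotation in (164.0, MEASURED_ROTATION_DEG):
        print(f"residual at {rotation} deg: "
              f"{residual_rotation(rotation, N_SUBUNITS):.1f} deg")
```

With these inputs the required rise is 2.7 nm and the required rotation is about 163.6°, which rounds to the 164° quoted above. At the measured rotation of 166° the eleventh subunit would be about 26° out of register, which is why the fit depends on these small adjustments.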

With the assembly of helical polymeric proteins 
such as microtubules, thick filaments of myosin, and 
sculpted thin filaments of actin and the assembly of 
oligomeric proteins such as ribosomes, protein chem- 
istry enters the microscopic realm and becomes cell biol- 
ogy. Another set of striking microscopic cellular features 
that are of importance in cell biology are membranes. 


Suggested Reading 


Mitchison, T., & Kirschner, M. (1984) Microtubule assembly nucle- 
ated by isolated centrosomes and dynamic instability of micro- 
tubule growth, Nature 312, 232-242. 


Problem 13-8: Make a xerographic copy of Figure 
13-24B. Cut out the surface lattice and roll the paper into 
a cylinder. Follow the various helices over the cylinder 
and identify how many individual strands each of them 
has. 


Problem 13-9: The observed rate constants for micro- 
tubule assembly listed in this table were obtained from 
the data in Figure 13-27. 


rate constant value rate constant value 
Kure 3.8 x 10° Mts? Konti 0.4 s7! 
kai: 1.2 x 10 Miel koni- lis? 
Rape 340 el Konn- 210 5" 


(A) Describe which rate constant was derived from
which aspect of this figure. Which rate constants
are in error and why?

(B) If the concentration of monomeric tubulin were
10 µM, the steady-state apparent critical concentration,
at what rate (second⁻¹) would a tubule
capped with GTP-tubulin be elongating at its
minus end?

(C) If the plus end of this tubule were capped with
GDP-tubulin, at what rate (second⁻¹) would it be
depolymerizing?


References 


1. 


2. 


22. 


23. 


Dill, K.A., & Shortle, D. (1991) Annu. Rev. Biochem. 60, 
795-825. 

Weinreb, P.H., Zhen, W., Poon, A.W., Conway, K.A., & 
Lansbury, P.T., Jr. (1996) Biochemistry 35, 
13709-13715. 


. Uversky, V.N., Gillespie, J.R., & Fink, A.L. (2000) 


Proteins: Struct., Funct., Genet. 41, 415-427. 


. Rariy, R.V., & Klibanov, A.M. (1997) Proc. Natl. Acad. 


Sci. U.S.A. 94, 13520-13523. 


. Tanford, C. (1968) Protein denaturation, Adv. Protein 


Chem. 23, 121-282. 


. Edelhoch, H. (1967) Biochemistry 6, 1948-1954. 
. Nozaki, Y., & Tanford, C. (1963) J. Biol. Chem. 238, 


4074-4081. 


. Nozaki, Y., & Tanford, C. (1970) J. Biol. Chem. 245, 


1648-1652. 


. Wetlaufer, D.B., Malik, S., Stoller, L., & Coffin, R.L. 


(1964) J. Am. Chem. Soc. 86, 508-514. 


. Nandi, P.K., & Robinson, D.R. (1984) Biochemistry 23, 


6661-6668. 


. Creighton, T.E. (1979) J. Mol. Biol. 129, 235-264. 
. Parker, M.J., Spencer, J., & Clarke, A.R. (1995) J. Mol. 


Biol. 253, 771-786. 


. Makhatadze, G.I., & Privalov, P.L. (1992) J. Mol. Biol. 


226, 491-505. 


. Lee, J.C., & Timasheff, S.N. (1974) Biochemistry 13, 


257-265. 


. Courtenay, E.S., Capp, M.W., & Record, M.T., Jr. (2001) 


Protein Sci. 10, 2485-2497. 


. Timasheff, S.N., & Xie, G. (2003) Biophys. Chem. 105, 


421-448. 


. Breslow, R., & Guo, T. (1990) Proc. Natl. Acad. Sci. 


U.S.A. 87, 167-169. 


. Roseman, M., & Jencks, W.P. (1975) J. Am. Chem. Soc. 


97, 631-640. 


. Herskovits, T.T., Jaillet, H., & Gadegbeku, B. (1970) J. 


Biol. Chem. 245, 4544-4550. 


. Pace, C.N., & Marshall, H.F., Jr. (1980) Arch. Biochem. 


Biophys. 199, 270-276. 


. Steiner, D.F., & Clark, J.L. (1968) Proc. Natl. Acad. Sci. 


U.S.A. 60, 622-629. 

Roxby, R., & Tanford, C. (1971) Biochemistry 10, 
3348-3352. 

Yao, M., & Bolen, D.W. (1995) Biochemistry 34, 
3771-3781. 


24. 


25. 


26. 


27.


28. 


29. 


30. 


31. 


32. 


33. 


34. 


35.


36. 


37. 


38. 


39. 


40. 


41. 


42. 


43. 


44. 


45. 


46. 


47. 


48. 


49. 


50. 


51. 


52. 




Bolen, D.W., & Santoro, M.M. (1988) Biochemistry 27, 
8069-8074. 

Pace, C.N., Laurents, D.V., & Thomson, J.A. (1990) 
Biochemistry 29, 2564-2572. 

Fitch, C.A., Karp, D.A., Lee, K.K., Stites, W.E., Lattman, 
E.E., & Garcia-Moreno, E.B. (2002) Biophys. J. 82, 
3289-3304. 

Langsetmo, K., Fuchs, J.A., & Woodward, C. (1991) 
Biochemistry 30, 7603-7609. 

Inoue, M., Yamada, H., Hashimoto, Y., Yasukochi, T., 
Hamaguchi, K., Miki, T., Horiuchi, T., & Imoto, T. 
(1992) Biochemistry 31, 8816-8821. 

Anderson, D.E., Becktel, W.J., & Dahlquist, F.W. (1990) 
Biochemistry 29, 2403-2408. 

Giletto, A., & Pace, C.N. (1999) Biochemistry 38, 
13379-13384. 

Oliveberg, M., Arcus, V.L., & Fersht, A.R. (1995) 
Biochemistry 34, 9424-9433. 

Stites, W.E., Gittis, A.G., Lattman, E.E., & Shortle, D. 
(1991) J. Mol. Biol. 221, 7-14. 

Hermans, J., Jr., & Acampora, G. (1967) J. Am. Chem. 
Soc. 89, 1547-1552. 

Robertson, A.D., & Baldwin, R.L. (1991) Biochemistry 
30, 9907-9914. 

Arcus, V.L., Vuilleumier, S., Freund, S.M., Bycroft, M., 
& Fersht, A.R. (1995) J. Mol. Biol. 254, 305-321. 

Liu, Z.P., Rizo, J., & Gierasch, L.M. (1994) Biochemistry 
33, 134-142. 

Burton, S.J., Quirk, A.V., & Wood, P.C. (1989) Eur. J. 
Biochem. 179, 379-387. 

Mainfroid, V., Terpstra, P., Beauregard, M., Frere, J.M., 
Mande, S.C., Hol, W.G., Martial, J.A., & Goraj, K. (1996) 
J. Mol. Biol. 257, 441-456. 

Carra, J.H., & Privalov, P.L. (1997) Biochemistry 36, 
526-535. 

Edge, V., Allewell, N.M., & Sturtevant, J.M. (1985) 
Biochemistry 24, 5899-5906. 

Manly, S.P., Matthews, K.S., & Sturtevant, J.M. (1985) 
Biochemistry 24, 3842-3846. 

Brandts, J.F., Hu, C.Q., Lin, L.N., & Mos, M.T. (1989) 
Biochemistry 28, 8588-8596. 

Alber, T., Sun, D.P., Wilson, K., Wozniak, J.A., Cook, 
S.P., & Matthews, B.W. (1987) Nature 330, 41-46. 
Novokhatny, V.V., Kudinov, S.A., & Privalov, P.L. (1984) 
J. Mol. Biol. 179, 215-232. 

Flanagan, M.T., & Hesketh, T.R. (1974) Eur. J. Biochem. 
44, 251-259. 

Yang, M., Liu, D., & Bolen, D.W. (1999) Biochemistry 38, 
11216-11222. 

Baskakov, I.V., & Bolen, D.W. (1998) Biochemistry 37, 
18010-18017. 

Benz, F.W., & Roberts, G.C. (1975) J. Mol. Biol. 91, 
367-387. 

Hoeltzli, S.D., & Frieden, C. (1994) Biochemistry 33, 
5502-5509. 

Perl, D., Welker, C., Schindler, T., Schroder, K., 
Marahiel, M.A., Jaenicke, R., & Schmid, F.X. (1998) Nat. 
Struct. Biol. 5, 229-235. 

Steif, C., Weber, P., Hinz, H.J., Flossdorf, J., Cesareni, 
G., & Kokkinidis, M. (1993) Biochemistry 32, 3867-3876. 
Lumry, R., Biltonen, R., & Brandts, J.F. (1966) 
Biopolymers 4, 917. 




53 


54. 


55. 


56. 


57. 


58. 


59. 


60. 


61. 


62. 


63. 


64. 


65. 


66. 


67. 


68. 


69. 


70. 


71. 


72. 


73. 


74. 


75. 


76. 


77. 


78. 


79. 


80. 


81. 


82. 




. Brandts, J.F., & Hunt, L. (1967) J. Am. Chem. Soc. 89, 
4826-4838. 

Aune, K.C., & Tanford, C. (1969) Biochemistry 8, 
4579-4585. 

Plaza del Pino, I.M., Pace, C.N., & Freire, E. (1992) 
Biochemistry 31, 11196-11202. 

Yu, Y., Makhatadze, G.I., Pace, C.N., & Privalov, P.L. 
(1994) Biochemistry 33, 3312-3319. 

Shen, L.L., & Hermans, J., Jr. (1972) Biochemistry 11, 
1836-1841. 

Ginsburg, A., & Carroll, W.R. (1965) Biochemistry 4, 
2159-2174. 

Jasanoff, A., Davis, B., 
Biochemistry 33, 6350-6355. 
Rudolph, R., Siebendritt, R., Nesslauer, G., Sharma, 
A.K., & Jaenicke, R. (1990) Proc. Natl. Acad. Sci. U.S.A. 
87, 4625-4629. 

Frech, C., Wunderlich, M., Glockshuber, R., & Schmid, 
F.X. (1996) Biochemistry 35, 11386-11395. 

Gualfetti, P.J., Bilsel, O., & Matthews, C.R. (1999) 
Protein Sci. 8, 1623-1635. 

Herold, M., & Kirschner, K. (1990) Biochemistry 29, 
1907-1913. 

Brazhnikov, E.V., Chirgadze, Y., Dolgikh, D.A., & 
Ptitsyn, O.B. (1985) Biopolymers 24, 1899-1907. 
Hughson, F.M., Wright, P.E., & Baldwin, R.L. (1990) 
Science 249, 1544-1548. 

Kuwajima, K., Nitta, K., Yoneyama, M., & Sugai, S. 
(1976) J. Mol. Biol. 106, 359-373. 

Wong, K.P., & Tanford, C. (1973) J. Biol. Chem. 248, 
8518-8523. 

Chaudhuri, T.K., Arai, M., Terada, T.P., Ikura, T., & 
Kuwajima, K. (2000) Biochemistry 39, 15643-15651. 
Sasahara, K., Demura, M., & Nitta, K. (2000) 
Biochemistry 39, 6475-6482. 

Timm, D.E., de Haseth, P.L., & Neet, K.E. (1994)
Biochemistry 33, 4667-4676. 

Apiyo, D., Jones, K., Guidry, J., & Wittung-Stafshede, P. 
(2001) Biochemistry 40, 4940-4948. 

Zhuang, P., Eisenstein, E., & Howell, E.E. (1994) 
Biochemistry 33, 4237-4244. 

Grimsley, J.K., Scholtz, J.M., Pace, C.N., & Wild, J.R. 
(1997) Biochemistry 36, 14366-14374. 

Predki, P.F., & Regan, L. (1995) Biochemistry 34, 
9834-9839. 

Liang, H., Sandberg, W.S., & Terwilliger, T.C. (1993) 
Proc. Natl. Acad. Sci. U.S.A. 90, 7010-7014. 

Silinski, P., Allingham, M.J., & Fitzgerald, M.C. (2001) 
Biochemistry 40, 4493-4502. 

Xie, D., Gulnik, S., & Erickson, J.W. (2000) J. Am. Chem. 
Soc. 122, 11533-11534. 

Pace, N.C., & Tanford, C. (1968) Biochemistry 7, 
198-208. 

Johnson, C.M., Oliveberg, M., Clarke, J., & Fersht, A.R. 
(1997) J. Mol. Biol. 268, 198-208. 

Milne, J.S., Xu, Y., Mayne, L.C., & Englander, S.W. 
(1999) J. Mol. Biol. 290, 811-822. 

Privalov, P.L., Tiktopulo, E.I., Venyaminov, S., Griko Yu, 
V., Makhatadze, G.I., & Khechinashvili, N.N. (1989) J. 
Mol. Biol. 205, 737-750. 

Pace, C.N., & Laurents, D.V. (1989) Biochemistry 28, 
2520-2525. 


& Fersht, AR (1994) 


83. 


84. 


100. 
101. 
102. 
103. 
104. 
105. 


106. 
107. 


108. 


109. 


110. 


111. 


112. 


113. 


114. 


115. 


. Brandts, J.F., Oliveira, R.J., & Westort, C. 


Sturtevant, J.M. (1977) Proc. Natl. Acad. Sci. U.S.A. 74, 
2236-2240. 

Doig, A.J., & Williams, D.H. (1991) J. Mol. Biol. 217, 
389-398. 


. Privalov, P.L., & Makhatadze, G.I. (1990) J. Mol. Biol. 


213, 385-391. 


. Johnson, C.M., & Fersht, A.R. (1995) Biochemistry 34, 


6795-6804. 


. Myers, J.K., Pace, C.N., & Scholtz, J.M. (1995) Protein 


Sci. 4, 2138-2148. 


. Baldwin, R.L. (1986) Proc. Natl. Acad. Sci. U.S.A. 83, 


8069-8072. 


. Privalov, P.L., & Makhatadze, G.I. (1992) J. Mol. Biol. 


224, 715-723. 


. Hackel, M., Hinz, H.J., & Hedwig, G.R. (1999) J. Mol. 


Biol. 291, 197-213. 


. Makhatadze, G.I., & Privalov, P.L. (1990) J. Mol. Biol. 


213, 375-384. 


. Clark, N.S., Dodd, I., Mossakowska, D.E., Smith, R.A., & 


Gore, M.G. (1996) Protein Eng. 9, 877-884. 


. Brandts, J.F. (1964) J. Am. Chem. Soc. 86, 4302-4314. 
. Chen, B.L., & Schellman, J.A. (1989) Biochemistry 28, 


685-691. 


. Wong, K.B., Freund, S.M., & Fersht, A.R. (1996) J. Mol. 


Biol. 259, 805-818. 


. Zhang, J., Peng, X., Jonas, A. & Jonas, J. (1995) 


Biochemistry 34, 8631-8641. 
(1970) 
Biochemistry 9, 1038-1047. 


. Hawley, S.A. (1971) Biochemistry 10, 2436-2442. 
. Zipp, A., & Kauzmann, W. (1973) Biochemistry 12, 


4217-4228. 

Gavish, B., Gratton, E., & Hardy, C.J. (1983) Proc. Natl. 
Acad. Sci. U.S.A. 80, 750-754. 

Paladini, A.A., Jr., & Weber, G. (1981) Biochemistry 20, 
2587-2593. 

Kellis, J.T., Jr., Nyberg, K., & Fersht, A.R. (1989)
Biochemistry 28, 4914-4922. 

Pace, C.N. (1975) CRC Crit. Rev. Biochem. 3, 1-43. 
Tanford, C. (1970) Adv. Protein Chem. 24, 1-95. 
Santoro, M.M., & Bolen, D.W. (1988) Biochemistry 27, 
8063-8068. 

Puett, D. (1973) J. Biol. Chem. 248, 4623-4634. 
Greene, R.F., Jr., & Pace, C.N. (1974) J. Biol. Chem. 249, 
5388-5393. 

Staniforth, R.A., Burston, S.G., Smith, C.J., Jackson, 
G.S., Badcoe, I.G., Atkinson, T., Holbrook, J.J., & Clarke, 
A.R. (1993) Biochemistry 32, 3842-3851. 

Schindler, T., Herrler, M., Marahiel, M.A., & Schmid, 
F.X. (1995) Nat. Struct. Biol. 2, 663-673. 

Myers, J.K., & Oas, T.G. (2001) Nat. Struct. Biol. 8, 
552-558. 

Santoro, M.M., & Bolen, D.W. (1992) Biochemistry 31, 
4901-4907. 

Jamin, M., & Baldwin, R.L. (1996) Nat. Struct. Biol. 3, 
613-618. 

Horng, J.C., Cho, J.H., & Raleigh, D.P. (2005) J. Mol. Biol. 
345, 163-173. 

McNutt, M., Mullins, L.S., Raushel, F.M., & Pace, C.N. 
(1990) Biochemistry 29, 7572-7576. 

Goto, Y., & Hamaguchi, K. (1982) J. Mol. Biol. 156, 
891-910. 


116. Bai, Y., Sosnick, T.R., Mayne, L., & Englander, S.W.
(1995) Science 269, 192-197.

117. Chamberlain, A.K., Handel, T.M., & Marqusee, S.
(1996) Nat. Struct. Biol. 3, 782-787.

118. Huyghues-Despointes, B.M., Scholtz, J.M., & Pace,
C.N. (1999) Nat. Struct. Biol. 6, 910-912.

119. Scholtz, J.M., Barrick, D., York, E.J., Stewart, J.M., &
Baldwin, R.L. (1995) Proc. Natl. Acad. Sci. U.S.A. 92,
185-189.

120. Ibarra-Molero, B., & Sanchez-Ruiz, J.M. (1996)
Biochemistry 35, 14689-14702.

121. Baldwin, E., Xu, J., Hajiseyedjavadi, O., Baase, W.A., &
Matthews, B.W. (1996) J. Mol. Biol. 259, 542-559.

122. Mendel, D., Ellman, J.A., Chang, Z., Veenstra, D.L.,
Kollman, P.A., & Schultz, P.G. (1992) Science 256,
1798-1802.

123. Wynn, R., & Richards, F.M. (1993) Protein Sci. 2,
395-403.

124. Salahuddin, A., & Tanford, C. (1970) Biochemistry 9,
1342-1347.

125. Aune, K.C., & Tanford, C. (1969) Biochemistry 8,
4586-4590.

126. Pace, C.N., & Vanderburg, K.E. (1979) Biochemistry 18,
288-292.

127. Knapp, J.A., & Pace, C.N. (1974) Biochemistry 13,
1289-1294.

128. Neira, J.L., Itzhaki, L.S., Otzen, D.E., Davis, B., & Fersht,
A.R. (1997) J. Mol. Biol. 270, 99-110.

129. Akasako, A., Haruki, M., Oobatake, M., & Kanaya, S.
(1995) Biochemistry 34, 8115-8122.

130. Van den Burg, B., Vriend, G., Veltman, O.R., Venema,
G., & Eijsink, V.G. (1998) Proc. Natl. Acad. Sci. U.S.A. 95,
2056-2060.

131. Rehage, A., & Schmid, F.X. (1982) Biochemistry 21,
1499-1505.

132. Llinas, M., Gillespie, B., Dahlquist, F.W., & Marqusee,
S. (1999) Nat. Struct. Biol. 6, 1072-1078.

133. Linse, S., Teleman, O., & Drakenberg, T. (1990)
Biochemistry 29, 5925-5934.

134. Wagner, G. (1983) Q. Rev. Biophys. 16, 1-57.

135. Wang, Q.W., Kline, A.D., & Wuthrich, K. (1987)
Biochemistry 26, 6488-6493.

136. Wand, A.J., Roder, H., & Englander, S.W. (1986)
Biochemistry 25, 1107-1114.

137. Goedken, E.R., & Marqusee, S. (2001) J. Mol. Biol. 314,
863-871.

138. Chamberlain, A.K., & Marqusee, S. (1998) Biochemistry
37, 1736-1742.

139. Laurents, D.V., Scholtz, J.M., Rico, M., Pace, C.N., &
Bruix, M. (2005) Biochemistry 44, 7644-7655.

140. Heinz, D.W., Baase, W.A., & Matthews, B.W. (1992)
Proc. Natl. Acad. Sci. U.S.A. 89, 3751-3755.

141. Lin, M.C. (1970) J. Biol. Chem. 245, 6726-6731.

142. Taniuchi, H., & Anfinsen, C.B. (1969) J. Biol. Chem. 244,
3864-3875.

143. Sachs, D.H., Schecter, A.N., Eastlake, A., & Anfinsen,
C.B. (1974) Nature 251, 242-244.

144. Alexandrescu, A.T., Abeygunawardana, C., & Shortle,
D. (1994) Biochemistry 33, 1063-1072.

145. Wang, Y., & Shortle, D. (1995) Biochemistry 34,
15895-15905.

146. Peters, R.J., Shiau, A.K., Sohl, J.L., Anderson, D.E., Tang,
G., Silen, J.L., & Agard, D.A. (1998) Biochemistry 37,
12058-12067.

147. Ikemura, H., & Inouye, M. (1988) J. Biol. Chem. 263,
12959-12963.

148. Eder, J., Rheinnecker, M., & Fersht, A.R. (1993)
Biochemistry 32, 18-26.

149. Zhu, X.L., Ohta, Y., Jordan, F., & Inouye, M. (1989)
Nature 339, 483-484.

150. Winther, J.R., & Sorensen, P. (1991) Proc. Natl. Acad.
Sci. U.S.A. 88, 9330-9334.

151. Klee, W.A. (1968) Biochemistry 7, 2731-2736.

152. Wyckoff, H.W., Hardman, K.D., Allewell, N.M.,
Inagami, T., Johnson, L.N., & Richards, F.M. (1967) J.
Biol. Chem. 242, 3984-3988.

153. Taniuchi, H., & Anfinsen, C.B. (1971) J. Biol. Chem. 246,
2291-2301.

154. Sancho, J., & Fersht, A.R. (1992) J. Mol. Biol. 224,
741-747.

155. Eder, J., & Kirschner, K. (1992) Biochemistry 31,
3617-3625.

156. Lindsay, C.D., & Pain, R.H. (1991) Biochemistry 30,
9034-9040.

157. Rochet, J.C., Oikawa, K., Hicks, L.D., Kay, C.M., Bridger,
W.A., & Wolodko, W.T. (1997) Biochemistry 36,
8807-8820.

158. Goldenberg, D.P., & Creighton, T.E. (1983) J. Mol. Biol.
165, 407-413.

159. Luger, K., Hommel, U., Herold, M., Hofsteenge, J., &
Kirschner, K. (1989) Science 243, 206-210.

160. Yang, Y.R., & Schachman, H.K. (1993) Proc. Natl. Acad.
Sci. U.S.A. 90, 11980-11984.

161. Buchwalder, A., Szadkowski, H., & Kirschner, K. (1992)
Biochemistry 31, 1621-1630.

162. Kreitman, R.J., Puri, R.K., & Pastan, I. (1994) Proc. Natl.
Acad. Sci. U.S.A. 91, 6889-6893.

163. Mullins, L.S., Wesseling, K., Kuo, J.M., Garrett, J.B., &
Raushel, F.M. (1994) J. Am. Chem. Soc. 116, 5529-5533.

164. Graf, R., & Schachman, H.K. (1996) Proc. Natl. Acad. Sci.
U.S.A. 93, 11591-11596.

165. Hennecke, J., Sebbel, P., & Glockshuber, R. (1999) J.
Mol. Biol. 286, 1197-1215.

166. Iwakura, M., Nakamura, T., Yamane, C., & Maki, K.
(2000) Nat. Struct. Biol. 7, 580-585.

167. Haber, E. (1964) Proc. Natl. Acad. Sci. U.S.A. 52,
1099-1106.

168. Whitney, P.L., & Tanford, C. (1965) Proc. Natl. Acad. Sci.
U.S.A. 53, 524-532.

169. Painter, R.G., Sage, H.J., & Tanford, C. (1972)
Biochemistry 11, 1338-1345.

170. Kauzmann, W. (1959) Adv. Protein Chem. 14, 1-63.

171. Chothia, C. (1976) J. Mol. Biol. 105, 1-12.

172. Dill, K.A. (1990) Biochemistry 29, 7133-7155.

173. Takano, K., Scholtz, J.M., Sacchettini, J.C., & Pace, C.N.
(2003) J. Biol. Chem. 278, 31790-31795.

174. Privalov, P.L., & Makhatadze, G.I. (1993) J. Mol. Biol.
232, 660-679.

175. Flory, P.J. (1949) J. Chem. Phys. 17, 303-310.

176. Dill, K.A. (1985) Biochemistry 24, 1501-1509.

177. Villafranca, J.E., Howell, E.E., Oatley, S.J., Xuong, N.H.,
& Kraut, J. (1987) Biochemistry 26, 2182-2189.

178. Pjura, P.E., Matsumura, M., Wozniak, J.A., & Matthews,
B.W. (1990) Biochemistry 29, 2592-2598.


179. Siedler, F., Rudolph-Bohner, S., Doi, M., Musiol, H.J., &
Moroder, L. (1993) Biochemistry 32, 7488-7495.

180. Johnson, R.E., Adams, P., & Rupley, J.A. (1978)
Biochemistry 17, 1479-1484.

181. Imoto, T., & Rupley, J.A. (1973) J. Mol. Biol. 80, 657-667.

182. Mitra, S., & Lawton, R.G. (1979) J. Am. Chem. Soc. 101,
3097-3110.

183. Lin, S.H., Konishi, Y., Denton, M.E., & Scheraga, H.A.
(1984) Biochemistry 23, 5504-5512.

184. Matsumura, M., Becktel, W.J., Levitt, M., & Matthews,
B.W. (1989) Proc. Natl. Acad. Sci. U.S.A. 86, 6562-6566.

185. Zhang, T., Bertelsen, E., & Alber, T. (1994) Nat. Struct.
Biol. 1, 434-438.

186. Chan, H.S., & Dill, K.A. (1989) J. Chem. Phys. 90,
492-509.

187. Ikeguchi, M., Sugai, S., Fujino, M., Sugawara, T., &
Kuwajima, K. (1992) Biochemistry 31, 12695-12700.

188. Eder, J., & Wilmanns, M. (1992) Biochemistry 31,
4437-4444.

189. Robinson, C.R., & Sauer, R.T. (2000) Biochemistry 39,
12494-12502.

190. Goto, Y., & Hamaguchi, K. (1982) J. Mol. Biol. 156,
911-926.

191. Lau, K.F., & Dill, K.A. (1989) Macromolecules 22,
3986-3997.

192. Dill, K.A., Alonso, D.O., & Hutchinson, K. (1989)
Biochemistry 28, 5439-5449.

193. Flory, P.J. (1953) Principles of Polymer Chemistry,
Cornell University Press, Ithaca, NY.

194. Flory, P.J., & Fisk, S. (1966) J. Chem. Phys. 44,
2243-2248.

195. Sanchez, I.C. (1979) Macromolecules 12, 980-988.

196. Shaw, G.S., Hodges, R.S., & Sykes, B.D. (1990) Science
(Washington, D.C.) 249, 280-283.

197. Chen, L.H., Kenyon, G.L., Curtin, F., Harayama, S.,
Bembenek, M.E., Hajipour, G., & Whitman, C.P. (1992)
J. Biol. Chem. 267, 17716-17721.

198. Achari, A., Hale, S.P., Howard, A.J., Clore, G.M.,
Gronenborn, A.M., Hardman, K.D., & Whitlow, M.
(1992) Biochemistry 31, 10449-10457.

199. Macias, M.J., Gervais, V., Civera, C., & Oschkinat, H.
(2000) Nat. Struct. Biol. 7, 375-379.

200. Spector, S., Kuhlman, B., Fairman, R., Wong, E., Boice,
J.A., & Raleigh, D.P. (1998) J. Mol. Biol. 276, 479-489.

201. Chan, H.S., & Dill, K.A. (1990) Proc. Natl. Acad. Sci.
U.S.A. 87, 6388-6392.

202. Nishii, I., Kataoka, M., & Goto, Y. (1995) J. Mol. Biol. 250,
223-238.

203. Buchner, J., Renner, M., Lilie, H., Hinz, H.J., Jaenicke,
R., Kiefhaber, T., & Rudolph, R. (1991) Biochemistry 30,
6922-6929.

204. Kuwajima, K. (1977) J. Mol. Biol. 114, 241-258.

205. Xie, D., Bhakuni, V., & Freire, E. (1991) Biochemistry 30,
10673-10678.

206. Stellwagen, E., & Babul, J. (1975) Biochemistry 14,
5135-5140.

207. Davis-Searles, P.R., Morar, A.S., Saunders, A.J., Erie,
D.A., & Pielak, G.J. (1998) Biochemistry 37,
17048-17053.

208. Matthews, J.M., Norton, R.S., Hammacher, A., &
Simpson, R.J. (2000) Biochemistry 39, 1942-1950.

209. Dolgikh, D.A., Gilmanshin, R.I., Brazhnikov, E.V.,
Bychkova, V.E., Semisotnov, G.V., Venyaminov, S., &
Ptitsyn, O.B. (1981) FEBS Lett. 136, 311-315.

210. Ohgushi, M., & Wada, A. (1983) FEBS Lett. 164, 21-24.

211. Baum, J., Dobson, C.M., Evans, P.A., & Hanley, C.
(1989) Biochemistry 28, 7-13.

212. Bu, Z., Neumann, D.A., Lee, S.H., Brown, C.M.,
Engelman, D.M., & Han, C.C. (2000) J. Mol. Biol. 301,
525-536.

213. Nolting, B., Jiang, M., & Sligar, S.G. (1993) J. Am. Chem.
Soc. 115, 9879-9882.

214. Jeng, M.F., Englander, S.W., Elove, G.A., Wand, A.J., &
Roder, H. (1990) Biochemistry 29, 10433-10437.

215. Schulman, B.A., Redfield, C., Peng, Z.Y., Dobson, C.M.,
& Kim, P.S. (1995) J. Mol. Biol. 253, 651-657.

216. Eliezer, D., & Wright, P.E. (1996) J. Mol. Biol. 263,
531-538.

217. Hughson, F.M., Barrick, D., & Baldwin, R.L. (1991)
Biochemistry 30, 4113-4118.

218. Chakraborty, S., Ittah, V., Bai, P., Luo, L., Haas, E., &
Peng, Z. (2001) Biochemistry 40, 7228-7238.

219. Oas, T.G., & Kim, P.S. (1988) Nature 336, 42-48.

220. Baldwin, R.L. (1989) Trends Biochem. Sci. 14, 291-294.

221. Jeng, M.F., & Englander, S.W. (1991) J. Mol. Biol. 221,
1045-1061.

222. Gualfetti, P.J., Iwakura, M., Lee, J.C., Kihara, H., Bilsel,
O., Zitzewitz, J.A., & Matthews, C.R. (1999)
Biochemistry 38, 13367-13378.

223. Peng, Z.Y., & Kim, P.S. (1994) Biochemistry 33,
2136-2141.

224. Otzen, D.E., & Oliveberg, M. (2001) J. Mol. Biol. 313,
479-483.

225. Yamasaki, K., Ogasahara, K., Yutani, K., Oobatake, M.,
& Kanaya, S. (1995) Biochemistry 34, 16552-16562.

226. Raschke, T.M., & Marqusee, S. (1997) Nat. Struct. Biol.
4, 298-304.

227. Jennings, P.A., & Wright, P.E. (1993) Science 262,
892-896.

228. Nishimura, C., Riley, R., Eastman, P., & Fink, A.L. (2000)
J. Mol. Biol. 299, 1133-1146.

229. Su, Z.D., Arooz, M.T., Chen, H.M., Gross, C.J., & Tsong,
T.Y. (1996) Proc. Natl. Acad. Sci. U.S.A. 93, 2539-2544.

230. Elove, G.A., Chaffotte, A.F., Roder, H., & Goldberg, M.E.
(1992) Biochemistry 31, 6876-6883.

231. Kuwajima, K., Garvey, E.P., Finn, B.E., Matthews, C.R.,
& Sugai, S. (1991) Biochemistry 30, 7693-7703.

232. Fujiwara, K., Arai, M., Shimizu, A., Ikeguchi, M.,
Kuwajima, K., & Sugai, S. (1999) Biochemistry 38,
4455-4463.

233. Morozova-Roche, L.A., Jones, J.A., Noppe, W., &
Dobson, C.M. (1999) J. Mol. Biol. 289, 1055-1073.

234. Morgan, C.J., Miranker, A., & Dobson, C.M. (1998)
Biochemistry 37, 8473-8480.

235. Qi, P.X., Sosnick, T.R., & Englander, S.W. (1998) Nat.
Struct. Biol. 5, 882-884.

236. Parker, M.J., & Marqusee, S. (1999) J. Mol. Biol. 293,
1195-1210.

237. Rocek, J., Westheimer, F.H., Eschenmoser, A.,
Moldovanyi, L., & Schreiber, J. (1962) Helv. Chim. Acta
45, 2554-2567.

238. Roder, H., & Colon, W. (1997) Curr. Opin. Struct. Biol.
7, 15-28.


239. Parker, M.J., Dempsey, C.E., Lorch, M., & Clarke, A.R.
(1997) Biochemistry 36, 13396-13405.

240. Schreiber, G., & Fersht, A.R. (1993) Biochemistry 32,
11195-11203.

241. Sauder, J.M., MacKenzie, N.E., & Roder, H. (1996)
Biochemistry 35, 16852-16862.

242. Herning, T., Yutani, K., Taniyama, Y., & Kikuchi, M.
(1991) Biochemistry 30, 9882-9891.

243. Chen, B.L., Baase, W.A., Nicholson, H., & Schellman,
J.A. (1992) Biochemistry 31, 1464-1476.

244. Kyte, J. (1995) Mechanism in Protein Chemistry, pp
461-473, Garland, New York.

245. Eliezer, D., Jennings, P.A., Wright, P.E., Doniach, S.,
Hodgson, K.O., & Tsuruta, H. (1995) Science 270,
487-488.

246. Arai, M., Ikura, T., Semisotnov, G.V., Kihara, H.,
Amemiya, Y., & Kuwajima, K. (1998) J. Mol. Biol. 275,
149-162.

247. Jones, B.E., Beechem, J.M., & Matthews, C.R. (1995)
Biochemistry 34, 1867-1877.

248. Kuwajima, K., Yamaya, H., Miwa, S., Sugai, S., &
Nagamura, T. (1987) FEBS Lett. 221, 115-118.

249. Hooke, S.D., Radford, S.E., & Dobson, C.M. (1994)
Biochemistry 33, 5867-5876.

250. Bycroft, M., Matouschek, A., Kellis, J.T., Jr., Serrano, L.,
& Fersht, A.R. (1990) Nature 346, 488-490.

251. Roder, H., Elove, G.A., & Englander, S.W. (1988) Nature
335, 700-704.

252. Udgaonkar, J.B., & Baldwin, R.L. (1990) Proc. Natl.
Acad. Sci. U.S.A. 87, 8197-8201.

253. Parker, M.J., & Marqusee, S. (2001) J. Mol. Biol. 305,
593-602.

254. O’Neill, J.C., Jr., & Matthews, C.R. (2000) J. Mol.
Biol. 295, 737-744.

255. Dabora, J.M., Pelton, J.G., & Marqusee, S. (1996)
Biochemistry 35, 11951-11958.

256. Raschke, T.M., Kho, J., & Marqusee, S. (1999) Nat.
Struct. Biol. 6, 825-831.

257. Segel, D.J., Bachmann, A., Hofrichter, J., Hodgson,
K.O., Doniach, S., & Kiefhaber, T. (1999) J. Mol. Biol.
288, 489-499.

258. Heidary, D.K., O’Neill, J.C., Jr., Roy, M., & Jennings, P.A.
(2000) Proc. Natl. Acad. Sci. U.S.A. 97, 5866-5870.

259. Akiyama, S., Takahashi, S., Ishimori, K., & Morishima,
I. (2000) Nat. Struct. Biol. 7, 514-520.

260. Kuwata, K., Shastry, R., Cheng, H., Hoshino, M., Batt,
C.A., Goto, Y., & Roder, H. (2001) Nat. Struct. Biol. 8,
151-155.

261. Park, S.H., Shastry, M.C., & Roder, H. (1999) Nat. Struct.
Biol. 6, 943-947.

262. Yeh, S.R., Ropson, I.J., & Rousseau, D.L. (2001)
Biochemistry 40, 4205-4210.

263. Capaldi, A.P., Shastry, M.C., Kleanthous, C., Roder, H.,
& Radford, S.E. (2001) Nat. Struct. Biol. 8, 68-72.

264. Hagen, S.J., & Eaton, W.A. (2000) J. Mol. Biol. 301,
1019-1027.

265. Nolting, B., Golbik, R., Neira, J.L., Soler-Gonzalez, A.S.,
Schreiber, G., & Fersht, A.R. (1997) Proc. Natl. Acad. Sci.
U.S.A. 94, 826-830.

266. Ballew, R.M., Sabelko, J., & Gruebele, M. (1996) Proc.
Natl. Acad. Sci. U.S.A. 93, 5759-5764.


267. Phillips, C.M., Mizutani, Y., & Hochstrasser, R.M.
(1995) Proc. Natl. Acad. Sci. U.S.A. 92, 7292-7296.

268. Sosnick, T.R., Shtilerman, M.D., Mayne, L., &
Englander, S.W. (1997) Proc. Natl. Acad. Sci. U.S.A. 94,
8545-8550.

269. Chan, C.K., Hu, Y., Takahashi, S., Rousseau, D.L.,
Eaton, W.A., & Hofrichter, J. (1997) Proc. Natl. Acad. Sci.
U.S.A. 94, 1779-1784.

270. Shastry, M.C., & Roder, H. (1998) Nat. Struct. Biol. 5,
385-392.

271. Plaxco, K.W., Millett, I.S., Segel, D.J., Doniach, S., &
Baker, D. (1999) Nat. Struct. Biol. 6, 554-556.

272. Bieri, O., Wirz, J., Hellrung, B., Schutkowski, M.,
Drewello, M., & Kiefhaber, T. (1999) Proc. Natl. Acad.
Sci. U.S.A. 96, 9597-9601.

273. Jacob, M., Geeves, M., Holtermann, G., & Schmid, F.X.
(1999) Nat. Struct. Biol. 6, 923-926.

274. Teilum, K., Kragelund, B.B., Knudsen, J., & Poulsen,
F.M. (2000) J. Mol. Biol. 301, 1307-1314.

275. Ladurner, A.G., & Fersht, A.R. (1999) Nat. Struct. Biol.
6, 28-31.

276. Matouschek, A., Kellis, J.T., Jr., Serrano, L., Bycroft, M.,
& Fersht, A.R. (1990) Nature 346, 440-445.

277. Gassner, N.C., Baase, W.A., Lindstrom, J.D., Lu, J.,
Dahlquist, F.W., & Matthews, B.W. (1999) Biochemistry
38, 14451-14460.

278. Jacobs, M.D., & Fox, R.O. (1994) Proc. Natl. Acad. Sci.
U.S.A. 91, 449-453.

279. Radford, S.E., Dobson, C.M., & Evans, P.A. (1992)
Nature 358, 302-307.

280. Andersson, D., Hammarstrom, P., & Carlsson, U. (2001)
Biochemistry 40, 2653-2661.

281. Xu, Y., Mayne, L., & Englander, S.W. (1998) Nat. Struct.
Biol. 5, 774-778.

282. Stewart, D.E., Sarkar, A., & Wampler, J.E. (1990) J. Mol.
Biol. 214, 253-260.

283. MacArthur, M.W., & Thornton, J.M. (1991) J. Mol. Biol.
218, 397-412.

284. Garel, J.R., & Baldwin, R.L. (1973) Proc. Natl. Acad. Sci.
U.S.A. 70, 3347-3351.

285. Garel, J.R., Nall, B.T., & Baldwin, R.L. (1976) Proc. Natl.
Acad. Sci. U.S.A. 73, 1853-1857.

286. Brandts, J.F., Halvorson, H.R., & Brennan, M. (1975)
Biochemistry 14, 4953-4963.

287. Lin, L.N., & Brandts, J.F. (1983) Biochemistry 22,
559-563.

288. Schmid, F.X., & Baldwin, R.L. (1978) Proc. Natl. Acad.
Sci. U.S.A. 75, 4764-4768.

289. Schmid, F.X., Grafl, R., Wrba, A., & Beintema, J.J. (1986)
Proc. Natl. Acad. Sci. U.S.A. 83, 872-876.

290. Henkens, R.W., Gerber, A.D., Cooper, M.R., & Herzog,
W.R., Jr. (1980) J. Biol. Chem. 255, 7075-7078.

291. Schmid, F.X., & Baldwin, R.L. (1979) J. Mol. Biol. 133,
285-287.

292. Schultz, D.A., & Baldwin, R.L. (1992) Protein Sci. 1,
910-916.

293. Kim, P.S., & Baldwin, R.L. (1980) Biochemistry 19,
6124-6129.

294. Schmid, F.X., & Blaschek, H. (1981) Eur. J. Biochem.
114, 111-117.

295. Schmid, F., & Blaschek, H. (1984) Biochemistry 23,
2128-2133.


296. Martinez-Oyanedel, J., Choe, H.W., Heinemann, U., &
Saenger, W. (1991) J. Mol. Biol. 222, 335-352.

297. Mayr, L.M., Odefey, C., Schutkowski, M., & Schmid,
F.X. (1996) Biochemistry 35, 5550-5561.

298. Mullins, L.S., Pace, C.N., & Raushel, F.M. (1993)
Biochemistry 32, 6152-6156.

299. Kiefhaber, T., Quaas, R., Hahn, U., & Schmid, F.X.
(1990) Biochemistry 29, 3061-3070.

300. Kiefhaber, T., Grunert, H.P., Hahn, U., & Schmid, F.X.
(1992) Proteins: Struct., Funct., Genet. 12, 171-179.

301. Kelley, R.F., & Richards, F.M. (1987) Biochemistry 26,
6765-6774.

302. Walkenhorst, W.F., Green, S.M., & Roder, H. (1997)
Biochemistry 36, 5795-5805.

303. Maki, K., Ikura, T., Hayano, T., Takahashi, N., &
Kuwajima, K. (1999) Biochemistry 38, 2213-2223.

304. Bilsel, O., Zitzewitz, J.A., Bowers, K.E., & Matthews, C.R.
(1999) Biochemistry 38, 1018-1029.

305. Jackson, S.E., & Fersht, A.R. (1991) Biochemistry 30,
10436-10443.

306. van Nuland, N.A., Chiti, F., Taddei, N., Raugei, G.,
Ramponi, G., & Dobson, C.M. (1998) J. Mol. Biol. 283,
883-891.

307. Mayr, L.M., Landt, O., Hahn, U., & Schmid, F.X. (1993)
J. Mol. Biol. 231, 897-912.

308. Pappenberger, G., Aygun, H., Engels, J.W., Reimer, U.,
Fischer, G., & Kiefhaber, T. (2001) Nat. Struct. Biol. 8,
452-458.

309. Schiene-Fischer, C., & Fischer, G. (2001) J. Am. Chem.
Soc. 123, 6227-6231.

310. Krebs, H., Schmid, F.X., & Jaenicke, R. (1983) J. Mol.
Biol. 169, 619-635.

311. Chazin, W.J., Kordel, J., Drakenberg, T., Thulin, E.,
Brodin, P., Grundstrom, T., & Forsen, S. (1989) Proc.
Natl. Acad. Sci. U.S.A. 86, 2195-2198.

312. Fischer, G., Wittmann-Liebold, B., Lang, K., Kiefhaber,
T., & Schmid, F.X. (1989) Nature 337, 476-478.

313. Takahashi, N., Hayano, T., & Suzuki, M. (1989) Nature
337, 473-475.

314. Fischer, G., Bang, H., & Mech, C. (1984) Biomed.
Biochim. Acta 43, 1101-1111.

315. Fischer, G., & Bang, H. (1985) Biochim. Biophys. Acta
828, 39-42.

316. Lang, K., Schmid, F.X., & Fischer, G. (1987) Nature 329,
268-270.

317. Lang, K., & Schmid, F.X. (1988) Nature 331, 453-455.

318. Bachinger, H.P. (1987) J. Biol. Chem. 262,
17144-17148.

319. Siekierka, J.J., Hung, S.H., Poe, M., Lin, C.S., & Sigal,
N.H. (1989) Nature 341, 755-757.

320. Harding, M.W., Galat, A., Uehling, D.E., & Schreiber,
S.L. (1989) Nature 341, 758-760.

321. Schonbrunner, E.R., Mayer, S., Tropschug, M., Fischer,
G., Takahashi, N., & Schmid, F.X. (1991) J. Biol. Chem.
266, 3630-3635.

322. Matouschek, A., Rospert, S., Schmid, K., Glick, B.S., &
Schatz, G. (1995) Proc. Natl. Acad. Sci. U.S.A. 92,
6319-6323.

323. Stoller, G., Rucknagel, K.P., Nierhaus, K.H., Schmid,
F.X., Fischer, G., & Rahfeld, J.U. (1995) EMBO J. 14,
4939-4948.

324. Tropschug, M., Nicholson, D.W., Hartl, F.U., Kohler,
H., Pfanner, N., Wachter, E., & Neupert, W. (1988) J.
Biol. Chem. 263, 14433-14440.

325. Gasser, C.S., Gunning, D.A., Budelier, K.A., & Brown,
S.M. (1990) Proc. Natl. Acad. Sci. U.S.A. 87, 9519-9523.

326. Scholz, C., Mucke, M., Rape, M., Pecht, A., Pahl, A., Bang,
H., & Schmid, F.X. (1998) J. Mol. Biol. 277, 723-732.

327. Mucke, M., & Schmid, F.X. (1992) Biochemistry 31,
7848-7854.

328. Veeraraghavan, S., & Nall, B.T. (1994) Biochemistry 33,
687-692.

329. Schindler, T., & Schmid, F.X. (1996) Biochemistry 35,
16833-16842.

330. Chiti, F., Taddei, N., van Nuland, N.A., Magherini, F.,
Stefani, M., Ramponi, G., & Dobson, C.M. (1998) J. Mol.
Biol. 283, 893-903.

331. Otzen, D.E., Kristensen, O., Proctor, M., & Oliveberg, M.
(1999) Biochemistry 38, 6499-6511.

332. Mayor, U., Johnson, C.M., Daggett, V., & Fersht, A.R.
(2000) Proc. Natl. Acad. Sci. U.S.A. 97, 13518-13522.

333. Krantz, B.A., & Sosnick, T.R. (2000) Biochemistry 39,
11696-11701.

334. Burton, R.E., Huang, G.S., Daugherty, M.A., Fullbright,
P.W., & Oas, T.G. (1996) J. Mol. Biol. 263, 311-322.

335. Huang, G.S., & Oas, T.G. (1995) Proc. Natl. Acad. Sci.
U.S.A. 92, 6878-6882.

336. Spector, S., & Raleigh, D.P. (1999) J. Mol. Biol. 293,
763-768.

337. Vidugiris, G.J., Markley, J.L., & Royer, C.A. (1995)
Biochemistry 34, 4909-4912.

338. Jacob, M., Holtermann, G., Perl, D., Reinstein, J.,
Schindler, T., Geeves, M.A., & Schmid, F.X. (1999)
Biochemistry 38, 2882-2891.

339. Tan, Y.J., Oliveberg, M., & Fersht, A.R. (1996) J. Mol.
Biol. 264, 377-389.

340. Taddei, N., Chiti, F., Fiaschi, T., Bucciantini, M.,
Capanni, C., Stefani, M., Serrano, L., Dobson, C.M., &
Ramponi, G. (2000) J. Mol. Biol. 300, 633-647.

341. Srivastava, A.K., & Sauer, R.T. (2000) Biochemistry 39,
8308-8314.

342. Burton, R.E., Huang, G.S., Daugherty, M.A., Calderone,
T.L., & Oas, T.G. (1997) Nat. Struct. Biol. 4, 305-310.

343. Otzen, D.E., Itzhaki, L.S., elMasry, N.F., Jackson, S.E., &
Fersht, A.R. (1994) Proc. Natl. Acad. Sci. U.S.A. 91,
10422-10425.

344. Jager, M., Nguyen, H., Crane, J.C., Kelly, J.W., &
Gruebele, M. (2001) J. Mol. Biol. 311, 373-393.

345. Lee, J.C., Gray, H.B., & Winkler, J.R. (2001) Proc. Natl.
Acad. Sci. U.S.A. 98, 7760-7764.

346. Parker, M.J., & Marqusee, S. (2000) J. Mol. Biol. 300,
1361-1375.

347. Kiefhaber, T. (1995) Proc. Natl. Acad. Sci. U.S.A. 92,
9029-9033.

348. Wildegger, G., & Kiefhaber, T. (1997) J. Mol. Biol. 270,
294-304.

349. Matagne, A., Radford, S.E., & Dobson, C.M. (1997) J.
Mol. Biol. 267, 1068-1074.

350. Jennings, P.A., Finn, B.E., Jones, B.E., & Matthews, C.R.
(1993) Biochemistry 32, 3783-3789.

351. Iwakura, M., Jones, B.E., Falzone, C.J., & Matthews, C.R.
(1993) Biochemistry 32, 13566-13574.

352. Cayley, P.J., Dunn, S.M., & King, R.W. (1981)
Biochemistry 20, 874-879.


353. Ionescu, R.M., Smith, V.F., O’Neill, J.C., Jr., &
Matthews, C.R. (2000) Biochemistry 39, 9540-9550.

354. Takahashi, S., Yeh, S.R., Das, T.K., Chan, C.K.,
Gottfried, D.S., & Rousseau, D.L. (1997) Nat. Struct.
Biol. 4, 44-50.

355. Guidry, J., & Wittung-Stafshede, P. (2000) J. Mol. Biol.
301, 769-773.

356. Silow, M., Tan, Y.J., Fersht, A.R., & Oliveberg, M. (1999)
Biochemistry 38, 13006-13012.

357. Nawrocki, J.P., Chu, R.A., Pannell, L.K., & Bai, Y. (1999)
J. Mol. Biol. 293, 991-995.

358. Goldberg, M.E., Rudolph, R., & Jaenicke, R. (1991)
Biochemistry 30, 2790-2797.

359. Vaucheret, H., Signon, L., Le Bras, G., & Garel, J.R.
(1987) Biochemistry 26, 2785-2790.

360. Pelham, H.R. (1986) Cell 46, 959-961.

361. Pelham, H. (1988) Nature 332, 776-777.

362. Reading, D.S., Hallberg, R.L., & Myers, A.M. (1989)
Nature 337, 655-659.

363. Hemmingsen, S.M., Woolford, C., van der Vies, S.M.,
Tilly, K., Dennis, D.T., Georgopoulos, C.P., Hendrix,
R.W., & Ellis, R.J. (1988) Nature 333, 330-334.

364. Ostermann, J., Horwich, A.L., Neupert, W., & Hartl, F.U.
(1989) Nature 341, 125-130.

365. Kubota, H., Hynes, G., & Willison, K. (1995) Eur. J.
Biochem. 230, 3-16.

366. Braig, K., Otwinowski, Z., Hegde, R., Boisvert, D.C.,
Joachimiak, A., Horwich, A.L., & Sigler, P.B. (1994)
Nature 371, 578-586.

367. Ditzel, L., Lowe, J., Stock, D., Stetter, K.O., Huber, H.,
Huber, R., & Steinbacher, S. (1998) Cell 93, 125-138.

368. Llorca, O., Smyth, M.G., Carrascosa, J.L., Willison, K.R.,
Radermacher, M., Steinbacher, S., & Valpuesta, J.M.
(1999) Nat. Struct. Biol. 6, 639-642.

369. Wang, J., & Boisvert, D.C. (2003) J. Mol. Biol. 327,
843-855.

370. Chen, S., Roseman, A.M., Hunter, A.S., Wood, S.P.,
Burston, S.G., Ranson, N.A., Clarke, A.R., & Saibil, H.R.
(1994) Nature 371, 261-264.

371. Chaudhry, C., Farr, G.W., Todd, M.J., Rye, H.S.,
Brunger, A.T., Adams, P.D., Horwich, A.L., & Sigler, P.B.
(2003) EMBO J. 22, 4877-4887.

372. Xu, Z., Horwich, A.L., & Sigler, P.B. (1997) Nature 388,
741-750.

373. Goloubinoff, P., Christeller, J.T., Gatenby, A.A., &
Lorimer, G.H. (1989) Nature 342, 884-889.

374. Buchner, J., Schmidt, M., Fuchs, M., Jaenicke, R.,
Rudolph, R., Schmid, F.X., & Kiefhaber, T. (1991)
Biochemistry 30, 1586-1591.

375. van der Vies, S.M., Viitanen, P.V., Gatenby, A.A.,
Lorimer, G.H., & Jaenicke, R. (1992) Biochemistry 31,
3635-3644.

376. Fisher, M.T. (1992) Biochemistry 31, 3955-3963.

377. Fisher, M.T. (1993) J. Biol. Chem. 268, 13777-13779.

378. Itzhaki, L.S., Otzen, D.E., & Fersht, A.R. (1995)
Biochemistry 34, 14581-14587.

379. Bhutani, N., & Udgaonkar, J.B. (2001) J. Mol. Biol. 314,
1167-1179.

380. Goldberg, M.S., Zhang, J., Sondek, S., Matthews, C.R.,
Fox, R.O., & Horwich, A.L. (1997) Proc. Natl. Acad. Sci.
U.S.A. 94, 1080-1085.

381. Coyle, J.E., Texter, F.L., Ashcroft, A.E., Masselos, D.,
Robinson, C.V., & Radford, S.E. (1999) Nat. Struct. Biol.
6, 683-690.

382. Zahn, R., Perrett, S., & Fersht, A.R. (1996) J. Mol. Biol.
261, 43-61.

383. Zahn, R., Spitzfaden, C., Ottiger, M., Wuthrich, K., &
Pluckthun, A. (1994) Nature 368, 261-265.

384. Walter, S., Lorimer, G.H., & Schmid, F.X. (1996) Proc.
Natl. Acad. Sci. U.S.A. 93, 9425-9430.

385. Robinson, C.V., Gross, M., Eyles, S.J., Ewbank, J.J.,
Mayhew, M., Hartl, F.U., Dobson, C.M., & Radford, S.E.
(1994) Nature 372, 646-651.

386. Nieba-Axmann, S.E., Ottiger, M., Wuthrich, K., &
Pluckthun, A. (1997) J. Mol. Biol. 271, 803-818.

387. Chen, J., Walter, S., Horwich, A.L., & Smith, D.L. (2001)
Nat. Struct. Biol. 8, 721-728.

388. Zahn, R., Perrett, S., Stenberg, G., & Fersht, A.R. (1996)
Science 271, 642-645.

389. Gervasoni, P., Staudenmann, W., James, P., Gehrig, P.,
& Pluckthun, A. (1996) Proc. Natl. Acad. Sci. U.S.A. 93,
12189-12194.

390. Landry, S.J., & Gierasch, L.M. (1991) Biochemistry 30,
7359-7362.

391. Falke, S., Fisher, M.T., & Gogol, E.P. (2001) J. Mol. Biol.
308, 569-577.

392. Braig, K., Simon, M., Furuya, F., Hainfeld, J.F., &
Horwich, A.L. (1993) Proc. Natl. Acad. Sci. U.S.A. 90,
3978-3982.

393. Langer, T., Pfeifer, G., Martin, J., Baumeister, W., &
Hartl, F.U. (1992) EMBO J. 11, 4757-4765.

394. Chaudhuri, T.K., Farr, G.W., Fenton, W.A., Rospert, S.,
& Horwich, A.L. (2001) Cell 107, 235-246.

395. Sakikawa, C., Taguchi, H., Makino, Y., & Yoshida, M.
(1999) J. Biol. Chem. 274, 21251-21256.

396. Wang, J.D., Michelitsch, M.D., & Weissman, J.S. (1998)
Proc. Natl. Acad. Sci. U.S.A. 95, 12163-12168.

397. Zahn, R., Buckle, A.M., Perrett, S., Johnson, C.M.,
Corrales, F.J., Golbik, R., & Fersht, A.R. (1996) Proc.
Natl. Acad. Sci. U.S.A. 93, 15024-15029.

398. Buckle, A.M., Zahn, R., & Fersht, A.R. (1997) Proc. Natl.
Acad. Sci. U.S.A. 94, 3571-3575.

399. Mendoza, J.A., Rogers, E., Lorimer, G.H., & Horowitz,
P.M. (1991) J. Biol. Chem. 266, 13044-13049.

400. Makio, T., Takasu-Ishikawa, E., & Kuwajima, K. (2001)
J. Mol. Biol. 312, 555-567.

401. Kubo, T., Mizobata, T., & Kawata, Y. (1993) J. Biol.
Chem. 268, 19346-19351.

402. Lin, Z., & Eisenstein, E. (1996) Proc. Natl. Acad. Sci.
U.S.A. 93, 1977-1981.

403. Badcoe, I.G., Smith, C.J., Wood, S., Halsall, D.J.,
Holbrook, J.J., Lund, P., & Clarke, A.R. (1991)
Biochemistry 30, 9195-9200.

404. Jackson, G.S., Staniforth, R.A., Halsall, D.J., Atkinson,
T., Holbrook, J.J., Clarke, A.R., & Burston, S.G. (1993)
Biochemistry 32, 2554-2563.

405. Wynn, R.M., Davie, J.R., Zhi, W., Cox, R.P., & Chuang,
D.T. (1994) Biochemistry 33, 8962-8968.

406. Viitanen, P.V., Donaldson, G.K., Lorimer, G.H.,
Lubben, T.H., & Gatenby, A.A. (1991) Biochemistry 30,
9716-9723.

407. Boisvert, D.C., Wang, J., Otwinowski, Z., Horwich, A.L.,
& Sigler, P.B. (1996) Nat. Struct. Biol. 3, 170-177.


408. Beissinger, M., Rutkat, K., & Buchner, J. (1999) J. Mol.
Biol. 289, 1075-1092.

409. Todd, M.J., Viitanen, P.V., & Lorimer, G.H. (1994)
Science 265, 659-666.

410. Ranson, N.A., Dunster, N.J., Burston, S.G., & Clarke,
A.R. (1995) J. Mol. Biol. 250, 581-586.

411. Hunt, J.F., Weaver, A.J., Landry, S.J., Gierasch, L., &
Deisenhofer, J. (1996) Nature 379, 37-45.

412. Weissman, J.S., Hohl, C.M., Kovalenko, O., Kashi, Y.,
Chen, S., Braig, K., Saibil, H.R., Fenton, W.A., &
Horwich, A.L. (1995) Cell 83, 577-587.

413. Burston, S.G., Ranson, N.A., & Clarke, A.R. (1995) J. Mol.
Biol. 249, 138-152.

414. Rye, H.S., Burston, S.G., Fenton, W.A., Beechem, J.M.,
Xu, Z., Sigler, P.B., & Horwich, A.L. (1997) Nature 388,
792-798.

415. Shtilerman, M., Lorimer, G.H., & Englander, S.W.
(1999) Science 284, 822-825.

416. Flynn, G.C., Chappell, T.G., & Rothman, J.E. (1989)
Science 245, 385-390.

417. Gisler, S.M., Pierpaoli, E.V., & Christen, P. (1998) J. Mol.
Biol. 279, 833-840.

418. Mayer, M.P., Schroder, H., Rudiger, S., Paal, K., Laufen,
T., & Bukau, B. (2000) Nat. Struct. Biol. 7, 586-593.

419. McCarty, J.S., Buchberger, A., Reinstein, J., & Bukau, B.
(1995) J. Mol. Biol. 249, 126-137.

420. Zhu, X., Zhao, X., Burkholder, W.F., Gragerov, A., Ogata,
C.M., Gottesman, M.E., & Hendrickson, W.A. (1996)
Science 272, 1606-1614.

421. Newton, G.L., Arnold, K., Price, M.S., Sherrill, C.,
Delcardayre, S.B., Aharonowitz, Y., Cohen, G., Davies,
J., Fahey, R.C., & Davis, C. (1996) J. Bacteriol. 178,
1990-199.

422. Hwang, C., Sinskey, A.J., & Lodish, H.F. (1992) Science
257, 1496-1502.

423. Frech, C., & Schmid, F.X. (1995) J. Mol. Biol. 251,
135-149.

424. Creighton, T.E. (1983) in Functions of Glutathione:
Biochemical, Physiological, Toxicological, and Clinical
Aspects (Larsson, A., Ed.) pp 205-222, Raven Press, New
York.

425. Walker, K.W., Lyles, M.M., & Gilbert, H.F. (1996)
Biochemistry 35, 1972-1980.

426. De Lorenzo, F., Goldberger, R.F., Steers, E., Jr., Givol,
D., & Anfinsen, B. (1966) J. Biol. Chem. 241, 1562-1567.

427. Lyles, M.M., & Gilbert, H.F. (1991) Biochemistry 30,
613-619.

428. Narhi, L.O., Hua, Q.X., Arakawa, T., Fox, G.M., Tsai, L.,
Rosenfeld, R., Holst, P., Miller, J.A., & Weiss, M.A. (1993)
Biochemistry 32, 5214-5221.

429. Edman, J.C., Ellis, L., Blacher, R.W., Roth, R.A., & Rutter,
W.J. (1985) Nature 317, 267-270.

430. Bardwell, J.C., McGovern, K., & Beckwith, J. (1991) Cell
67, 581-589.

431. Zapun, A., Bardwell, J.C., & Creighton, T.E. (1993)
Biochemistry 32, 5083-5092.

432. Maskos, K., Huber-Wunderlich, M., & Glockshuber, R.
(2003) J. Mol. Biol. 325, 495-513.

433. Nakamoto, H., & Bardwell, J.C. (2004) Biochim.
Biophys. Acta 1694, 111-119.

434. Darby, N.J., Freedman, R.B., & Creighton, T.E. (1994)
Biochemistry 33, 7937-7947.


435. Ruoppolo, M., & Freedman, R.B. (1995) Biochemistry
34, 9380-9388.

436. Gilbert, H.F. (1997) J. Biol. Chem. 272, 29399-29402.

437. Quan, H., Fan, G., & Wang, C.C. (1995) J. Biol. Chem.
270, 17078-17080.

438. Schonbrunner, E.R., & Schmid, F.X. (1992) Proc. Natl.
Acad. Sci. U.S.A. 89, 4510-4513.

439. Lundstrom, J., & Holmgren, A. (1990) J. Biol. Chem. 265,
9114-9120.

440. Krause, G., Lundstrom, J., Barea, J.L., Pueyo de la
Cuesta, C., & Holmgren, A. (1991) J. Biol. Chem. 266,
9494-9500.

441. Miranker, A., Radford, S.E., Karplus, M., & Dobson,
C.M. (1991) Nature 349, 633-636.

442. Teschner, W., Rudolph, R., & Garel, J.R. (1987)
Biochemistry 26, 2791-2796.

443. Lillo, M.P., Szpikowska, B.K., Mas, M.T., Sutin, J.D., &
Beechem, J.M. (1997) Biochemistry 36, 11273-11281.

444. Wong, S.C., Burton, P.M., & Josse, J. (1970) J. Biol.
Chem. 245, 4353-4357.

445. Hermann, R., Rudolph, R., Jaenicke, R., Price, N.C., &
Scobbie, A. (1983) J. Biol. Chem. 258, 11014-11019.

446. Hermann, R., Jaenicke, R., & Price, N.C. (1985)
Biochemistry 24, 1817-1821.

447. Mateu, M.G., Sanchez Del Pino, M.M., & Fersht, A.R.
(1999) Nat. Struct. Biol. 6, 191-198.

448. Jaenicke, R., Rudolph, R., & Heider, I. (1979)
Biochemistry 18, 1217-1223.

449. Gleason, W.B., Fu, Z., Birktoft, J., & Banaszak, L. (1994)
Biochemistry 33, 2078-2088.

450. Leistler, B., Herold, M., & Kirschner, K. (1992) Eur. J.
Biochem. 205, 603-611.

451. Kim, D.H., Jang, D.S., Nam, G.H., Yun, S., Cho, J.H.,
Choi, G., Lee, H.C., & Choi, K.Y. (2000) Biochemistry 39,
13084-13092.

452. Milla, M.E., & Sauer, R.T. (1994) Biochemistry 33,
1125-1133.

453. Waldburger, C.D., Jonsson, T., & Sauer, R.T. (1996)
Proc. Natl. Acad. Sci. U.S.A. 93, 2629-2634.

454. Burns, D.L., & Schachman, H.K. (1982) J. Biol. Chem.
257, 8648-8654.

455. Burns, D.L., & Schachman, H.K. (1982) J. Biol. Chem.
257, 8638-8647.

456. Yamato, S., & Murachi, T. (1979) Eur. J. Biochem. 93,
189-195.

457. Kervinen, J., Dunbrack, R.L., Jr., Litwin, S., Martins, J.,
Scarrow, R.C., Volin, M., Yeung, A.T., Yoon, E., & Jaffe,
E.K. (2000) Biochemistry 39, 9018-9029.

458. Zhang, Z.Y., Poorman, R.A., Maggiora, L.L., Heinrikson,
R.L., & Kezdy, F.J. (1991) J. Biol. Chem. 266,
15591-15594.

459. Vimard, C., Orsini, G., & Goldberg, M.E. (1975) Eur. J.
Biochem. 51, 521-527.

460. Flynn, G.C., Beckers, C.J., Baase, W.A., & Dahlquist,
F.W. (1993) Proc. Natl. Acad. Sci. U.S.A. 90,
10826-10830.

461. Lane, A.N., Paul, C.H., & Kirschner, K. (1984) EMBO J.
3, 279-287.

462. Hyde, C.C., Ahmed, S.A., Padlan, E.A., Miles, E.W., &
Davies, D.R. (1988) J. Biol. Chem. 263, 17857-17871.

463. Bothwell, M.A., & Schachman, H.K. (1980) J. Biol.
Chem. 255, 1962-1970.


464. 


465. 


466. 


467. 


468. 


469. 


470. 


471. 


472. 


473. 


474. 


475. 


476. 
477. 
478. 
479. 
480. 
481. 


482. 


483. 


484. 


485. 


486. 


487. 


488. 


489. 


490. 


491. 
492. 


493. 


Yang, Y.R., Syvanen, J.M., Nagel, G.M., & Schachman, 
H.K. (1974) Proc. Natl. Acad. Sci. U.S.A. 71, 918-923. 
Jacobson, G.R., & Stark, G.R. (1973) J. Biol. Chem. 248, 
8003-8014. 

Bothwell, M.A., & Schachman, H.K. (1980) J. Biol. 
Chem. 255, 1971-1977. 

Bates, D.L., Danson, M.J., Hale, G., Hooper, E.A., & 
Perham, R.N. (1977) Nature 268, 313-316. 
Wagenknecht, T., Francis, N., & DeRosier, D.J. (1983) J. 
Mol. Biol. 165, 523-539. 

Reed, L.J., Pettit, F.H., Eley, M.H., Hamilton, L., Collins, 
J.H., & Oliver, R.M. (1975) Proc. Natl. Acad. Sci. U.S.A. 
72, 3068-3072. 

DeRosier, D.J., & Oliver, R.M. (1971) Cold Spring 
Harbor Symp. Quant. Biol. 36, 199-203. 

Brosius, J., Palmer, M.L., Kennedy, P.J., & Noller, H.F. 
(1978) Proc. Natl. Acad. Sci. U.S.A. 75, 4801-4805. 
Held, W.A., Mizushima, S., & Nomura, M. (1973) J. Biol. 
Chem. 248, 5720-5730. 

Held, W.A., Ballou, B., Mizushima, S., & Nomura, M. 
(1974) J. Biol. Chem. 249, 3103-3111. 

Herold, M., & Nierhaus, K.H. (1987) J. Biol. Chem. 262, 
8826-8833. 

Wimberly, B.T., Brodersen, D.E., Clemons, W.M., Jr., 
Morgan-Warren, R.J., Carter, A.P., Vonrhein, C., Hartsch, 
T., & Ramakrishnan, V. (2000) Nature 407, 327-339. 
Morrison, C.A., Garrett, R.A., & Bradbury, E.M. (1977) 
Eur. J. Biochem. 78, 153-159. 

Rohde, M.F., O’Brien, S., Cooper, S., & Aune, K.C. 
(1975) Biochemistry 14, 1079-1087. 

Franz, A., Georgalis, Y., & Giri, L. (1979) Biochim. 
Biophys. Acta 578, 365-371. 

Doolittle, R.F. (1984) Annu. Rev. Biochem. 53, 195-229. 
Williams, R.C. (1981) J. Mol. Biol. 150, 399-408. 

Yang, Z., Kollman, J.M., Pandi, L., & Doolittle, R.F. 
(2001) Biochemistry 40, 12515-12523. 

Brown, J.H., Volkmann, N., Jun, G., Henschen-Edman, 
A.H., & Cohen, C. (2000) Proc. Natl. Acad. Sci. U.S.A. 97, 
85-90. 

Laudano, A.P., & Doolittle, R.F. (1980) Biochemistry 19, 
1013-1019. 

Laudano, A.P., & Doolittle, R.F. (1978) Proc. Natl. Acad. 
Sci. U.S.A. 75, 3085-3089. 

Spraggon, G., Everse, S.J., & Doolittle, R.F. (1997) 
Nature 389, 455-462. 

Everse, S.J., Spraggon, G., Veerapandian, L., Riley, M., 
& Doolittle, R.F. (1998) Biochemistry 37, 8637-8642. 
Madrazo, J., Brown, J.H., Litvinovich, S., Dominguez, 
R., Yakovlev, S., Medved, L., & Cohen, C. (2001) Proc. 
Natl. Acad. Sci. U.S.A. 98, 11967-11972. 

Hantgan, R.R., & Hermans, J. (1979) J. Biol. Chem. 254, 
11272-11281. 

Chen, R., & Doolittle, R.F. (1971) Biochemistry 10, 
4487-4491. 

Doolittle, R.F., Cassman, K.G., Cottrell, B.A., & Friezner, 
S.J. (1977) Biochemistry 16, 1715-1719. 

Amos, L., & Klug, A. (1974) J. Cell Sci. 14, 523-549. 
Wade, R.H., Chretien, D., & Job, D. (1990) J. Mol. Biol. 
212, 775-786. 

Kirschner, M.W., Williams, R.C., Weingarten, M., & 
Gerhart, J.C. (1974) Proc. Natl. Acad. Sci. U.S.A. 71, 
1159-1163. 


. Melki, R., Carlier, 


References 741 


. Feit, H., Slusarek, L., & Shelanski, M.L. (1971) Proc. 


Natl. Acad. Sci. U.S.A. 68, 2028-2031. 


. Ludueana, R.F., Shooter, E.M., & Wilson, L. (1977) J. 


Biol. Chem. 252, 7006-7014. 


. Ponstingl, H., Krauhs, E., Little, M., & Kempf, T. (1981) 


Proc. Natl. Acad. Sci. U.S.A. 78, 2757-2761. 


. Krauhs, E., Little, M., Kempf, T., Hofer-Warbinek, R., 


Ade, W., & Ponstingl, H. (1981) Proc. Natl. Acad. Sci. 
U.S.A. 78, 4156-4160. 


. Lowe, J., Li, H., Downing, K.H., & Nogales, E. (2001) J. 


Mol. Biol. 313, 1045-1057. 


. Nogales, E., Wolf, S.G., & Downing, K.H. (1998) Nature 


391, 199-203. 


. Nettles, J.H., Li, H., Cornett, B., Krahn, J.M., Snyder, 


J.P., & Downing, K.H. (2004) Science 305, 866-869. 


. Bergen, L.G., & Borisy, G.G. (1980) J. Cell Biol. 84, 


141-150. 


. Mandelkow, E.M., Schultheiss, R., Rapp, R., Muller, M., 


& Mandelkow, E. 
1073. 


(1986) J. Cell Biol. 102, 1067- 


. Song, Y.H., & Mandelkow, E. (1993) Proc. Natl. Acad. 


Sci. U.S.A. 90, 1671-1675. 


. Kikkawa, M., Ishikawa, T., Nakata, T., Wakabayashi, T., 


& Hirokawa, N. (1994) J. Cell Biol. 127, 1965-1971. 


. Johnson, K.A., & Borisy, G.G. (1977) J. Mol. Biol. 117, 


1-31. 


. Kirschner, M.W., Honig, L.S., & Williams, R.C. (1975) J. 


Mol. Biol. 99, 263-276. 


. Scheele, R.B., & Borisy, G.G. (1978) J. Biol. Chem. 253, 


2846-2851. 


. Fygenson, D.K., Braun, E., & Libchaber, A. (1994) Phys. 


Rev. E: Stat. Phys., Plasmas, Fluids, Relat. Interdiscip. 
Top. 50, 1579-1588. 


. Caudron, N., Valiron, O., Usson, Y., Valiron, P., & Job, 


D. (2000) J. Mol. Biol. 297, 211-220. 


. Mitchison, T., & Kirschner, M. (1984) Nature 312, 


232-237. 


. Moritz, M., Braunfeld, M.B., Sedat, J.W., Alberts, B., & 


Agard, D.A. (1995) Nature 378, 638-640. 


. Bergen, L.G., Kuriyama, R., & Borisy, G.G. (1980) J. Cell 


Biol. 84, 151-159. 


. Koshland, D.E., Mitchison, T.J., & Kirschner, M.W. 


(1988) Nature 331, 499-504. 


. Mitchison, T., & Kirschner, M. (1984) Nature 312, 


237-242. 


. Huecas, S., & Andreu, J.M. (2004) FEBS Lett. 569, 43-48. 
. Mitchison, T.J. (1993) Science 261, 1044-1047. 
. Hirose, K., Fan, J., & Amos, L.A. (1995) J. Mol. Biol. 251, 


329-333. 


. Fan, J., Griffiths, A.D., Lockhart, A., Cross, R.A., & Amos, 


L.A. (1996) J. Mol. Biol. 259, 325-330. 


. Weisenberg, R.C. (1972) Science 177, 1104-1105. 
. Weisenberg, R.C., Borisy, G.G., & Taylor, E.W. (1968) 


Biochemistry 7, 4466-4479. 


. Desai, A., & Mitchison, T.J. (1998) Bioessays 20, 


523-527. 


. Weisenberg, R.C., Deery, W.J., & Dickinson, P.J. (1976) 


Biochemistry 15, 4248-4254. 
M.F., & Pantaloni, D. 
Biochemistry 29, 8921-8932. 


(1990) 


. Margolis, R.L. (1981) Proc. Natl. Acad. Sci. U.S.A. 78, 


1586-1590. 


742 


525. 


526. 


527. 


528. 


529. 


530. 


531. 


532. 


533. 


534. 


535. 


536. 


537. 


538. 


539. 


540. 


541. 
542. 


543. 


544. 


545. 


546. 


547. 


548. 


549. 


550. 


551. 


552. 


553. 


554. 


555. 


Folding and Assembly 


Karr, T.L., Podrasky, A.E., & Purich, D.L. (1979) Proc. 
Natl. Acad. Sci. U.S.A. 76, 5475-5479. 

Jameson, L., & Caplow, M. (1980) J. Biol. Chem. 255, 
2284-2292. 

Vandecandelaere, A., Martin, S.R., & Bayley, P.M. 
(1995)Biochemistry 34, 1332-1343. 

Hyman, A.A., Salser, S., Drechsel, D.N., Unwin, N., & 
Mitchison, T.J. (1992) Mol. Biol. Cell 3, 1155-1167. 
Karr, T.L., Kristofferson, D., & Purich, D.L. (1980) J. Biol. 
Chem. 255, 8560-8566. 

Dye, R.B., & Williams, R.C., Jr. (1996) Biochemistry 35, 
14331-14339. 

Caplow, M., & Shanks, J. (1995) Biochemistry 34, 
15732-15741. 

Margolis, R.L., & Wilson, L. (1977) Proc. Natl. Acad. Sci. 
U.S.A. 74, 3466-3470. 

Panda, D., Daijo, J.E., Jordan, M.A., & Wilson, L. (1995) 
Biochemistry 34, 9921-9929. 

Chretien, D., Fuller, S.D., & Karsenti, E. (1995) J. Cell 
Biol. 129, 1311-1328. 

Woodrum, D.T., Rich, S.A., & Pollard, T.D. (1975) J. Cell 
Biol. 67, 231-237. 

Pollard, T.D., & Mooseker, M.S. (1981) J. Cell Biol. 88, 
654-659. 

Straub, F.B., & Feuer, G. (1950) Biochim. Biophys. Acta 
4, 455—470. 

Pieper, U., & Wegner, A. (1996) Biochemistry 35, 
4396-4402. 

Carlier, M.F., Pantaloni, D., & Korn, E.D. (1984) J. Biol. 
Chem. 259, 3983-9986. 

Lal, A.A., Brenner, S.L., & Korn, E.D. (1984) J. Biol. 
Chem. 259, 13061-13065. 

Pollard, T.D. (1984) J. Cell Biol. 99, 769-777. 

Weber, A., Northrop, J., Bishop, M.F., Ferrone, F.A., & 
Mooseker, M.S. (1987) Biochemistry 26, 2537-2544. 
Otterbein, L.R., Graceffa, P., & Dominguez, R. (2001) 
Science 293, 708-711. 

Walsh, T.P., Weber, A., Higgins, J., Bonder, E.M., & 
Mooseker, M.S. (1984) Biochemistry 23, 2613-2621. 
Robinson, R.C., Turbedsky, K., Kaiser, D.A., Marchand, 
J.B., Higgs, H.N., Choe, S., & Pollard, T.D. (2001) Science 
294, 1679-1684. 

Mooseker, M.S., Graves, T.A., Wharton, K.A., Falco, N., 
& Howe, C.L. (1980) J. Cell Biol. 87, 809-822. 

Glenney, J.R., Jr., Kaulfus, P., & Weber, K. (1981) Cell 24, 
471-480. 

Weber, A., Northrop, J., Bishop, M.F., Ferrone, F.A., & 
Mooseker, M.S. (1987) Biochemistry 26, 2528-2536. 
Hasegawa, T., Takahashi, S., Hayashi, H., & Hatano, S. 
(1980) Biochemistry 19, 2677-2683. 

Bryan, J., & Coluccio, L.M. (1985) J. Cell Biol. 101, 
1236-1244. 

Caldwell, J.E., Heiss, S.G., Mermall, V., & Cooper, J.A. 
(1989) Biochemistry 28, 8506-8514. 

Selden, L.A., Kinosian, H.J., Estes, J.E., & Gershman, 
L.C. (2000) Biochemistry 39, 64-74. 

Gibbons, I.R., & Fronk, E. (1979) J. Biol. Chem. 254, 
187-196. 

Paschal, B.M., Shpetner, H.S., & Vallee, R.B. (1987) J. 
Cell Biol. 105, 1273-1282. 
Paschal, B.M., & Vallee, R.B. 
181-183. 


(1987) Nature 330, 


556. 


557. 


558. 


559. 


560. 


561. 


562. 


563. 


564. 


565. 


566. 


567. 


568. 
569. 


570. 
571. 


572. 


573. 
574, 


575. 
576. 


577. 
578. 
579. 
580. 
581. 
582. 
583. 
584. 
585. 
586. 


587. 
588. 


589. 
590. 
591. 


Vale, R.D., Reese, T.S., & Sheetz, M.P. (1985) Cell 42, 
39-50. 

Yin, H.L., Hartwig, J.H., Maruyama, K., & Stossel, T.P. 
(1981) J. Biol. Chem. 256, 9693-9697. 

Glenney, J.R., Jr., Kaulfus, P., Matsudaira, P., & Weber, 
K. (1981)J. Biol. Chem. 256, 9283-9288. 

Bretscher, A., & Weber, K. (1979) Proc. Natl. Acad. Sci. 
U.S.A. 76, 2321-2325. 

Yonezawa, N., Nishida, E., Iida, K., Yahara, I., & Sakai, 
H. (1990) J. Biol. Chem. 265, 8382-8386. 

Safer, D., Elzinga, M., & Nachmias, V.T. (1991) J. Biol. 
Chem. 266, 4029-4032. 

Isenberg, G., Aebi, U., & Pollard, T.D. (1980) Nature 288, 
455-459. 

Kilimann, M.W., & Isenberg, G. (1982) EMBO J. 1, 
889-894. 

Casella, J.F., Maack, D.J., & Lin, S. (1986) J. Biol. Chem. 
261, 10915-10921. 

Kuhlman, P.A., & Fowler, V.M. (1997) Biochemistry 36, 
13461-13472. 

Maun, N.A., Speicher, D.W., DiNubile, MJ., & 
Southwick, F.S. (1996) Biochemistry 35, 3518-3524. 
Casella, J.F., Craig, S.W., Maack, D.J., & Brown, A.E. 
(1987) J. Cell Biol. 105, 371-379. 

Wang, K., & Wright, J. (1988) J. Cell Biol. 107, 2199-2212. 
Kruger, M., Wright, J., & Wang, K. (1991) J. Cell Biol. 115, 
97-107. 

Labeit, S., & Kolmerer, B. (1995) J. Mol. Biol. 248, 308-315. 
Jin, J.P., & Wang, K. (1991) J. Biol. Chem. 266, 
21215-21223. 

Geiger, B., Tokuyasu, K.T., Dutton, A.H., & Singer, S.J. 
(1980) Proc. Natl. Acad. Sci. U.S.A. 77, 4127—4131. 
Small, J.V. (1985) EMBO J. 4, 45—49. 

Gregorio, C.C., Weber, A., Bondad, M., Pennise, C.R., & 
Fowler, V.M. (1995) Nature 377, 83-86. 

Elliott, A., & Offer, G. (1978) J. Mol. Biol. 123, 505-519. 
Slayter, H.S., & Lowey, S. (1967) Proc. Natl. Acad. Sci. 
U.S.A. 58, 1611-1618. 

Vibert, P., & Craig, R. (1983) J. Mol. Biol. 165, 303-320. 
Huxley, H.E. (1969) Science 164, 1356-1365. 

Huxley, H.E. (1963) J. Mol. Biol. 7, 281-308. 

Knight, P.J., Erickson, M.A., Rodgers, M.E., Beer, M., & 
Wiggins, J.W. (1986) J. Mol. Biol. 189, 167-177. 

Craig, R., Padron, R., & Alamo, L. (1991) J. Mol. Biol. 220, 
125-132. 

Morimoto, K., & Harrington, W.F. (1974) J. Mol. Biol. 83, 
83-97. 

Hayashi, T., Silver, R.B., Ip, W., Cayer, M.L., & Smith, D.S. 
(1977) J. Mol. Biol. 111, 159-171. 

Morimoto, K., & Harrington, W.F. (1973) J. Mol. Biol. 77, 
165-175. 

Sinard, J.H., & Pollard, T.D. (1990) J. Biol. Chem. 265, 
3654-3660. 

Katsura, I., & Noda, H. (1971) J. Biochem. (Tokyo) 69, 
219-229. 

Davis, J.S. (1985) Biochemistry 24, 5263-5269. 

Osapay, K., Theriault, Y., Wright, P.E., & Case, D.A. 
(1994) J. Mol. Biol. 244, 183-197. 

Lowe, J., & Amos, L.A. (1998) Nature 391, 203-206. 
Erickson, H.P. (1995) Cell 80, 367-370. 

Mukherjee, A., & Lutkenhaus, J. (1994) J. Bacteriol. 176, 
2754-2758. 


Chapter 14 


Membranes 


Implicit in the cellular theory is the existence of a bound- 
ary between the cytoplasm of a cell and its surroundings, 
be they seawater or the extracellular fluid in a highly 
organized tissue. The boundary is a defined physical 
structure known as the plasma membrane, and it is a thin,
continuous, closed bag marking the boundary of the cell. 
In electron micrographs of thin sections through a cell, 
the plasma membrane appears as a continuous closed 
curve that designates the perimeter of the cytoplasm. 

In many microorganisms, such as algae, fungi, and 
bacteria, the plasma membrane is surrounded on its 
outer surface by an outer membrane or a cell wall. The 
cells of higher plants are also surrounded by a thick cell 
wall. Usually, the cells of animals, when they are located 
in organized tissues, are surrounded by networks of col- 
lagen and mucopolysaccharide. All of these integuments 
encasing these various cells are tough polymeric materi- 
als that provide support and security for the plasma 
membrane, which is the formal boundary between the 
cytoplasm and the environment, between the living and 
the inert. 

When a thin section of a eukaryotic cell is examined 
in the electron microscope, the most striking feature of 
the image is the collection of closed curves that represent 
systems of intracellular membranes cut in cross section. 
These intracellular membranes are the endoplasmic 
reticulum, the Golgi membranes, and the membranes of 
the mitochondria, the nucleus, the lysosomes, the perox- 
isomes, the chloroplasts, the endosomes, and the vac- 
uoles of the cell. Each of these structures at any instant is 
a closed, continuous, often highly irregular bag enclosing 
its respective volume of fluid, which is isolated by its 
membrane from the cytoplasm of the cell. The aqueous 
solution of proteins found within any one of these bags is 
distinct from that in the cytoplasm surrounding it and is
characteristic of the particular organelle. The mem- 
branes creating each of these organelles have the same 
structure as the plasma membrane, although each is dis- 
tinct in its chemical composition. Because almost every 
membrane in the cell separates the cytoplasm from 
another space, the two sides of a membrane are defined 
relative to the cytoplasm, and they are referred to as cyto- 
plasmic and extracytoplasmic, respectively. Among the 
more interesting ambiguities in this situation are the 
inner membranes of the mitochondria and the thy- 
lakoids of chloroplasts, which are the descendants of 
smaller cells that were incorporated into larger cells. 


Consequently, their extracytoplasmic volumes were at 
one time cytoplasm, of which vestiges remain. 

Each of the membranes composing a cell can be 
purified from the homogenate of a eukaryotic tissue by 
cell fractionation. Originally these purifications were
performed in water, but it was subsequently noted that
the organelles retained their appearance more successfully
in concentrated solutions of sucrose. Tissues are
usually homogenized in a solution of 0.25 M sucrose, and 
the membranes are isolated by centrifugation on gradi- 
ents composed of solutions of increasing concentrations 
of sucrose, which is a solute that stabilizes proteins by
salting out, or of other solutes that change the density or
osmolarity of the solution. During homogenization, the 
mitochondria, lysosomes, peroxisomes, chloroplasts, 
nuclei, and Golgi membranes remain intact and can be 
identified by their characteristic morphology. Plasma
membranes and endoplasmic reticulum are disintegrated
by the homogenization and become rounded
fragments, often goblet-shaped or vesicular in morphology,
and these fragments are known as microsomes.
Microsomes of rough endoplasmic reticulum are readily
identified by their adherent ribosomes.

The various membranes and intact organelles sus- 
pended in the homogenate of a eukaryotic cell differ in 
their size and shape, their ratio of protein, lipid, and car- 
bohydrate, and their composition of fixed acid-bases. 
Therefore, they can be separated from each other by
differences in their sedimentation coefficients and their
buoyant densities (Figure 14-1) or their net charges at a
particular pH. Homogenates are often submitted to 
sequential centrifugations at different centrifugal forces 
for different durations. Such differential centrifugations
separate only crudely on the basis of sedimentation
coefficient because large differences in
sedimentation coefficient between any two organelles 
are necessary if one of the organelles is to form a pellet 
exclusively at one centrifugal force and duration while 
the other forms a pellet exclusively at a higher force or 
longer duration. Rate sedimentation is a technique in
which a narrow band of sample is layered onto a gradient 
that changes only gradually in sucrose concentration, 
and the components are separated owing to their differ- 
ences in sedimentation coefficient as they move through 
the gradient under the influence of a centrifugal field. 
Rate sedimentation provides much higher resolution 
than differential centrifugation.

Figure 14-1: Buoyant density and sedimentation coefficient of the
organelles and fragments of membrane found in a homogenate of
a eukaryotic cell. Each of the boundaries surrounds the distribution
of buoyant densities (grams centimeter⁻³) and sedimentation
coefficients of the particular organelle or fragment of membrane. 
Uniform organelles such as mitochondria, lysosomes, nuclei, and 
peroxisomes have relatively tight distributions of buoyant density 
and sedimentation coefficient. Smooth endoplasmic reticulum 
and plasma membrane become microsomes upon homogeniza- 
tion. Microsomes and rough endoplasmic reticulum, because they 
are heterogeneous fragments of membrane broken from much 
larger continuous structures, have fairly uniform buoyant densities 
but a wide range of sedimentation coefficients. The diagram illus- 
trates that each of these classes of particles occupies a unique 
region in the two-dimensional space, and this allows each class of 
membranes to be isolated by a method exploiting differences in 
sedimentation coefficient in combination with a method exploit- 
ing differences in buoyant density. Usually, the boundaries for 
microsomes of plasma membrane and smooth endoplasmic retic- 
ulum coincide, so these two types of membrane fragments are dif- 
ficult to separate. Adapted with permission from ref 6. Copyright 
1974 Academic Press. 
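
The reason differential centrifugation separates only crudely can be sketched numerically. The following is a minimal illustration, not a protocol: it assumes the standard relation dr/dt = sω²r for a particle of sedimentation coefficient s, a uniform initial distribution of particles along the tube, and hypothetical values for the rotor speed, the tube radii, and the two sedimentation coefficients.

```python
import math

SVEDBERG = 1e-13  # one Svedberg unit in seconds


def omega_squared(rpm):
    """Square of the angular velocity (rad^2 s^-2) for a rotor speed in rev/min."""
    return (2.0 * math.pi * rpm / 60.0) ** 2


def pelleting_time(s_svedberg, rpm, r_min_cm, r_max_cm):
    """Seconds for a particle starting at the meniscus to reach the tube bottom.

    Obtained by integrating dr/dt = s * omega^2 * r from r_min to r_max.
    """
    s = s_svedberg * SVEDBERG
    return math.log(r_max_cm / r_min_cm) / (s * omega_squared(rpm))


def fraction_pelleted(s_svedberg, rpm, r_min_cm, r_max_cm, seconds):
    """Fraction of particles in the pellet after `seconds`.

    Assumes the particles start uniformly spread between r_min and r_max.
    """
    s = s_svedberg * SVEDBERG
    growth = math.exp(s * omega_squared(rpm) * seconds)
    r_cut = r_max_cm / growth  # smallest starting radius that reaches the bottom
    if r_cut <= r_min_cm:
        return 1.0
    return (r_max_cm - r_cut) / (r_max_cm - r_min_cm)


# Hypothetical rotor geometry and sedimentation coefficients (illustrative only).
RPM, R_MIN, R_MAX = 10000, 5.0, 10.0
S_LARGE, S_SMALL = 1.0e5, 1.0e4  # Svedberg units, a tenfold difference

t_large = pelleting_time(S_LARGE, RPM, R_MIN, R_MAX)
contamination = fraction_pelleted(S_SMALL, RPM, R_MIN, R_MAX, t_large)
print(f"time to pellet the larger particle completely: {t_large:.0f} s")
print(f"fraction of the smaller particle pelleted by then: {contamination:.2f}")
```

Under these assumptions, even a tenfold difference in sedimentation coefficient leaves more than a tenth of the slower particle in the pellet of the faster one, which is why larger differences are needed for a clean separation.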


Unfortunately, the population of a given organelle 
in a tissue usually has a significant variation (Figure
14-1) in its sedimentation coefficient, and additional
steps that separate membranes by other independent
properties are often required for complete purification.
During isopycnic centrifugation, the sample is layered
onto a much steeper gradient of density formed usually
either by varying sucrose concentration or by varying
Ficoll concentration. Ficoll is a polysaccharide that does
not have the high osmolarity of sucrose. The gradients are 
submitted to centrifugation until all of the components 
have traveled to their respective buoyant densities, at 
which point they cease to move. The various membranes 
suspended in a homogenate can also be separated on the 
basis of their charge by free-flow electrophoresis. In
certain instances, where the fixed anionic functional
groups on a particular type of membrane are properly
oriented, these membranes can be selectively precipitated
with divalent cations such as magnesium or calcium.
Such a precipitation has been used to separate
microsomes derived from endoplasmic reticulum from
microsomes derived from plasma membrane.


During purification, the various classes of mem- 
branes can be followed by assaying for particular 
marker enzymes. Each type of organelle has at least 
one enzymatic activity that is almost exclusively confined
to it and can be used as a measure of its concentration.
The ability of certain membranes to bind
very specific ligands has also been used to follow their 
purification. For example, the plasma membranes of 
animal cells bind the protein wheat germ agglutinin, 
the peptide hormone insulin, and the toxin from Vibrio 
cholerae with high specificity; the binding of any one of 
these three ligands can be used to identify plasma 
membranes. The final identification of the purified
suspension of membranes as the organelle of interest, 
however, must always be made by examining thin sec- 
tions of pellets of the purified material by electron 
microscopy. 
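
Following a marker enzyme through a purification usually comes down to two ratios: the enrichment of its specific activity and the recovery of its total activity. The sketch below shows that bookkeeping under stated assumptions; the function names and all of the numbers are hypothetical and are not taken from any experiment described here.

```python
def specific_activity(activity_units, protein_mg):
    """Marker enzyme activity per milligram of protein."""
    return activity_units / protein_mg


def enrichment(fraction, homogenate):
    """Fold increase in specific activity of a fraction over the homogenate.

    Each argument is a tuple (total activity units, total protein in mg).
    """
    return specific_activity(*fraction) / specific_activity(*homogenate)


def recovery(fraction, homogenate):
    """Fraction of the total marker activity recovered in the purified fraction."""
    return fraction[0] / homogenate[0]


# Hypothetical values: (total activity units, total protein in mg).
homogenate = (1000.0, 500.0)
purified = (50.0, 2.0)

print(f"enrichment: {enrichment(purified, homogenate):.1f}-fold")
print(f"recovery:   {100 * recovery(purified, homogenate):.0f}%")
```

A high enrichment with a low recovery, as in these made-up numbers, would indicate a pure but poorly recovered preparation; the same calculation applied to the markers of other organelles would measure contamination.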

All of these procedures have been used to develop 
methods for the purification of mitochondria, peroxisomes,
lysosomes, Golgi membranes, rough endoplasmic
reticulum, and chloroplasts. Ironically, it is
the plasma membrane of most cells that is the most dif- 
ficult membrane to purify because upon homogeniza- 
tion it fragments and becomes very similar to the much 
more abundant fragments of smooth endoplasmic retic- 
ulum. For this reason, the plasma membrane of the ery- 
throcyte, a cell lacking any other membranes, has often 
been used as a model for an animal plasma membrane. 
Plasma membranes, however, have been purified from a 
number of animal tissues, including liver, kidney, adipose
tissue, and brain, and from cells grown in tissue
culture, such as murine fibroblasts (L-cells), A431
cells, and HeLa cells. Plasma membranes from
fungi or bacteria, such as Escherichia coli, are prepared
from spheroplasts, which are individual cells that
have been enzymatically stripped of their outer mem- 
branes or cell walls. The spheroplasts are ruptured and 
the smooth plasma membranes are isolated from the 
homogenate. 

Any of the membranes comprising an organelle or 
derived from a larger cellular structure can be freed from 
the soluble proteins it encloses by lysis and sedimenta- 
tion. Such a membrane is constituted from lipids, carbo- 
hydrate, and proteins. All of the carbohydrate is 
covalently attached to protein in the form of glycoprotein 
or to lipid in the form of glycolipid. The component 
lipids, glycolipids, proteins, and glycoproteins are all
heterogeneous mixtures, and the fraction of the mass
that is protein can be as high as 75%. The basic structure
upon which biological membranes are based is a bilayer
of amphipathic lipids in which some neutral lipid is
dissolved.

The basic structural element of a biological membrane is
a bilayer of phospholipids (Figure 14-2). Bilayers can be 
formed from a wide variety of lipids, but the plasma 
membrane of the typical eukaryotic cell is formed from 
phospholipids and cholesterol. The most prevalent 
phospholipids are phosphatidylcholine (Figure 14-2A) 
and phosphatidylethanolamine (Figure 14-2B). Phos- 
phatidylcholine and phosphatidylethanolamine are 
glycerophospholipids constructed from a molecule of 
sn-glycerol 3-phosphate

[Structure 14-1: sn-glycerol 3-phosphate]

in which carbon 2 has the R configuration. Either
trimethylethanolammonium (choline; Figure 14-2A) or 
ethanolamine (Figure 14-2B), respectively, is esterified 
to the phosphate, forming a phosphate diester. The two 
remaining hydroxyls of the glycerol 3-phosphate are acy- 
lated with a pair of fatty acids. 

Phosphatidylcholine and phosphatidylethano- 
lamine are amphipathic lipids. An amphipathic lipid is 
an elongated molecule, usually of biological origin, that 
is composed of a substantial portion of unadulterated 
hydrocarbon at one of its ends and a portion of 
hydrophilic functional groups at its other end. One end 
of each molecule of phosphatidylcholine or phos- 
phatidylethanolamine, the end containing the phos- 
phate diester and the choline or ethanolamine, is 
hydrophilic. The other end, the end containing the linear 
hydrocarbon of the fatty acids, is hydrophobic. Linear 
hydrocarbons are the most hydrophobic functional 
groups among biological molecules (Table 5-9).

Most glycerophospholipids have two fatty acids 
attached to the glycerol in ester linkages. The fatty 
acids found in naturally occurring phospholipids seem 
to be chosen almost at random from the mixture of fatty 
S-acylcoenzymes A produced by the particular organ- 
ism in which the membrane is located. Saturated fatty 
acids are esterified mainly to carbon 1 of the sn-glycerol 
3-phosphate, and unsaturated fatty acids are esterified 
mainly to carbon 2 (Figure 14-2), so that the naturally 
occurring phospholipids end up with a roughly equal 
ratio of saturated and unsaturated hydrocarbons, 
inescapably intermixed. 

The saturated fatty acids are mainly linear car- 
boxylic acids that vary in length from 12 to 24 carbons. 
The most frequently encountered saturated fatty acids in 
biological membranes are palmitic acid (n-hexadecanoic 
acid; Figure 14-2C,F) and stearic acid (n-octadecanoic 
acid; Figure 14-2B). The most stable conformation of a
linear hydrocarbon is all-trans (14-2), but the introduction
of a gauche (14-3) conformation at one of the
carbon-carbon bonds requires only about 5 kJ mol⁻¹ if
the hydrocarbon is unhindered. Therefore, at 25°C
about 10% of the unhindered carbon-carbon single
bonds in a saturated fatty acid should be gauche. The
gauche conformation transiently introduces an elbow
with an angle of about 109° into the chain:

[Structures 14-2 (all-trans) and 14-3 (gauche) in equilibrium]    (14-1)


A second gauche conformation can reorient the chain to 
its original direction. 
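
The population of gauche bonds quoted above can be checked with a Boltzmann estimate. This is only a rough sketch: it assumes that each bond is independent, that the energy difference is the 5 kJ mol⁻¹ given in the text, and that entropy differences other than the number of gauche rotamers can be ignored. The function name and its parameters are illustrative.

```python
import math

GAS_CONSTANT = 8.314462618  # J mol^-1 K^-1


def gauche_fraction(delta_e_kj_per_mol, temp_k, n_gauche_states=2):
    """Equilibrium fraction of bonds that are gauche.

    One trans state is taken as the reference; each of the n_gauche_states
    lies delta_e above it and is weighted by its Boltzmann factor.
    """
    weight = math.exp(-delta_e_kj_per_mol * 1000.0 / (GAS_CONSTANT * temp_k))
    return n_gauche_states * weight / (1.0 + n_gauche_states * weight)


T_25C = 298.15  # kelvin
one_rotamer = gauche_fraction(5.0, T_25C, n_gauche_states=1)
two_rotamers = gauche_fraction(5.0, T_25C, n_gauche_states=2)
print(f"counting one gauche rotamer:   {100 * one_rotamer:.0f}%")
print(f"counting both gauche rotamers: {100 * two_rotamers:.0f}%")
```

Depending on whether one or both gauche rotamers are counted, the estimate comes out between roughly one-tenth and one-fifth of the bonds, the same order of magnitude as the figure given in the text.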

Unsaturated fatty acids contain one or more 
carbon-carbon double bonds. A trans double bond 
would not put an elbow in the hydrocarbon, but almost 
every carbon-carbon double bond in naturally occurring 
fatty acids is cis. One cis double bond introduces a
permanent elbow with an angle of about 120-130° into an
otherwise saturated linear hydrocarbon:


[Structure 14-4: alkyl chain with one cis double bond]


In the most stable conformation, one of the hydrogens
on one of the two methylenes adjacent to the double
bond fits between the two hydrogens of the other. The
most common unsaturated fatty acids with only one
carbon-carbon double bond are palmitoleic acid
(cis-hexadec-9-enoic acid; Figure 14-2F) and oleic acid
(cis-octadec-9-enoic acid; Figure 14-2A,E), and the latter
predominates.

Figure 14-2: Representatives of the types of amphipathic lipids found in the bilayers of biological membranes, drawn as if they were in a
bilayer. (A) 1-Lignoceroyl-2-oleoylphosphatidylcholine. (B) 1-Stearoyl-2-α-linolenoylphosphatidylethanolamine. (C) 1-Palmitoyl-
2-linoleoylphosphatidylserine. (D) 1-Arachidoyl-2-arachidonoylphosphatidic acid. (E) 1-Octadec-1’-enyl-2-oleoylglycero-3-phosphocholine
(a plasmalogen). (F) 1-Palmitoyl-2-palmitoleoylphosphatidylinositol. (G) N-α-Linolenoylsphingosine-1-phosphocholine (a sphingomyelin).
(H) 1-Hexadec-1’-enyl-2-α-linolenoyl-3-phosphoethanolamine (a plasmalogen).

When an unsaturated fatty acid contains two or 
three double bonds, they are spaced three carbons apart 
with a saturated carbon between them. This prevents the 
double bonds from conjugating with each other and 
becoming a rigid planar structure. Two cis double bonds 
spaced in this way produce a sinuous curve in the alkyl 
chain but do not change its ultimate direction: 


[Structure 14-5: alkyl chain with two cis double bonds]


They do, however, shorten its ultimate length by the 
equivalent of two carbon atoms, and the volume lost at 
the end is expressed as a bulge at the location of the 
unsaturation. The most common unsaturated fatty acid 
with two double bonds is linoleic acid (cis,cis-octadeca- 
9,12-dienoic acid; Figure 14-2C). Three cis double bonds 
produce an even longer sinuous curve that does place a 
permanent elbow in the alkyl chain: 


[Structure 14-6: alkyl chain with three cis double bonds]


The most common polyunsaturated fatty acid with
three carbon-carbon double bonds is linolenic acid in its
two positional isomers, α-linolenic acid (cis,cis,cis-octadeca-9,12,15-trienoic
acid; Figure 14-2B,G,H) and
γ-linolenic acid (cis,cis,cis-octadeca-6,9,12-trienoic acid).
Arachidonic acid (cis,cis,cis,cis-icosa-5,8,11,14-tetra- 
enoic acid; Figure 14-2D) is a less common polyun- 
saturated fatty acid that serves as the precursor to 
prostaglandins, and its frequency is regulated for that 
purpose. Almost all of the unsaturation commences at 
carbon 9 in the usual naturally occurring mixture of the 
various fatty acids, so the portion of the hydrocarbon 
closest to the glyceryl group in a phospholipid is fully sat- 
urated and that farthest away contains unsaturated posi- 
tions and is more geometrically disordered (Figure 14-2). 
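
The fatty acids named in this section can be collected into a small table to check the regularities just described. The sketch below records the chain length and the positions of the double bonds as given by the systematic names in the text; the dictionary, the function names, and the use of the common "carbons:double bonds" shorthand are illustrative choices, not notation used elsewhere in this chapter.

```python
# name -> (number of carbons, positions of the cis double bonds)
FATTY_ACIDS = {
    "palmitic": (16, ()),
    "stearic": (18, ()),
    "palmitoleic": (16, (9,)),
    "oleic": (18, (9,)),
    "linoleic": (18, (9, 12)),
    "alpha-linolenic": (18, (9, 12, 15)),
    "gamma-linolenic": (18, (6, 9, 12)),
    "arachidonic": (20, (5, 8, 11, 14)),
}


def shorthand(name):
    """Return the 'carbons:double bonds(positions)' shorthand for a fatty acid."""
    carbons, bonds = FATTY_ACIDS[name]
    label = f"{carbons}:{len(bonds)}"
    if bonds:
        label += "(" + ",".join(str(position) for position in bonds) + ")"
    return label


def is_methylene_interrupted(bonds):
    """True if successive double bonds start three carbons apart."""
    return all(later - earlier == 3 for earlier, later in zip(bonds, bonds[1:]))


def starts_at_carbon_9(name):
    """True if the first double bond of an unsaturated fatty acid is at carbon 9."""
    bonds = FATTY_ACIDS[name][1]
    return bool(bonds) and bonds[0] == 9


for name in FATTY_ACIDS:
    print(f"{name:16s} {shorthand(name)}")
```

Every polyunsaturated entry in the table passes `is_methylene_interrupted`, and the entries for which `starts_at_carbon_9` is false (γ-linolenic and arachidonic acids) are the exceptions implied by "almost all" in the text.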

The head groups are the polar alcohols esterified to
the phosphoric acid in naturally occurring phospholipids.
The majority of the head groups are based on
ethanolamine:

[Structure 14-7: ethanolamine]


In phosphatidylethanolamine (Figure 14-2B), the 
ethanolamine is unaltered. In phosphatidylcholine 
(Figure 14-2A), the ethanolamine is triply methylated on 
nitrogen. In phosphatidylserine (Figure 14-2C), the 
ethanolamine is carboxylated. Two glycerophospho- 
lipids not based on ethanolamine are phosphatidic acid 
(Figure 14-2D), which lacks a second ester on the phos- 
phoric acid, and phosphatidylinositol, in which the 
alcohol is myo-inositol (Figure 14-2F). Phosphatidylinositol
is a minor phospholipid present at less than 5%
in membranes. It provides the covalently attached
anchor for phosphatidylinositol-linked proteins (Figure
3-17). Because it is also an intermediate in the biosynthesis
of the second messenger myo-inositol 1,4,5-trisphosphate,
its levels are independently regulated.

There are several other phospholipids that are vari- 
ations on the theme developed by the glycerophospho- 
lipids. The plasmalogens (Figure 14-2E,H) have an enol 
ether at carbon 1 of the sn-glycerol 3-phosphate rather 
than an acylated oxygen. Upon treatment with BF₃ in
methanol, the enol ether is released as the dimethyl
acetal of a fatty aldehyde. The enol ethers are usually
derived from n-hexadecanal (Figure 14-2H) or n-octadecanal
(Figure 14-2E), but minor amounts of the derivatives
of unsaturated fatty aldehydes are also present.
Either ethanolamine (Figure 14-2H) or choline (Figure 
14-2E) is esterified, respectively, to the sn-glycerol 
3-phosphate of plasmalogens. 

There are a number of phospholipids and glyco- 
lipids in the membranes of archaebacteria that have 
tetraisopranyl alcohols, such as 3,7,11,15-tetramethyl- 
hexadecanol, in ether linkage to two of the oxygens of 
glycerol.” The head group attached to the third oxygen 
can be a monosaccharide, such as glucose, in direct 
acetal linkage; a disaccharide, such as glucosyl(β1-6)glucose,
in acetal linkage; or a 1-phospho-myo-inositol, a
phosphoserine, or a phosphoethanolamine in a
phosphodiester linkage. One of these glycolipids and one
of these phosphoinositides can also be fused at both of 
the terminal methyls of their tetraisopranyl alkanes to 
produce a two-headed phosphoglycolipid:”’ 
At its two ends 14-8 is a paradigm for the types of head
groups in archaebacterial isopranylether lipids. In the 
membranes of the protist Leishmania donovani there is 
a different type of glycophosphoetherlipid in which 
either tetracosanol or hexacosanol forms an alkylether at 
carbon 1 of glycerol, the 2-hydroxy group of the glycerol 
is unsubstituted, and the head group on the 3-hydroxy 
group is a 1-phospho-myo-inositol to which a branched 
heptasaccharide is attached, and a (-6Galβ1-4Manα1-phosphate-)n
is attached to the heptasaccharide.

A sphingomyelin has a primary alkene of 15 car- 
bons replacing one of the hydrogens on carbon 1 of 
sn-glycerol 3-phosphate, the oxygen on carbon 1 is not 
acylated, and a nitrogen replaces the oxygen on carbon 2 
and forms an amide rather than an ester with a fatty acid 
(Figure 14-2G). Upon saponification, sphingosine 
[(2S,3R)-2-amino-3-hydroxyoctadec-4-en-1-ol], the fundamental
skeleton on which sphingomyelin is constructed,
is released. Sphingomyelins have choline for
their head group. 

Glycosphingolipids are also derived from sphingo- 
sine. As in sphingomyelin, the amino group on carbon 2 
of the sphingosine in a glycosphingolipid is acylated. An 
acylated, unglycosylated sphingosine is a ceramide. A 
cerebroside is a glycosphingolipid in which either a glu- 
cose or a galactose is attached to the 1-hydroxyl of the 
ceramide in acetal linkage. Glycosphingolipids, however, 
can also have oligosaccharides attached to the 1-hydroxyl 
of the ceramide. Depending on the sequence of the imme- 
diately attached core oligosaccharide, the resulting gly- 
cosphingolipid is a ganglioside, a globoside, a lactoside, 
or of some other name (Table 14-1). As in the oligosaccharides
on glycoproteins, the core of the oligosaccharide
defining each type of glycosphingolipid can be incom- 
pletely finished, but the core oligosaccharides are usually 
further elaborated by adding N-acetylgalactosamine, 
galactoses, and as many as four or five sialic acids, and 
these modifications produce a dizzying array of micro- 
heterogeneous oligosaccharides more complex than that 
of the oligosaccharides on glycoproteins.

The compositions (Table 14-2) of the lipids in plasma
membranes from human erythrocytes and from murine
fibroblasts (L-cells) grown in tissue culture are typical
of plasma membranes from animal cells. The distribution
of fatty acids among the various phospholipids from


Table 14-1: Oligosaccharides Forming the Core of the Various Glycosphingolipids

type            sequenceᵃ

galaside        Gal(α1,4)Gal-ceramide
schistoside     GalNAc(β1,4)Glc-ceramide
molluside       Man(α1,3)Man(β1,4)Glc-ceramide
arthroside      GlcNAc(β1,3)Man(β1,4)Glc-ceramide
mucoside        Gal(β1,3)Gal(β1,4)Gal(β1,4)Glc-ceramide
ganglioside     Gal(β1,3)GalNAc(β1,4)Gal(β1,4)Glc-ceramide
globoside       GalNAc(β1,3)Gal(α1,4)Gal(β1,4)Glc-ceramide
isogloboside    GalNAc(β1,3)Gal(α1,3)Gal(β1,4)Glc-ceramide
lactoside       Gal(β1,3)GlcNAc(β1,3)Gal(β1,4)Glc-ceramide
neolactoside    Gal(β1,4)GlcNAc(β1,3)Gal(β1,4)Glc-ceramide

ᵃ These sequences form the core of the oligosaccharide that is attached to the
ceramide. Other monosaccharides, in particular sialic acids, are attached to these
cores.


Table 14-2: Composition of Amphipathic Lipids in Plasma Membranes from Human Erythrocytes and Murine Fibroblasts (L-Cells)

                                percentage of total lipid
lipid                           erythrocytes    L-cells

amphipathic lipid
  phosphatidylcholine           16              23
  sphingomyelin                 16              14
  phosphatidylethanolamine      16              9
  phosphatidic acid             2               9
  phosphatidylserine            8               3
  phosphatidylinositol          2               3
  choline plasmalogen           2               3
  ethanolamine plasmalogen      NRᵃ             2
  ganglioside                   4               NR
neutral lipid
  cholesterol                   27              20
  triglyceride                  0               13

ᵃ Not reported.


the plasma membranes of murine fibroblasts (Table 14-3) 
illustrates the heterogeneity of the collection. Mem- 
branes from fungi, such as the yeast Saccharomyces cere- 
visiae, have a similar ratio of phosphatidylcholine to 
phosphatidylethanolamine but greater amounts of both 

Table 14-3: Fatty Acid Composition of Major Phospholipids and Neutral Lipid of Plasma Membranes of L-Cells

composition of fatty acids in each type of lipidᵃ (%)

fatty acidsᵇ   total neutral lipid   phosphatidylserine   sphingomyelin   phosphatidylcholine   phosphatidylethanolamine   phosphatidylinositol   phosphatidic acid
14:0 4 0.5 trace 1 1 0.6
16:0 13 8 3 31 29 24 41
16:1 1 1 2 8 9 5
18:0 46 8 3 20 10 13 15
18:1 23 1 2 5 14 16 2
18:2 1 2 0.3 2 3 1
18:3 4 49 60 27 25 20 27
22:0 2 1 0.3 1 1
20:4 4 0.4 0.2 trace trace
24:0 3 28 30 14 11 13 8
unsaturated fatty acids 33 53 63 34 48 48 35
long-chain fatty acidsᶜ 7 30 32 14 11 15 9
polyunsaturated fatty acids 9 51 61 27 27 22 28

ᵃ Data represent the composition of fatty acids (percent) present in total neutral lipid and the several types of phospholipid. Each of the compositions is the average of those
from two membrane preparations. Neutral lipid and phospholipid from the plasma membranes were separated by silicic acid chromatography. Phospholipids were further
separated by two-dimensional thin-layer chromatography, and the lipids were visualized by spraying with bromthymol blue. Iodine was not used in order to avoid
possible losses of unsaturated fatty acids. Fatty acid methyl esters were prepared from the fatty acids in each preparation and analyzed by gas-liquid chromatography.

ᵇ Fatty acids are designated by the number of carbons and the number of double bonds.

ᶜ Long-chain fatty acids are defined as fatty acids containing 20 or more carbon atoms.


phosphatidylinositol and phosphatidylserine. Fungi lack 
plasmalogens and sphingolipids.” 

The composition of the lipids varies among the 
various types of membranes in an animal cell: plasma 
membrane, endoplasmic reticulum, and mitochondria.“ 
Mitochondria have less neutral lipid, sphingomyelin, 
and phosphatidylserine and more phosphatidylethanol- 
amine than plasma membranes but about the same 
amount of phosphatidylcholine and plasmalogen. 

The plasma membranes of eubacteria, such as the 
bacterium E. coli, are composed mainly of
phosphatidylethanolamine (70%) but also contain small
amounts of phosphatidic acid and phosphatidylserine as 
well as two unusual phospholipids, phosphatidylglyc- 
erol, where a glycerol is esterified to the phosphate of 
phosphatidic acid, and diphosphatidylglycerol, where a 
single molecule of glycerol is esterified at its two ends 
with two respective phosphatidic acids. These latter two 
phospholipids are found exclusively in prokaryotes and 
mitochondria, which are the direct descendants of 
prokaryotes. The lipids of the plasma membranes from 
the bacterium Mycoplasma laidlawii, however, have 
high percentages (45%) of the glycolipids
3-[O-α-D-glucopyranosyl]-1,2-diacyl-sn-glycerol and
3-[O-α-D-glucopyranosyl-(1,2)-O-α-D-glucopyranosyl]-1,2-diacyl-sn-glycerol.
This bacterium also contains phosphatidylglucose,
a homologue of phosphatidylinositol.

In Gram-negative bacteria,* the plasma membrane
is surrounded by an outer membrane. The outer surface
of this outer membrane, the surface exposed to the
hostility of the environment, is formed mostly of a
glycolipid called lipopolysaccharide. Instead of
glycerol, the central element of lipopolysaccharide is a
repeating polymer, (-phosphate-4-glucosamine(β1,6)-glucosamine-1-)n.
3-Keto fatty acids and 3-hydroxy fatty
acids are linked to the 3-hydroxy and the 2-amino positions
of each glucosamine. To the glucosamines are
also attached long oligosaccharides composed of
mannose, glucose, galactose, N-acetylglucosamine,
abequose, L-rhamnose, L-glycero-D-manno-heptose,
3-deoxy-D-manno-octulosonic acid, and ethanolamine.
The inner surface of this outer membrane, however, is
formed from normal glycerophospholipids.

* Gram-negative bacteria are bacteria that do not stain with a particular
dye because the dye cannot penetrate their outer membranes.

Phosphatidylcholine, purified from a natural 
source such as eggs of Gallus gallus, spontaneously 
forms bilayers when it is suspended in water.“ Initially, 
these bilayers are gathered in sets of nested spherical 
shells known as multibilayer vesicles (Figure 14-3). 
Each of the shells in a multibilayer vesicle is a thin, 
closed, continuous bag. If such a suspension is submit- 
ted to sonication, the multibilayers eventually fragment 
and become small, unilamellar spherical vesicles.“ 
These vesicles are so small that they are not representa- 
tive of the bilayer in biological membranes because of 
their excessive curvature. Larger, sealed, spherical unil- 
amellar vesicles with uniform diameters of around 
100 nm or with heterogeneous diameters as large as
50 μm can also be prepared from glycerophospholipids
suspended in aqueous solution. A flat planar membrane, 
which is a single bilayer of phospholipid, can be formed 
across a small circular hole separating two aqueous
compartments. Oriented bilayers of phosphatidylcholine
can also be produced by evaporating a solution of the 
phospholipid in chloroform-methanol onto a flat surface 
such as mica and hydrating it with moist helium.”° In 
each of these forms it is believed that the basic structural 
element, the bilayer of phospholipids, is the same and 
that the bilayer is a thin (4-5 nm) fluid film of phospho- 
lipid that can assume all of these different forms. 

When bilayers of phosphatidylcholine are stacked 
upon mica as flat, planar, parallel sheets and this stack is 
placed in a beam of X-radiation, it produces a diffraction 
pattern (Figure 14-4A)" that is characterized by a set of 
sharp meridional arcs and two broad symmetrical equa- 
torial reflections.“ The equatorial reflections arise from 
the diffraction of the array of the linear hydrocarbons of 
the phospholipids oriented normal to the plane of the 
specimen. The meridional arcs arise from diffraction by 
the planes stacked one upon the other parallel to the ori- 
enting surface. The diffraction pattern of the meridional 
arcs can be transformed into a distribution of electron 
density along an axis normal to the orienting surface of 
the mica (Figure 14-4B). Because the stack of flat bilayers 
is a regularly repeating structure, the recurring variation 
of electron density in this dimension produces the dif- 
fraction pattern. 

The repeating pattern in the properly phased 
Fourier transform of the meridional diffraction pattern 
consists of two regions of high electron density symmet- 
rically sandwiching a region of low electron density 
(Figure 14-4B). This sandwich is the bilayer of phos- 
phatidylcholine. The two regions with the highest elec- 
tron density on either surface of the bilayer have been 
assigned to the glycerol, the phosphate, and the choline 
of the phosphatidylcholine (Figure 14-2). These func- 
tional groups, because they contain oxygen, nitrogen, 
and, in particular, phosphorus, have high electron den- 
sity. The central region of the bilayer has been assigned 
to the hydrocarbon of the fatty acids. 
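
The transformation from meridional reflections to a profile of electron density can be sketched numerically. For a stack that is symmetric about the center of each bilayer, every reflection of order h contributes one cosine term, and its phase reduces to a sign, so that the relative density is the sum over h of (sign × amplitude × cos(2πhz/d)). The repeat distance, amplitudes, and signs in the sketch below are invented for illustration only; they are not the measured values behind Figure 14-4B.

```python
import math

# Hypothetical lamellar repeat (one bilayer plus one water layer), in nanometers.
REPEAT_NM = 5.0

# Hypothetical reflections: order h -> (amplitude, sign).
# For a centrosymmetric profile each phase is 0 or pi, i.e. a sign of +1 or -1.
REFLECTIONS = {
    1: (1.0, -1),
    2: (0.5, -1),
    3: (0.3, +1),
    4: (0.2, -1),
}


def electron_density(z_nm, repeat_nm=REPEAT_NM, reflections=REFLECTIONS):
    """Relative electron density at distance z_nm from the center of the bilayer."""
    return sum(
        sign * amplitude * math.cos(2.0 * math.pi * order * z_nm / repeat_nm)
        for order, (amplitude, sign) in reflections.items()
    )


if __name__ == "__main__":
    # Tabulate the profile across one repeat, centered on the bilayer.
    for tenth in range(-25, 26, 5):
        z = tenth / 10.0
        print(f"z = {z:+.1f} nm   relative density = {electron_density(z):+.3f}")
```

With these made-up signs the profile is low at the center and higher near 2 nm on either side, which is the qualitative shape described for the bilayer. Reversing any one sign changes the profile, which is why the choice of phases matters.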


Figure 14-3: Spherical multibilayers of phosphatidyl- 
choline.“ Phosphatidylcholine was purified from the yolks of 
eggs from G. gallus by chromatography on alumina and then 
silicic acid with solvents of chloroform in methanol. The 
pure solid phosphatidylcholine was suspended in 0.15 M 
sodium chloride, and the resulting suspensions were exam- 
ined in a polarizing microscope. The structures observed are 
small hollow spheres of phospholipid. The pattern of alter- 
nating dark and light sectors around the wall of the sphere 
within the plane of the page results from the fact that each 
sphere is formed from many concentric spherical shells, 
nested each within the other. Each of these spherical shells is 
a single bilayer of phosphatidylcholine. Reprinted with per- 
mission from ref 40. Copyright 1965 Academic Press. 


A more recent calculation of the distribution of 
electron density in oriented bilayers of synthetic 
1,2-dioleoylphosphatidylcholine, based on more exten- 
sive phasing of the meridional reflections, gave the same 
profile as that in Figure 14-4B.* The analysis of the pro- 
file of electron density, however, could be extended 
significantly in these later studies, because profiles of 
scattering density for both X-radiation and neutrons 
were calculated from the diffraction of X-radiation and 
the diffraction of neutrons, respectively, by the same 
sample of 1,2-dioleoylphosphatidylcholine. The scatter- 
ing lengths for hydrogen, carbon, nitrogen, oxygen, and 
phosphorus differ dramatically relative to each other 
when these atoms are scattering X-radiation as opposed 
to when they are scattering neutrons. In particular, there 
are large differences in the relative scattering lengths for 
hydrogen. Because the different functional groups within 
a molecule of the phospholipid have different atomic 
compositions, the differences in scattering length and 
the significant differences between the profiles for the 
scattering of X-radiation and for the scattering of neu- 
trons could be used to dissect the profile of electron den- 
sity into the components that produce it. This dissection 
defines the mean locations of choline, phosphate, 
glycerol, ester, carbon-carbon double bond, and 
hydrocarbon.” These functional groups were calculated 
to be situated symmetrically at 2.2, 2.0, 1.9, 1.6, 0.8, and 
0-1.6 nm, respectively, from the center of the bilayer. 
The width of this particular synthetic bilayer from 
choline to choline is 4.4 nm. 
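
The contrast between the two kinds of radiation can be illustrated with a short calculation. X-radiation is scattered by electrons, so the scattering from a functional group scales roughly with its number of electrons, whereas neutrons are scattered by nuclei with scattering lengths that do not follow atomic number. The sketch below is only an illustration under stated assumptions: the neutron scattering lengths are rounded tabulated values, ionic charges are ignored, and the choice of groups is arbitrary.

```python
# Number of electrons in each neutral atom (X-ray scattering scales with this).
ELECTRONS = {"H": 1, "D": 1, "C": 6, "N": 7, "O": 8, "P": 15}

# Approximate coherent neutron scattering lengths in femtometers (rounded).
NEUTRON_FM = {"H": -3.74, "D": 6.67, "C": 6.65, "N": 9.36, "O": 5.80, "P": 5.13}

# Illustrative functional groups, written as element -> number of atoms.
GROUPS = {
    "methylene CH2": {"C": 1, "H": 2},
    "deuterated methylene CD2": {"C": 1, "D": 2},
    "phosphate PO4": {"P": 1, "O": 4},
    "choline C5H13N": {"C": 5, "H": 13, "N": 1},
}


def xray_electrons(composition):
    """Total number of electrons in a group (charges ignored)."""
    return sum(ELECTRONS[element] * count for element, count in composition.items())


def neutron_length_fm(composition):
    """Sum of the coherent neutron scattering lengths of the atoms in a group."""
    return sum(NEUTRON_FM[element] * count for element, count in composition.items())


if __name__ == "__main__":
    for name, composition in GROUPS.items():
        print(f"{name:26s} electrons = {xray_electrons(composition):3d}   "
              f"neutron length = {neutron_length_fm(composition):+7.2f} fm")
```

Because hydrogen has a negative scattering length, a methylene group is nearly invisible to neutrons while a phosphate scatters them strongly, and replacing hydrogen with deuterium makes the same methylene a strong scatterer. Differences of this kind are what allow the two profiles to be compared group by group.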

As the width of each layer of water between the
bilayers in such a stack is decreased by decreasing the
vapor pressure of the water in the chamber (Figure 
14-4B), greater and greater decreases in vapor pressure 
are required to elicit the same change in width once the 
separation between the maxima of electron density goes 
below about 0.5 nm.” In the dissection of the map of 
electron density into its components, it was estimated 
that the choline head groups extend 0.2 nm beyond 

Figure 14-4: Diffraction patterns (A and C) and their respective computed Fourier transforms (B and D) for multilayers of pure
phosphatidylcholine (A and B) from yolks of eggs from G. gallus or an equimolar mixture of the same phosphatidylcholine and cholesterol (C and
D).“ Lipids dissolved in chloroform in methanol were smeared on mica sheets, and the solvent was evaporated under a stream of moist helium. 
The multilayers that resulted were submitted to diffraction in a beam of X-radiation (A and C). The two symmetrically displayed, diffuse but 
intense diffractions on the equators (0.46 nm) in panels A and C are from the spacings of the linear hydrocarbons of the phospholipids ori- 
ented normal to the plane of the specimen. The sharp reflections on the meridian (the vertical axis of the pattern) are the reflections arising 
from the set of planes produced by the stacking of the bilayers. With the appropriate choice of phase, the amplitudes of the meridional reflec- 
tions can be submitted to Fourier transform to obtain profiles of electron density (B and D) along an axis normal to the plane of the specimen. 
Presumably this axis is normal to the flat sheets producing the multilayer. The electron density in arbitrary units is presented as a function of 
the distance (nanometers) from the center of the bilayer. In panel B, the profile of electron density is given for lipids equilibrated with moist 
helium of 57% relative humidity (dashed lines; 14% water), or 100% relative humidity (solid line; 21% water), and in panel D, the profile of elec- 
tron density is given for lipids equilibrated with moist helium of 57% relative humidity (dashed lines; 13% water) or 100% humidity (solid line; 
22% water). It was the expectation of the investigators that the width and structure of the bilayer (within the double arrow) would remain con- 
stant while the distance between bilayers would increase as the water content increased. This expectation was used to assign the phases so its 
fulfillment is inconsequential. Adapted with permission from Nature, ref 46. Copyright 1971 Macmillan Magazines Limited. 


maxima of electron density on the two sides of the aqueous
space. It has been proposed that the cause of the
resistance to decreasing the separation between the
bilayers is steric repulsion between these cholines on
the apposed surfaces. The observed distance at which
repulsion sets in is consistent with the calculated location
of the cholines. That the repulsion at these short distances
arises from the collision of the head groups is
supported by the fact that incorporation of cholesterol
into the bilayers, which spreads apart the head groups
and permits them to interdigitate significantly, decreases
the repulsion.

When the amount of water in a sample of hydrated
natural phosphatidylcholine from eggs is systematically
increased from 10% to 45%, changes in the structure of
the bilayers occur. The width of the bilayers decreases
from 4.5 to 3.8 nm, and the cross-sectional area for
each phospholipid in one of the two monolayers
increases from 0.55 to 0.68 nm². It is thought that
these changes are responses to the steric effects coincident
to the hydration of the hydrophilic functional
groups on the external surfaces. As hydration increases,
it pushes apart the adjacent molecules of phosphatidylcholine
in each monolayer of the bilayer and produces
the observed changes. Above a certain level of hydration
(>40% water), when all hydrophilic groups are fully
hydrated, the thickness of the bilayer no longer
decreases.

Because it is heterogeneous, naturally occurring
phosphatidylcholine will not crystallize. Synthetically
prepared dimyristoylphosphatidylcholine, however, has
been crystallized, and a crystallographic molecular
model has been constructed (Figure 14-5). The crystalline
material consists of stacks of bilayers whose distribution
of electron density along an axis normal to the
planes of the bilayers (vertical axis in Figure 14-5) would 
resemble the distribution seen in Figure 14-4B. The mol- 
ecules of phospholipid are distributed symmetrically 
about the center of the bilayer just as they are in a bilayer 
of fluid natural phospholipids. In fact, in the crystal, 
there are crystallographic screw axes of symmetry 
between the methyl groups at the ends of the hydrocar- 
bons. Each plane of adjacent phospholipid molecules 
oriented in the same direction forms one of the two 
monolayers of one of the bilayers. As in the fluid, uncrys- 


Figure 14-5: Arrangement of the molecules of phosphatidyl- 
choline in the crystallographic molecular model of dimyris- 
toylphosphatidylcholine dihydrate.” The asymmetric unit in the 
unit cell of the P2₁ space group is formed from two molecules of
dimyristoylphosphatidylcholine (1 and 2). The unit cell is outlined, 
and the 2-fold screw axes of symmetry normal to the plane of the 
page are designated. Because the crystal is the dihydrate, grown 
from a mixture of ether, ethanol, and water, each asymmetric unit 
has four water molecules (@) associated with it in the hydrophilic 
region between the bilayers. This corresponds to a water content of 
5%. Reprinted with permission from Nature, ref 52. Copyright 1979 
Macmillan Magazines Limited. 


talline bilayer, the functional groups of the phospholipid 
are encountered in the order choline, phosphate, glyc- 
erol, ester, hydrocarbon from the outermost surface to 
the interior. Because the fatty acids are homogeneous,
their hydrocarbon has solidified into a crystalline array 
that is hexagonally packed. Dilauroylphosphatidyl- 
ethanolamine crystallizes from acetic acid in bilayers, 
and the vertical hexagonal packing of the fatty acids in its 
crystallographic molecular model is readily discerned 
from a view normal to the plane of one of the bilayers 
(Figure 14-6). Such crystallographic models provide a 
starting point for a discussion of the structure of a bilayer 
of amphipathic lipids. It must be remembered, however, 
that they represent homogeneous lipids in which the 
alkane is fully saturated and solid. 

Four types of conformation at the glyceryl back- 
bone of phospholipids have been observed in crystallo- 
graphic molecular models (Figure 14-7), and there is 
evidence from nuclear magnetic resonance spectra that, 
in the liquid state, the phospholipids fluctuate among 
these conformations.” One interesting aspect of these 
structures is that in the conformations represented by 
dimyristoylphosphatidylcholine (DMPC) and dilauroyl-
N,N-dimethylphosphatidylethanolamine (DLPEM₂), the
acyl carbon of the fatty acid on carbon 3 of the glyceryl 
group is buried more deeply in the bilayer than the acyl 
carbon of the fatty acid on carbon 2, while in the two con- 
formations represented by dilauroylphosphatidic acid 
(DLPA) and dimyristoylphosphatidylglycerol (DMPG), it 
is the acyl carbon of the fatty acid on carbon 2 that is 
buried more deeply. This means that, in the rapid transi- 
tions among these conformations that occur in bilayers 
of liquid phospholipid, the linear hydrocarbons slide 
back and forth past each other in a direction normal to 
the plane of the bilayer. As these sliding movements 
occur, each of the acyl oxygens on the two fatty acids 
comes in turn to the surface of the bilayer (Figure 14-7). 
Because of the sliding movements, the equilibrium 
among these conformations can be shifted by changing 
the structure of the surroundings in which the far ends of 
the fatty acids are located.” 

In these sliding fluctuations, the positions of the 
charged phosphate and the charged nitrogen on phos- 
phatidylcholine must average to about the same mean 
location relative to the surface of the bilayer. This follows 
from the fact that vesicles of phosphatidylcholine have 
zero electrophoretic mobility, which demonstrates that 
on the average the negative charges on their phosphates 
must reside in the same plane parallel to the surface of 
the bilayer as the positive charges on their cholines.” The 
dielectric properties of bilayers of phosphatidylcholine 
are also consistent with this disposition. In the crystallo- 
graphic molecular model of dilauroylphosphatidyl- 
ethanolamine (Figure 14-6), the ammoniums of the 
ethanolamines form hydrogen bonds to the oxygens of 
the phosphates, bringing the two opposite charges into 
the same plane parallel to the surface of the bilayer. 


Figure 14-7: Four conformations available to a phospholipid in a 
bilayer.“ These four drawings are taken from the crystallographic 
molecular models of crystalline dimyristoylphosphatidylcholine (DMPC), 
dilauroyl-N,N-dimethylphosphatidylethanolamine (DLPEM₂), dilauroylphosphatidic
acid (DLPA), and dimyristoylphosphatidylglycerol (DMPG).
Within each structure the two fatty acids are the same length, so the rela- 
tive positions of the two fatty acids are most readily ascertained by look- 
ing at their ends. Within each structure the fatty acid to the right is the one 
on carbon 2 of the glycerol, and the fatty acid to the left is the one on 
carbon 3. In the upper two conformations, the fatty acid on carbon 3 of 
the glycerol is deeper in the bilayer of phospholipid than that on carbon 2, 
and in the lower two conformations the fatty acid on carbon 2 is deeper in 
the bilayer of phospholipid than that on carbon 3. In the upper two struc- 
tures the acyl oxygen of the fatty acid on carbon 2 of the glycerol is at the 
surface of the bilayer; in the lower two structures the acyl oxygen of 
the fatty acid on carbon 3 is at the surface of the bilayer. Reprinted with 
permission from ref 54. Copyright 1988 American Chemical Society. 

Figure 14-6: View of the crystallographic
molecular model of dilauroylphos- 
phatidylethanolamine looking down on 
the hydrophilic surface of the bilayer.” 
Only one monolayer of the bilayer of 
phospholipid is shown in the drawing. 
One of the oxygens of each phosphate 
forms a hydrogen bond with the ammo- 
nium group of an adjacent ethanolamine. 
The dilauroylphosphatidylethanolamine 
was crystallized from glacial acetic acid, 
and in the crystals there was one molecule 
of acetic acid (not shown) for each mole- 
cule of phospholipid. The fact that the 
linear hydrocarbons are all normal to the 
surface of the monolayer is apparent, and 
the drawing gives a representation of the 
view from above the surface of a bilayer of 
phospholipid. Reprinted with permission 
from ref 53. Copyright 1974 National 
Academy of Sciences. 

Phosphatidylcholine and phosphatidylethanolamine
are zwitterionic and neutral, but phosphatidylserine, 
phosphatidylglycerol, phosphatidylinositol, and half a 
molecule of diphosphatidylglycerol each have a net 
charge number of -1. Consequently, natural bilayers 
have a net negative surface potential. In a bilayer of pure 
phosphatidylserine, the potential at the surface is about 
-80 mV and falls off as a function of the distance from the 
surface as predicted by the Gouy-Chapman equation for 
an ionic double layer.””” At a distance of 1 nm from the 
surface in a solution of ionic strength of 0.1 M, the poten- 
tial has dropped to -30 mV. The magnitude of the surface 
potential varies with the mole fraction of negatively 
charged lipid in a bilayer and the ionic strength of the 
solution and affects the adsorption of small charged
molecules and proteins to the bilayer in a predictable
manner.
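
The decay of the potential with distance can be estimated with the Gouy-Chapman expression for a symmetric 1:1 electrolyte, tanh(eψ/4kT) = tanh(eψ₀/4kT) exp(-κx), where 1/κ is the Debye length. The sketch below is a minimal illustration and not the calculation used in the cited studies; it assumes a planar, uniformly charged surface, a temperature of 25 °C, and a relative permittivity of 78.5 for water, and the function names are chosen here for convenience.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19     # coulombs
BOLTZMANN = 1.380649e-23                # joules per kelvin
AVOGADRO = 6.02214076e23                # per mole
VACUUM_PERMITTIVITY = 8.8541878128e-12  # farads per meter


def debye_length_nm(ionic_strength_molar, temperature_k=298.15,
                    relative_permittivity=78.5):
    """Debye length (1/kappa) of a 1:1 electrolyte, in nanometers."""
    ions_per_m3 = ionic_strength_molar * 1000.0 * AVOGADRO
    kappa_squared = (2.0 * ions_per_m3 * ELEMENTARY_CHARGE ** 2) / (
        relative_permittivity * VACUUM_PERMITTIVITY * BOLTZMANN * temperature_k
    )
    return 1e9 / math.sqrt(kappa_squared)


def potential_mv(distance_nm, surface_potential_mv, ionic_strength_molar,
                 temperature_k=298.15):
    """Gouy-Chapman potential (mV) at distance_nm from a planar charged surface."""
    thermal_mv = 1000.0 * BOLTZMANN * temperature_k / ELEMENTARY_CHARGE
    gamma = math.tanh(surface_potential_mv / (4.0 * thermal_mv))
    decay = math.exp(-distance_nm / debye_length_nm(ionic_strength_molar, temperature_k))
    return 4.0 * thermal_mv * math.atanh(gamma * decay)


if __name__ == "__main__":
    print(f"Debye length at 0.1 M: {debye_length_nm(0.1):.2f} nm")
    for x in (0.0, 0.5, 1.0, 2.0):
        print(f"x = {x:.1f} nm   potential = {potential_mv(x, -80.0, 0.1):6.1f} mV")
```

Under these assumptions the Debye length at an ionic strength of 0.1 M is close to 1 nm, and a surface potential of -80 mV falls to between -20 and -30 mV at a distance of 1 nm, which is of the same order as the value quoted above.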

The electrostatic repulsion of the negatively 
charged phospholipids decreases the stability of a 
bilayer. In fact, at low ionic strength, bilayers of pure 
dimyristoylphosphatidylglycerol are unstable enough 
that the monomer has a measurable solubility in aque- 
ous solution (10°°-107!° mole fraction), which is almost 
unheard of. The solubilities of neutral phospholipids or 
monoanionic phospholipids with longer fatty acyl 
groups are so small that they cannot be measured. 
Bilayers of entirely monoanionic phospholipid are 
unstable enough that they do not occur naturally. There 
is, however, a mutant of E. coli in which the synthesis of
phosphatidylethanolamine has been deleted; and this 
curiosity, the phospholipids of which are almost entirely 
phosphatidylglycerol and diphosphatidylglycerol, will 
grow as long as the concentration of Mg²⁺ in the medium
is high enough to decrease significantly the surface
potential of its plasma membrane.

The bilayers observed in crystallographic molecu- 
lar models are solids in which the hydrocarbon is frozen; 
the bilayers of the mixture of phospholipids purified 
from a natural source or the bilayer present in a biologi- 
cal membrane is liquid. The transition between solid 
and liquid resembles the melting of paraffin, and it can 
be observed by cooling a bilayer to solidify it and then 
raising the temperature gradually to melt it. As phos- 
pholipids from most natural sources remain fluid even 
at low temperatures, the transition is usually followed 
either in homogeneous, synthetic phospholipids or in 
biological membranes the composition of which is 
highly enriched in one particular fatty acid. For exam- 
ple, when the bacterium M. laidlawii is grown on 
medium supplemented with a chosen fatty acid, up to 
70% of the fatty acids in its membranes are the supple- 
mented fatty acid.°° 

The transition between solid and liquid can be 
observed by diffraction of X-radiation. In a solid bilayer 
the hydrocarbon of the phospholipids is in a hexagonal 
array (Figure 14-6), and the spacing between the linear, 
all-trans alkanes produces a strong sharp equatorial
reflection (see Figure 14-4A) at (0.415 nm)⁻¹ characteristic
of crystalline paraffins. The cross-sectional area of
such a hexagonal array of solid paraffin hydrocarbons is
0.40 nm² for every two alkyl chains, and this agrees
closely with the cross-sectional areas of 0.39-0.41 nm²
for one complete molecule of phospholipid in each
monolayer of the crystallographic molecular
models. The width of a solid bilayer of phospholipid
with only saturated fatty acids is consistent with the
width of two layers of slightly tilted (<20°) alkanes of the 
appropriate length in the all-trans configuration (Figure 
14-5). 
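
The cross-sectional area quoted for two alkyl chains follows from the geometry of hexagonal packing. The sketch below assumes that the spacing of 0.415 nm is the distance between adjacent rows of chains in the hexagonal lattice (the spacing that gives rise to the equatorial reflection); the function name is chosen here for illustration.

```python
import math


def hexagonal_area_per_chain_nm2(row_spacing_nm):
    """Area occupied by one chain in a hexagonal lattice with the given row spacing."""
    # The nearest-neighbor distance a is related to the row spacing d by d = a * sqrt(3) / 2.
    neighbor_distance = 2.0 * row_spacing_nm / math.sqrt(3.0)
    # Each chain occupies one primitive cell of area a^2 * sqrt(3) / 2.
    return neighbor_distance ** 2 * math.sqrt(3.0) / 2.0


area_per_chain = hexagonal_area_per_chain_nm2(0.415)
area_per_lipid = 2.0 * area_per_chain  # two alkyl chains for each phospholipid

if __name__ == "__main__":
    print(f"area for one chain:  {area_per_chain:.3f} nm^2")
    print(f"area for two chains: {area_per_lipid:.3f} nm^2")
```

Under this assumption the area for two chains comes out within 0.01 nm² of the 0.40 nm² cited for solid paraffin hydrocarbons.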

When a solid bilayer is melted to a liquid bilayer by 
raising the temperature, its width decreases by 
0.5-1.0 nm.” If at the same time the hydrocarbon is 
expanding to the extent that paraffins expand as they 
become liquid, the cross-sectional area for each phos- 
pholipid in one of the monolayers of the bilayer must 
increase to 0.55-0.70 nm². This expansion of the
cross-sectional area of the bilayer, among other factors, 
reflects the establishment of the normal disorder of the 
liquid state of a paraffin. In this state, gauche conforma- 
tions, which necessarily shorten the distance that can be 
covered by a hydrocarbon, become common features 
that lead to the narrowing of the bilayer. 

In such molten bilayers of fully saturated phospho- 
lipid, the hydrocarbons remain oriented preferentially 
with their long axis aligned with an axis normal to the 
plane of the bilayer. This conformation has been 
demonstrated by neutron diffraction of oriented bilayers 
of dipalmitoylphosphatidylcholine in which different 
carbons along the palmitates have been labeled with 
deuterium, an atom that scatters neutrons strongly. In 
the solid phase at low hydration, where all of the hydro- 
carbons are in hexagonal array normal to the plane of 
the membrane, the location of the deuteriums in the dis- 
tribution of scattering density is easily distinguished 
(Figure 14-8). In such solid bilayers, the deuteriums 
appear at the expected distances from the center (Table 
14-4). When bilayers of the various dipalmitoylphos- 
phatidylcholines are melted, the deuteriums in the fluid 


Table 14-4: Distance of Various Carbons from the Center of a Bilayer

                      distance from centerᵃ (nm)
carbon deuterated     solid bilayer    fluid bilayer

C15                   0.20 ± 0.1       0.19 ± 0.1
C14                   0.36 ± 0.1       0.36 ± 0.1
C9                    0.94 ± 0.1       0.81 ± 0.1
C5                    1.21 ± 0.2       1.05 ± 0.1
C4                    1.53 ± 0.2       1.22 ± 0.1

ᵃ Determined by neutron diffraction of bilayers of dipalmitoylphosphatidylcholine
selectively deuterated at the noted positions.

Figure 14-8: Distribution of neutron scattering density across
bilayers of phospholipid in which hydrogen has been replaced by
deuterium at specific locations along the linear alkyl groups of the
phospholipid. Stacked, parallel bilayers of the
di-(15,15-dideuteriopalmitoyl)phosphatidylcholine (A),
di-(5,5-dideuteriopalmitoyl)phosphatidylcholine (B), or dipalmitoylphosphatidylcholine
hydrated with ²H₂O (C) were prepared on quartz slides. Each of
these multilayers was brought to 20 °C, which is below the melting
point of dipalmitoylphosphatidylcholine under these circumstances,
and allowed to diffract neutrons. From the meridional
reflections and appropriate phases, a distribution of neutron scattering
density (relative units) normal to the plane of the membranes
as a function of distance from the center (nanometers)
could be calculated by Fourier transformation. The positions of the
deuterated carbons in the first two samples are clearly observed
(arrows). The position of the ²H₂O in the third sample was defined
by a difference map of neutron scattering density (C) between a
specimen hydrated with H₂O and one hydrated with ²H₂O.
Reprinted with permission from Nature, ref 69. Copyright 1978 
Macmillan Magazines Limited. 


hydrocarbon remain similarly distributed (Table 14-4), 
but each moves closer to the center. This rearrangement 
is consistent with the narrowing of the bilayer that 
occurs upon melting. In fluid bilayers of phospholipids 
in which all of the fatty acids are the same length, the 
methyl groups at the ends of the fatty acids from the two 
monolayers end up adjacent to each other (Figure 14-5), 
but in bilayers in which the two fatty acids on the phos- 
pholipids are of significantly different length, the longer 
fatty acids interdigitate so that the methyl groups of the
longer fatty acids on one monolayer end up adjacent to
the methyl groups of the shorter fatty acids on the
other.
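
The inward movement recorded in Table 14-4 can be tabulated directly. The following Python sketch takes the distances from the table and reports how far each labeled carbon moves toward the center upon melting. The variable and function names are illustrative only, and the quoted uncertainties (0.1 to 0.2 nm) are not propagated, so the small shifts for C15 and C14 lie within experimental error.

```python
# Distances from the center of the bilayer (nm), taken from Table 14-4.
# Each entry is (solid bilayer, fluid bilayer).
distances_nm = {
    "C15": (0.20, 0.19),
    "C14": (0.36, 0.36),
    "C9": (0.94, 0.81),
    "C5": (1.21, 1.05),
    "C4": (1.53, 1.22),
}


def inward_shift(solid_nm, fluid_nm):
    """Distance (nm) a labeled carbon moves toward the center on melting."""
    return round(solid_nm - fluid_nm, 2)


shifts_nm = {
    carbon: inward_shift(solid, fluid)
    for carbon, (solid, fluid) in distances_nm.items()
}

for carbon, shift in shifts_nm.items():
    print(f"{carbon}: moves {shift:.2f} nm toward the center")
```

In this tabulation the carbons nearest the glyceryl groups (C4 and C5) move the farthest, which is what a narrowing of the whole bilayer would produce.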

The transition between solid and liquid in a bilayer 
composed of mixtures of various homogeneous, syn- 
thetic phospholipids has also been studied. When a sus- 
pension of bilayers composed of only one phospholipid 
such as dipalmitoylphosphatidylcholine is melted, a 
single sharp transition that occurs completely over 
2-3 °C is observed. It can be monitored in a calorimeter 
as the absorption of heat resulting from the heat of 
fusion. When two phospholipids are mixed, the transition
occurs over a broader range of temperatures somewhere
between the temperatures of the transitions of the
separated components. At temperatures in the range
over which the transition is occurring, regions of fluid
phase are in equilibrium with regions of solid phase
laterally separated from each other in the plane of the
bilayer. In many instances, the two component phospholipids
are not miscible with each other as solids
and a separate solid phase of one or the other remains
laterally isolated in the bilayer. For example, mixtures
of up to 40% dipalmitoylphosphatidylethanolamine in
dimyristoylphosphatidylcholine contain significant
regions of unmixed dimyristoylphosphatidylcholine in
the solid phase. These results suggest that regions of
immiscible, unmelted phospholipid might form in nat- 
ural bilayers under certain circumstances. The heteroge- 
neous mixtures of phospholipids normally found in 
natural circumstances, however, appear to form bilayers 
that are fully liquid and fully miscible at all physiological 
temperatures. 

The fluid bilayers in normal biological membranes 
and in bilayers formed spontaneously from the amphi- 
pathic lipids extracted from normal biological mem- 
branes have been studied by following the electron 
spin resonance of probes incorporated into them. 
Nitroxyl fatty acid 14-9 is an example of such a
probe: 


The nitroxyl radical is in a five-membered ring similar to 
that of the 1-oxyl-2,2,5,5-tetramethylpyrrolin-3-yl radical 
(12-11). The unpaired electron in the nitroxyl radical is 
located in a π molecular orbital, the principal axis of
which is aligned parallel to the axis of the all-trans linear 
hydrocarbon: 
The spectrum of 3-carbamoyl-1-oxyl-2,2,5,5-tetramethylpyrroline
freely tumbling in aqueous solution
displays three sharp peaks of equal intensity (Figure 
12-35), reflecting the rapid isotropic motion of the 
molecule and the full coupling of the unpaired electron 
and the nitrogen nucleus. When nitroxyl fatty acid 14-9, 
however, is incorporated into an oriented multibilayer of 
phosphatidylcholine from eggs of G. gallus, the 
absorbances of the unpaired electron in the spin-labeled 
fatty acid within the multibilayer are much broader 
because the motions reorienting its nitroxyl radical are 
much less rapid (Figure 14-9). The decrease in the
intensity of the symmetrically displayed hyperfine peaks 
resulting from the change in bonding of the nitroxyl rad- 
ical (12-12) that occurs within the nonpolar environ- 
ment of the interior of the bilayer is also apparent. 

The spectra also have become anisotropic because 
the motion of the nitroxyl radical has become 
anisotropic. The easiest way to demonstrate this is to 
take two spectra from the specimen oriented so that the 
magnetic field is either normal to the planes of the multi- 
bilayers (Figure 14-9B) or parallel to the planes of the 
multibilayers (Figure 14-9C). In the one case the splitting 
of the peaks is larger than that of the isotropic spectrum, 
and in the other it is smaller. From theoretical simula- 
tions of these spectra, it could be concluded that, in the 
bilayers of phosphatidylcholine, nitroxyl fatty acid 14-9 
is oriented with the axis of its hydrocarbon perpendicu- 
lar to the planes of the multibilayers and rotates exclu- 
sively or almost exclusively about this axis. These 
conclusions are consistent with expectations based on 
the structure of a bilayer of phospholipid (Figure 14-2) 
and the amphipathic structure of nitroxyl fatty acid 14-9. 
When such nitroxyl radicals are incorporated into multi- 
lamellar vesicles of phosphatidylcholine suspended in 
water (Figure 14-3), the spectrum that results is a com- 
posite of the perpendicular and parallel spectra seen in 
Figure 14-9, panels B and C, respectively.

That molecular motions of the linear alkane in a 
fluid bilayer of phospholipid increase in proceeding from 
the acyl carbon to the center can be demonstrated with 
spin-labeled phospholipids.” A series of phosphatidyl- 


Figure 14-9: Electron spin resonance spectra of azacyclic
N-oxides. (A) 2,2,5,5-Tetramethyl-3-(aminocarbonyl)azacyclopentane
N-oxide (see 12-11) dissolved in water.
(B, C) 2-(10-Carboxydecyl)-2-hexyl-5,5-dimethyl-3-azatetrahydrofuran
N-oxide (14-9) incorporated into multibilayers of
phosphatidylcholine from eggs of G. gallus. The magnetic field of the
spectrometer was oriented perpendicular (B) or parallel (C) to the
plane of the cover slip. Nitroxyl fatty acid 14-9 and
phosphatidylcholine were dissolved in chloroform and the mixture was
evaporated to dryness. Water was added and a portion of the opalescent
suspension that resulted was spread on a cover slip. The water was 
evaporated at 39 °C to produce the multibilayers oriented by the 
plane of the cover slip. In all of the spectra, the derivative of the 
absorbance is presented on the vertical axis. The microwave fre- 
quency was held at a constant value while the strength of the mag- 
netic field was varied continuously. The magnetic flux density 
(tesla) is the variable on the horizontal axis. The distance between 
the two absorbances at the ends of the spectrum in panel A is 
0.0028 T. The vertical lines mark the positions of maximum absorp- 
tion (zero slope) of microwave energy. Reprinted with permission 
from refs 77 and 78. Copyright 1965 and 1969 National Academy of 
Sciences. 


cholines were synthesized in which the acyl groups on 
carbon 1 of the sn-glycerol 3-phosphate were derived 
from either palmitic acid or stearic acid, and the acyl 
groups on carbon 2 were derivatives of palmitic acid or 
stearic acid, respectively, on which the cyclic dimethyl 
nitroxyl radical was positioned at the 5th, 8th, 12th, and 
16th carbon. An order parameter, S, can be defined,
which is a number that quantifies the confinement of the 
rotational motion of these cyclic nitroxyl radicals to one 
particular axis. When S = 1, the ring rotates about a fixed 
axis in space; and when S = 0, its rotational motion is 
isotropic (Figure 14-9A). 

When the various labeled phospholipids were
incorporated into multibilayers of natural
phosphatidylcholine, the order parameter S was observed to decrease
as the cyclic nitroxyl radical was situated farther from the 
acyl carbon. For cyclic nitroxyl radicals at the 5th, 8th, 
12th, and 16th carbon of the labeled fatty acid, the order 
parameters S were 0.68, 0.50, 0.33, and 0.16, respectively.
In experiments of this type, care must be
taken that the probe does not affect the behavior of the
alkane to which it is attached. For example, when a similar
experiment was performed with the fluorescent
probe 7-nitro-2,1,3-benzoxadiazol-4-yl covalently
attached to the phospholipid, the hydrophilicity of the
probe drew the end of the fatty acyl group to which it was
attached to the surface of the membrane, pulling it away
from the center of the core. The order parameters
observed with the much less hydrophilic nitroxyl group, 
however, have been confirmed by using deuterium as a 
probe, which produces the least possible perturbation of 
the fatty acyl group. 
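
The text defines S only through its two limits. A common quantitative form, assumed here rather than taken from the text, is S = ⟨3cos²θ − 1⟩/2, where θ is the angle between the axis of the probe and the normal to the bilayer. The Python sketch below checks that this form reproduces both limits and evaluates it for a hypothetical model in which the axis wanders uniformly within a cone. The function names and the cone model are illustrative assumptions.

```python
import math


def order_parameter(thetas_rad, weights):
    """Weighted average of (3 cos^2(theta) - 1) / 2 over the given angles."""
    total = sum(weights)
    weighted = sum(
        w * (3.0 * math.cos(t) ** 2 - 1.0) / 2.0
        for t, w in zip(thetas_rad, weights)
    )
    return weighted / total


def uniform_cone(half_angle_deg, n=20000):
    """Angles and solid-angle weights (sin theta) for an axis distributed
    uniformly within a cone of the given half angle (hypothetical model)."""
    half = math.radians(half_angle_deg)
    thetas = [(i + 0.5) * half / n for i in range(n)]
    weights = [math.sin(t) for t in thetas]
    return thetas, weights


# Axis fixed along the normal to the bilayer: S = 1.
s_fixed = order_parameter([0.0], [1.0])

# Every orientation equally likely (isotropic motion): S = 0.
s_isotropic = order_parameter(*uniform_cone(180.0))

# Axis confined to a cone of half angle 60 degrees.
s_cone_60 = order_parameter(*uniform_cone(60.0))

print(f"fixed axis: S = {s_fixed:.3f}")
print(f"isotropic:  S = {s_isotropic:.3f}")
print(f"60° cone:   S = {s_cone_60:.3f}")
```

Under this cone model, intermediate values of S such as those reported for the labeled carbons correspond to progressively wider cones of motion.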

Measurements of the order parameter as a function 
of the position along the linear alkane of the fatty acyl 
groups in a phospholipid have been made by deuterium 
nuclear magnetic resonance. A series of synthetic
phospholipids were prepared that contained either 
palmitic acid at both carbon 1 and carbon 2 or palmitic 
acid at carbon 1 and oleic acid at carbon 2 of the sn-glyc- 
erol 3-phosphate. In each member of the series, deu- 
terium atoms were placed synthetically on a specific 
carbon in the palmitic acids. Because a deuteron, like a 
nucleus of nitrogen-14, is quadrupolar, a deuterium
nuclear magnetic resonance spectrum can also be used
to estimate an order parameter S for the degree of
anisotropy experienced by the carbon to which it is
attached. Order parameters S for
dipalmitoylphosphatidylcholine,
1-palmitoyl-2-oleoylphosphatidylcholine, and
dipalmitoylphosphatidylserine, gathered from
bilayers of these phospholipids held at temperatures an
equivalent distance above each of their melting points,
have been presented as a function of the carbon on
which the deuteriums were located (Figure 14-10). The
confinement experienced by a carbon in the liquid 
hydrocarbon of the bilayer decreases as the distance 
from the acyl carbon increases. Carbons at the very core 
of the bilayer are able to assume almost every orienta- 
tion, while carbons near the periphery are confined in 
their orientations. These observations relate to an appar- 
ent stereochemical paradox in the structure of a bilayer 
of phospholipids from natural sources. 
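
The text does not give the relation used to convert a deuterium spectrum into an order parameter. A commonly used one, stated here as an assumption, is Δν = (3/4)(e²qQ/h)|S_CD| for the splitting between the two main peaks of an unoriented sample, with a static quadrupole coupling constant of about 170 kHz for an aliphatic carbon-deuterium bond. For a segment rotating rapidly about the normal to the bilayer, the molecular order parameter is then often taken as |S_mol| = 2|S_CD|. The splittings in the sketch below are hypothetical values chosen for illustration, not data from Figure 14-10.

```python
# Assumed static quadrupole coupling constant for an aliphatic C-2H bond.
QUADRUPOLE_COUPLING_KHZ = 170.0


def s_cd_from_splitting(splitting_khz, coupling_khz=QUADRUPOLE_COUPLING_KHZ):
    """|S_CD| from the peak-to-peak quadrupolar splitting of a powder
    spectrum, using splitting = (3/4) * coupling * |S_CD|."""
    return splitting_khz / (0.75 * coupling_khz)


def s_mol_from_s_cd(s_cd):
    """|S_mol| assuming rapid axial rotation, so that |S_mol| = 2 |S_CD|."""
    return 2.0 * abs(s_cd)


# Hypothetical splittings (kHz) for three labeled positions.
example_splittings_khz = {"C4": 27.0, "C10": 22.0, "C15": 8.0}

for carbon, splitting in example_splittings_khz.items():
    s_cd = s_cd_from_splitting(splitting)
    print(f"{carbon}: |S_CD| = {s_cd:.3f}, |S_mol| = {s_mol_from_s_cd(s_cd):.3f}")
```

A smaller splitting gives a smaller order parameter, which is the trend described for carbons approaching the core of the bilayer.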

Beyond the eighth carbons of the fatty acids 
attached to carbon 2 of either the sn-glycerol 3-phos- 
phates or the sphingosines in a natural bilayer, within the 
core of the hydrocarbon, the carbon-carbon double 
bonds begin (Figure 14-2). This fact has two conse- 
quences. First, permanent elbows incompatible with 
straight linear alkanes necessarily disrupt the alignments 
of the hydrocarbons of the fatty acids. Second, the aver- 
age length for each carbon is decreased by the multiple 
double bonds. Both the disorder introduced by single
and triple cis double bonds and the shortening of the 
chains caused by all of the double bonds necessarily 
increase the mean cross-sectional area parallel to the 
plane of the bilayer for each chain of hydrocarbon in this 
region distal to the glyceryl groups located on the two 
sides of the bilayer. Added to this effect is the disorder 
that naturally occurs farther away from the surfaces of 
the bilayer even in saturated phospholipids (Figure 
14-10). 

Before the ninth carbons of the acyl groups on 
carbon 2 of the sn-glycerol 3-phosphates and sphin- 
gosines, however, essentially all of the hydrocarbon is 
Figure 14-10: Variation of the molecular order parameter, S_mol,
estimated from deuterium nuclear magnetic resonance spectra as 
a function of the position of the deuterium along the hydrocarbon 
of 1,2-di(dideuteriopalmitoyl)phosphatidylcholine (O), 1,2-di- 
(dideuteriopalmitoyl) phosphatidylserine (0), or 1-(dideuteriopalm- 
itoyl)-2-oleoylphosphatidylcholine (A).** A series of selectively 
dideuteriated palmitic acids were synthesized, each with two deu- 
teriums at a different carbon along the chain. From these dideu- 
teriopalmitic acids a series of dipalmitoylphosphatidylcholines 
was synthesized, each with two palmitic acids in which deuterium 
occupied the same position. These lipids were separately sus- 
pended in water at a temperature 19 °C above their melting points 
(41 °C) and deuterium nuclear magnetic resonance spectra were 
recorded, from which order parameters, S_mol, were calculated (O).
A similar series of dipalmitoylphosphatidylserines was also prepared
and a similar analysis performed at 51 °C (©). Samples of each of
the dipalmitoylphosphatidylcholines were digested with
phospholipase A₂, and the resulting 2-lysophosphatidylcholines were
esterified with oleic acid to produce a series of
1-(dideuteriopalmitoyl)-2-oleoylphosphatidylcholines in each of
which two deuteriums occupied a different position, respectively,
in the palmitic acid. Deuterium nuclear magnetic resonance spectra
were taken of suspensions of these phospholipids at a temperature
16 °C above their melting points (-5 °C). From these spectra, order
parameters, S_mol, were calculated (A). A series of palmitic acids
selectively deuteriated at specific positions were separately fed to
M. laidlawii
bacteria to enrich (70%) the membranes of these cells in the added 
fatty acid, and the order parameters for samples of each of these 
selectively deuteriated membranes were also determined (x). 
Order parameters are plotted as a function of the position of the 
labelled carbon in the respective fatty acids. Reprinted with per- 
mission from ref 84. Copyright 1978 Elsevier Science Publishers. 
linear, saturated alkane, and more ordered (Figure
14-10). In a bilayer composed only of phospholipids and 
sphingomyelins, these regions of alkane proximal to the 
glyceryl groups are nevertheless necessarily required to 
have the same mean cross-sectional area for each phos- 
pholipid as exists in the distal regions at the core. All of 
these considerations require that the solution to this 
paradox incorporate a large cross-sectional area, low 
density, and high disorder in the regions distal to the 
glyceryl groups at the core of the hydrocarbon and a large 
cross-sectional area, high density, and low disorder in 
the two symmetrical regions of alkane proximal to the 
glyceryl groups in a bilayer. 

One solution to this paradox would be that the 
linear alkanes in the two proximal regions are tilted.
Tilting the alkane increases its cross-sectional area paral- 
lel to the plane of the bilayer while retaining the density 
of the condensed, hexagonal array. This possibility has 
been examined with a series of synthetic phosphatidyl- 
cholines that each had a dimethyl cyclic nitroxyl radical 
attached at a different carbon in the saturated fatty acyl 
group on carbon 2 of their sn-glycerol 3-phosphates. 
These labeled phosphatidylcholines were incorporated 
into bilayers of phosphatidylcholine from eggs of 
G. gallus. When the dimethyl cyclic nitroxyl radical was 
on the 5th or the 8th carbon, its principal axis was tilted 
30° relative to the plane of the bilayer, but when it was on 
the 12th or the 16th carbon, its principal axis was oriented
on average normal to the plane of the bilayer.
Tilted hydrocarbon chains have been directly observed
in crystallographic molecular models of bilayers of
phospholipids. A tilting of the alkyl chains in the two regions
on the two sides of the bilayer proximal to the glyceryl
groups would also explain the increase in cross-sectional
area and decrease in width that occurs upon the melting
of bilayers of homogeneous phospholipids. The
implausible feature of this explanation is that all of 
the alkyl chains in a fluid bilayer would have to tilt in the 
same direction within one of these regions for it to be 
correct. 

Such a coordinated tilting of the hydrocarbon over 
large areas of a solid bilayer has been observed by dif- 
fraction of X-radiation. In bilayers of dipalmitoylphos- 
phatidylcholine and distearoylphosphatidylcholine 
below the temperature at which they melt, the equatorial 
reflection in the X-ray diffraction pattern that arises from 
the aligned chains of alkane displays the fine structure of 
a sharp reflection superimposed upon a broader reflection.
This distribution of reflected intensity has been
shown to arise from hydrocarbons tilted relative to the 
axis normal to the bilayer, and the degree of tilt can be 
calculated from the diffraction pattern. As the degree of 
hydration in these solids was increased from 6% to 30%, 
the tilt of the hydrocarbons increased from 17° to 40°. 
Presumably the steric effects of the hydration force the 
hydrophilic functional groups of the phospholipids to 
take up a greater surface area for each molecule of
phospholipid, and the linear hydrocarbons adjust to the
required increase in their cross-sectional area by tilting.
It has also been observed that a dimethyl cyclic nitroxyl
radical, attached at the fifth carbon of a fatty acid on
carbon 2 of the sn-glycerol 3-phosphate of
dipalmitoylphosphatidylcholine, appeared to be in a more polar
environment in bilayers of natural phospholipids than
nitroxyl radicals attached farther down the fatty acid. If
the hydrocarbons were tilted in this region, their surfaces 
should be more exposed to the aqueous phase. 
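
The geometric effect of tilting can be estimated with a simple projection. If a straight chain with a cross-sectional area A perpendicular to its own axis is tilted by an angle θ from the normal to the bilayer, it occupies an area A/cos θ in the plane of the bilayer, and its projected length along the normal shrinks by cos θ. The Python sketch below applies this to the tilts of 17° and 40° quoted above, using the value of 0.40 nm² given in this chapter for a pair of hexagonally arrayed all-trans alkanes. Treating the chains as rigid rods is an assumption of the sketch, and the function names are illustrative.

```python
import math

# Cross-sectional area (nm^2) of a pair of hexagonally arrayed,
# all-trans linear alkanes in a solid paraffin, as quoted in the chapter.
PAIR_AREA_NM2 = 0.40


def area_in_plane(area_normal_nm2, tilt_deg):
    """Area occupied in the plane of the bilayer by chains tilted
    tilt_deg from the normal (rigid-rod assumption)."""
    return area_normal_nm2 / math.cos(math.radians(tilt_deg))


def thickness_factor(tilt_deg):
    """Factor by which the projected length along the normal shrinks."""
    return math.cos(math.radians(tilt_deg))


for tilt in (0.0, 17.0, 40.0):
    area = area_in_plane(PAIR_AREA_NM2, tilt)
    factor = thickness_factor(tilt)
    print(f"tilt {tilt:4.1f}°: area {area:.3f} nm², thickness x {factor:.3f}")
```

With these assumptions a tilt of 40° raises the area for each pair of chains from 0.40 to about 0.52 nm² without any loss of packing density within the array.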

These observations, however, illustrate a difficulty 
in interpreting many of the physical studies on bilayers. 
The bilayers used in these experiments were vesicles pre- 
pared by sonication. It has been shown by nuclear mag- 
netic resonance spectroscopy that vesicles of small 
diameter (30-90 nm) prepared by sonication display 
anomalous physical properties because of their high curvature.
It has been pointed out that these anomalies
arise from the fact that the high curvature unavoidably 
forces a portion of the hydrocarbon adjacent to the 
hydrophilic functional groups of the phospholipids to 
occupy locations on the outer surface of the vesicle in 
contact with the water, and this would explain why the
environment of the nitroxyl radical in this situation 
appears to be so polar. Naturally occurring bilayers, how- 
ever, rarely have such high curvature. The tension within 
such small vesicles produced by sonication seems to be 
significant. When the kinetic barrier is overcome by 
adding appropriate catalysts, small vesicles of phospho- 
lipid spontaneously fuse among themselves to produce 
much larger single-walled structures.

The paradox of the cross-sectional areas has been 
phrased in terms of the structure of bilayers of phospho- 
lipids from natural sources because this is the most crit- 
ical situation. Natural phospholipids have unsaturated 
fatty acids that necessarily disrupt the hydrocarbon in 
the core of their bilayers (Figure 14-2). Consistent with 
the stereochemical consequences of the cis double 
bonds, the most obvious discontinuity in the plot of the 
order parameter S against position (Figure 14-10) occurs 
after the sixth carbon of the palmitate on 1-palmitoyl- 
2-oleoylphosphatidylcholine (see Figure 14-2A). A simi- 
lar, although a much less abrupt, discontinuity, however, 
also seems to be present in the plot for bilayers formed 
from fully saturated phospholipids such as dipalmi- 
toylphosphatidylcholine. Even in this case, the disorder 
increases most precipitously beyond the eighth carbon. 

It has been argued that there is no stereochemical 
paradox associated with the first seven saturated carbons 
of the fatty acyl groups in a bilayer of phospholipid in 
which there is the normal complement of unsaturated 
fatty acyl groups if the carbon-carbon bonds connecting 
these carbons can support enough gauche configura- 
tions to make this region fluid enough to fill the volume 
allotted to it. At first glance, this would seem difficult to
accomplish. The difference in standard free energy 
between a trans and a gauche configuration in a linear
alkane is about 5 kJ mol⁻¹, which would permit somewhat
fewer than one gauche configuration in each of the
segments of six carbon-carbon bonds in this region. 
Furthermore, the fact that one end of each of these heptyl 
segments is nailed to the head group of its phospholipid, 
which must occupy the interface with the water, provides 
a significant additional restraint to the ability of the 
hydrocarbon in this region to fill this volume fluidly. 
The hydrocarbon in this region simply does not have the 
same flexibility as that of the hydrocarbon in liquid linear 
alkane. 
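
The estimate of somewhat fewer than one gauche configuration in each segment follows from the Boltzmann distribution. The Python sketch below treats the difference in standard free energy of 5 kJ mol⁻¹ as fixing the equilibrium constant K = [gauche]/[trans] = exp(−ΔG°/RT) for each bond and assumes that the six carbon-carbon bonds behave independently. The temperature of 310 K and the function names are assumptions made for illustration.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def gauche_fraction(delta_g_kj_mol, temperature_k):
    """Fraction of bonds in a gauche configuration at equilibrium,
    taking K = [gauche]/[trans] = exp(-delta_g / RT)."""
    k = math.exp(-delta_g_kj_mol / (R_KJ_PER_MOL_K * temperature_k))
    return k / (1.0 + k)


def expected_gauche(n_bonds, delta_g_kj_mol, temperature_k):
    """Expected number of gauche configurations among n independent bonds."""
    return n_bonds * gauche_fraction(delta_g_kj_mol, temperature_k)


# Six carbon-carbon bonds, 5 kJ/mol, at an assumed temperature of 310 K.
n_gauche = expected_gauche(6, 5.0, 310.0)
print(f"expected gauche configurations per segment: {n_gauche:.2f}")
```

Under these assumptions roughly one bond in eight is gauche, so a segment of six bonds carries about three-quarters of a gauche configuration on average.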

The intuition that this region of the hydrocarbon is 
not fluid enough to fill the necessary volume is consistent 
with the universal observation that these regions are 
more oriented than the distal regions nearer the center of 
the bilayer (Figure 14-10). It is also consistent with the 
distribution of electron density in bilayers of naturally 
occurring phosphatidylcholine (Figure 14-4B), because 
the two symmetric regions of alkane proximal to the glyc- 
eryl groups have the highest electron density and the 
regions of hydrocarbon at the center of the bilayer have 
the lowest electron density. The two symmetrical shoul- 
ders of intermediate electron density within the proximal 
regions of the hydrocarbons are thought to be real fea- 
tures of the structure of the bilayer rather than artifacts 
of the sinusoidal transform.

Nevertheless, the linear saturated alkane in this 
region may be fluid enough to fill the volume that it must 
without needing to resort to coordinated tilting. For 
example, the saturated hydrocarbon in a bilayer of phos- 
phatidylcholine from eggs of G. gallus does seem to have 
a higher frequency of gauche conformations than does 
liquid hexadecane. More likely, however, is that the
paradox of the cross-sectional areas in a bilayer of just 
phospholipid is solved by some tilting and some fluidity. 

Regardless of how bilayers of just phospholipid 
solve the problem, steroids in natural membranes play a 
major role in overcoming the stereochemical paradox 
posed by the high disorder and low density of the hydro- 
carbon in the core distal to the glyceryl groups and the 
low disorder and high density of the alkane in the regions 
proximal to the glyceryl groups in a bilayer of phospho- 
lipid. All eukaryotic membranes contain significant 
quantities of steroids. In animal membranes, cholesterol 
is the major steroid: 


It accounts for about 20-30% of the mass of the lipids in 
a membrane, and the mole fraction of cholesterol to 
phospholipid varies between 0.3 and 0.6. Each molecule
of cholesterol is more or less confined to one or the
other surface of the bilayer, presumably because its
hydroxyl is hydrogen-bonded to water, but both mono- 
layers have about the same mole fraction of cholesterol. 
The long axis of the cholesterol is aligned normal to the 
bilayer. The nuclear magnetic resonance spectrum of
[¹H]cholesterol incorporated into bilayers of
[²H]dipalmitoylphosphatidylcholine is consistent with
the confinement of its fused rings to the more ordered 
regions of the hydrocarbon proximal to the glyceryl 
groups and the incorporation of its isoprenoid tail into 
the more fluid distal regions of the core.
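
The relation between the mass fraction and the mole fraction of cholesterol can be checked with a short calculation. The Python sketch below assumes, for illustration only, that the lipid consists of just cholesterol and phospholipid and that the phospholipid has an average molar mass of 770 g mol⁻¹. Both assumptions ignore the other lipids of a real membrane, and the function name is hypothetical.

```python
CHOLESTEROL_G_PER_MOL = 386.65
# Assumed average molar mass of a membrane phospholipid (illustrative).
PHOSPHOLIPID_G_PER_MOL = 770.0


def mole_fraction_cholesterol(
    mass_fraction,
    m_chol=CHOLESTEROL_G_PER_MOL,
    m_pl=PHOSPHOLIPID_G_PER_MOL,
):
    """Mole fraction of cholesterol in a two-component mixture of
    cholesterol and phospholipid with the given mass fraction."""
    n_chol = mass_fraction / m_chol
    n_pl = (1.0 - mass_fraction) / m_pl
    return n_chol / (n_chol + n_pl)


x_at_20 = mole_fraction_cholesterol(0.20)
x_at_30 = mole_fraction_cholesterol(0.30)
print(f"20% by mass -> mole fraction {x_at_20:.2f}")
print(f"30% by mass -> mole fraction {x_at_30:.2f}")
```

With these assumptions, 20-30% cholesterol by mass corresponds to mole fractions of roughly 0.33 to 0.46, which falls within the range quoted in the text.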

Along its long axis, a molecule of cholesterol has a 
van der Waals cross-sectional area (6-12) in a
Corey-Pauling-Koltun space-filling model of 0.25 nm² for the
first 1.0 nm and then abruptly, at its isoprenoid tail, the
van der Waals cross-sectional area decreases to 0.12 nm²
for its last 0.8 nm. It has been proposed that the portion
of the cholesterol with the largest cross-sectional area 
occupies the space between the chains of the alkane in 
the regions proximal to the glyceryl groups in a natural 
bilayer and permits them to straighten their posture and 
assume a fully extended almost all-trans configuration 
normal to the plane of the bilayer. Consistent with this 
proposal, the addition of cholesterol to bilayers of
phospholipid decreases the frequency of gauche conformations
in their alkyl chains considerably and increases
the alignment of 1,6-diphenyl-1,3,5-hexatriene normal
to the plane of the bilayer. This stereochemical function
for cholesterol would explain why its addition to bilayers
of natural phosphatidylcholine decreases their fluidity
but its addition to bilayers of synthetic
dipalmitoylphosphatidylcholine and dimyristoylphosphatidylcholine
increases their fluidity. 

Measurements of the diffraction of X-radiation 
from bilayers formed from mixtures of natural phos- 
phatidylcholine and cholesterol also support this struc- 
tural proposal. When the distribution of electron density 
in oriented bilayers of cholesterol and phosphatidyl- 
choline is compared to that of bilayers of phosphatidyl- 
choline alone, an increase in electron density occurs in 
the regions proximal to the glyceryl groups rather than in 
the central core of the hydrocarbon (compare panel D 
with panel B in Figure 14-4). Unlike bilayers of pure 
natural phosphatidylcholine, in which the alignment of 
the linear alkane with an axis normal to the plane of the 
bilayer is poor and decreases as hydration is increased, 
bilayers of an equimolar mixture of phosphatidylcholine 
and cholesterol have their linear alkane closely aligned 
with the normal axis (compare the equatorial reflections 
in panels A and C of Figure 14-4), and this alignment
does not change as hydration is changed. 

As cholesterol is added to a bilayer of natural phos- 
phatidylcholine at a constant concentration of water, the 
width of the bilayer increases linearly with the concentration
of cholesterol until it reaches a maximum width
at a mole fraction of cholesterol in phospholipid of 0.33.
At the maximum, the width of the bilayer has increased
by 19%. At the same time, however, the cross-sectional
area for each molecule of phosphatidylcholine in a
monolayer of the bilayer decreases from 0.62 to 0.48 nm²,
if it is assumed that each molecule of cholesterol
contributes 0.37 nm² to the surface area. These are the
changes expected if cholesterol straightens the posture
of the alkane in the regions of the bilayer proximal to the
glyceryl groups. The minimum value of 0.48 nm² for the
cross-sectional area is not far from the value of 0.40 nm²
for the cross-sectional area of a pair of hexagonally 
arrayed all-trans linear alkanes in a solid paraffin. 
Because the molecules of cholesterol are spacing the 
molecules of phosphatidylcholine, the phosphocholine 
head groups are more widely separated, and the steric 
effects of their hydration are no longer significant. If the 
distance between the esters in a bilayer of dioleoylphos- 
phatidylcholine is 3.2 nm, the width of the hydrocarbon 
should be 3.0 nm. If this bilayer is representative of one
composed of only phospholipid and if the addition of the 
normal amount of cholesterol increases the width of the 
hydrocarbon in such a membrane of phospholipids by 
20%, the width of the hydrocarbon in a membrane of 
phospholipids and cholesterol in a cell should be about 
3.6 nm. 
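
The arithmetic behind this estimate is short enough to check directly. The Python sketch below scales the width of 3.0 nm by the 20% increase attributed to cholesterol and compares the result with the sum of the regional widths quoted later in this section (about 1 nm for each proximal region and about 1.5 nm for the central core). The function and variable names are illustrative, and every input is an approximate figure from the text.

```python
def width_with_cholesterol(width_nm, fractional_increase):
    """Width of the hydrocarbon after a fractional increase in width."""
    return width_nm * (1.0 + fractional_increase)


# Hydrocarbon of a phospholipid bilayer widened by 20% by cholesterol.
estimated_width_nm = width_with_cholesterol(3.0, 0.20)

# Independent check: two proximal regions plus the central core.
PROXIMAL_REGION_NM = 1.0
CENTRAL_CORE_NM = 1.5
summed_width_nm = 2 * PROXIMAL_REGION_NM + CENTRAL_CORE_NM

print(f"scaled estimate:   {estimated_width_nm:.1f} nm")
print(f"sum of the layers: {summed_width_nm:.1f} nm")
```

The two figures agree to within 0.1 nm, which is as close as the rounded inputs allow.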

There is evidence from calorimetric studies, X-ray
diffraction, deuterium nuclear magnetic resonance,
and pressure-area functions of monolayers that
complementary interactions occur between cholesterol
and phospholipids, causing them to segregate into distinct
phases. For example, between mole fractions of cholesterol
of 0.08 and 0.28 at 30 °C, the lipids in a
bilayer of dimyristoylphosphatidylcholine and cholesterol
separate into a phase enriched in cholesterol and a
phase depleted in cholesterol. Above a mole fraction of 0.28, the
cholesterol and the dimyristoylphosphatidylcholine are
miscible and form a single phase. Below the melting
point of pure dimyristoylphosphatidylcholine, the phase 
enriched in cholesterol remains fluid while the phase 
depleted in cholesterol with which it coexists solidifies. 
The phases enriched in cholesterol that separate from 
mixtures of cholesterol and phospholipid have volumes 
that are smaller than the sum of the volumes of the separate
components and have much broader phase transitions.
In these distinct phases, the alkane of the
phospholipid is more ordered than it is in the absence of
cholesterol but reorients more rapidly, results consistent
with a decrease in the frequency of gauche conformations
and the elimination of any tilting of that alkane.
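
If the two boundaries quoted above for dimyristoylphosphatidylcholine and cholesterol at 30 °C (mole fractions of cholesterol of 0.08 and 0.28) are taken as the compositions of the two coexisting phases, the lever rule gives the fraction of the lipid in each phase at any overall composition between them. The text does not perform this calculation, so the Python sketch below is an illustration under that assumption, and the function name is hypothetical.

```python
def fraction_in_rich_phase(x_overall, x_poor=0.08, x_rich=0.28):
    """Fraction of all lipid molecules found in the phase enriched in
    cholesterol, from the lever rule. x_poor and x_rich are the assumed
    mole fractions of cholesterol in the two coexisting phases."""
    if x_overall <= x_poor:
        return 0.0  # single phase depleted in cholesterol
    if x_overall >= x_rich:
        return 1.0  # single phase enriched in cholesterol
    return (x_overall - x_poor) / (x_rich - x_poor)


for x in (0.05, 0.08, 0.13, 0.18, 0.28, 0.35):
    print(f"mole fraction {x:.2f}: {fraction_in_rich_phase(x):.2f} in rich phase")
```

At an overall mole fraction of 0.18, halfway between the boundaries, half of the lipid would be in each phase.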

These separate phases enriched in cholesterol usu- 
ally have distinct molar ratios between the two compo- 
nents. The stoichiometry between cholesterol and 
phospholipid in these phases varies depending on the 
type of phospholipid with which the cholesterol is 
mixed. In mixtures between synthetic phospholipids
and cholesterol the mole fraction of cholesterol in one of
these separate phases is usually between 0.25 and
0.4, but with mixtures of natural phospholipids it
may be higher, judging from the normal ranges in the 
cholesterol composition in biological membranes. 

One possibility is that the stoichiometry between 
cholesterol and phospholipid established in one of these 
phases for a particular phospholipid or mixture of phos- 
pholipids is determined by the adjustment of volumes 
within the bilayer that is accomplished upon the dissolution
of the cholesterol. The difference in cross-sectional
area between the fused rings and the isoprenoid
of the cholesterol cancels the imbalance between the 
cross-sectional area for the regions of alkane proximal to 
the glyceryl groups and the cross-sectional area for the 
hydrocarbon in the distal region beyond the eighth car- 
bons of the fatty acyl groups. To effect this cancellation 
completely, there should be an optimal molar ratio 
between cholesterol and phospholipid. The fact that, 
within these separate phases formed between phospho- 
lipid and cholesterol, the molecules of cholesterol are 
evenly spaced, with each molecule of cholesterol surrounded
by about four molecules of phospholipid, is
consistent with the adjustment of the volumes being the 
force establishing the stoichiometry and the very exis- 
tence of these phases. 

The distribution of electron density across a bilayer 
of amphipathic lipids in a membrane from a eukaryotic 
cell, represented by an equimolar mixture of cholesterol 
and phosphatidylcholine (Figure 14-4D), displays three 
regions. 

First, the two symmetrical boundaries of high elec- 
tron density, formed by the hydrophilic head groups, the 
glyceryl group, and the esters, sandwich the hydrocarbon 
and provide interfaces compatible with the water on 
either side. The distance between the two maxima of 
electron density that designate these two interfaces in a 
bilayer of phospholipid and cholesterol is 5.0-5.4 nm.
In a natural membrane, these surfaces are irregular 
(Figure 14-5) and are formed by the phosphocholines, 
phosphoethanolamines, phosphoserines, phosphoinos- 
itols, and oligosaccharides. These surfaces are constantly 
changing in appearance owing to the fluid state of the 
bilayer. 

Second, the two symmetric regions of hydrocarbon 
proximal to the glyceryl groups, formed by the first seven 
saturated carbons of the fatty acyl chains (Figure 14-2) 
and the fused rings of the cholesterol (14-11), have a 
lower electron density (Figure 14-4D) only because they 
are hydrocarbon. They are densely packed, with the 
alkane of the fatty acyl groups predominantly in its fully 
extended, all-trans configuration aligned normal to the 
plane of the bilayer, supported by the fused rings of the 
cholesterol and in turn spacing the molecules of choles- 
terol in a fairly uniform distribution. In one of the mono- 
layers of a bilayer formed from an equimolar mixture of 
cholesterol and natural phosphatidylcholine, the cross-
sectional area for each phospholipid is 0.48 nm², if the
cross-sectional area for each cholesterol is 0.37 nm². The
width of each of the two symmetrically displayed proxi-
mal regions of hydrocarbon is about 1 nm. They com-
mence at the level of the two acyl carbons attached to the
glycerol and extend into the bilayer to the level at which 
the unsaturation of the fatty acids and the isoprenoid tail 
of the cholesterol commence. 

Third, within these two symmetric boundaries, the 
central core of the bilayer contains the unsaturated 
hydrocarbon of the fatty acids and the disordered alkane 
of the fatty acids and the isoprenoid tail of the choles- 
terol. It has the lowest electron density (Figure 14-4D) 
because the disorder of the hydrocarbon in this region 
increases the frequency at which vacant space is encoun- 
tered. It is believed that the hydrocarbon in the central 
core of the bilayer has most of the properties of liquid 
paraffin. The width of the central core, about 1.5 nm, 
brings the width of the entire sheet of hydrocarbon to 
about 3.6 nm. 

When an amphipathic lipid, such as natural phos- 
phatidylcholine from eggs of G. gallus, is spread at an 
interface between air and water, it forms a monolayer 
with its hydrophilic functional groups directed toward 
the water and its hydrophobic hydrocarbon directed 
toward the air. The area for each phospholipid in this 
monolayer is a function of the surface pressure. This 
pressure is exerted mechanically by changing systemati- 
cally the area of the surface, and it is measured with a tor- 
sion balance. Above a certain pressure, or in other words 
below a certain area, the monolayer becomes so com- 
pressed that phospholipid molecules leave it and form 
small patches of bilayer adhering to the monolayer. 

Before this breakdown, however, the area for each 
molecule of phospholipid at the interface is a monotonic 
inverse function of the surface pressure (Figure
14-11). The explanation for this behavior is that, at
low pressure, the tendency of the hydrocarbon to be 
maximally disordered causes the monolayer to have a 
large surface area, which is about 3-fold greater than that 
of hexagonally packed linear alkane oriented normal to 
the interface. The molecules of phospholipid, at zero 
pressure, do not lie flat upon the surface, presumably 
because this would bring all of their hydrophobic hydro- 
carbon into contact with water. The observed surface 
area at zero pressure is a balance between the entropy 
that would spread the lipids and the hydrophobic effect 
that would contract them. As the surface is compressed 
and its free energy is thereby increased, the hydrocar- 
bons become more and more aligned, and of lower and 
lower entropy. The surface area of a molecule of phos-
phatidylcholine in a normal fluid bilayer of natural phos-
phatidylcholine is about 0.7 nm², which corresponds to a
surface pressure of about 37 dyn cm⁻¹ in a monolayer at
an air-water interface.

Figure 14-11: Relationship of surface pressure (dynes centime-
ter⁻¹) and area [nanometers² (molecule of phosphatidylcholine)⁻¹] for
monolayers of phosphatidylcholine purified from eggs of G. gallus
at an interface between air and water or between n-hexa-
decane and water. Phospholipid was spread at the interfaces
from a solution in n-hexane; and, in the case of the interface with
air, the hexane evaporated immediately, leaving behind the mono-
layer of lipid. In each case, a monolayer of phosphatidylcholine was
produced. The area of the interface could be varied by movable
boundaries, and the interfacial pressure could be measured
directly by a torsion balance. Areas for a molecule of phospholipid
were calculated on the assumption that all of the phospholipid
added to the system had been incorporated into the monolayer.
Arrows mark areas of 0.7 nm² molecule⁻¹. Adapted with permission from
refs 105 and 106. Copyright 1960 Biochemical Society and 1971
Springer-Verlag.

The same measurements can be made on a mono-
layer of natural phosphatidylcholine at an interface
between an alkane, such as n-hexadecane, and water
(Figure 14-11). At each surface pressure, the mono-
layer in this situation has a greater surface area than
when it is backed by air. The reason for this is that,
because of van der Waals forces, liquid alkane is more
compatible with the hydrocarbon side of the monolayer
than is air; and when the monolayer is backed by alkane,
there is not an interfacial free energy causing it to con-
tract spontaneously and minimize its surface area at the
interface between hydrocarbon and air as well as at the
interface between phospholipid and water.
Consequently, the pressure needed to compress this
monolayer to 0.7 nm² (molecule of phosphatidyl-
choline)⁻¹ is 44 dyn cm⁻¹.
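
For readers more used to SI units, the two surface pressures quoted for an area of 0.7 nm² for each molecule (37 dyn cm⁻¹ at the interface with air and 44 dyn cm⁻¹ at the interface with n-hexadecane) can be converted with a few lines of Python. The sketch is illustrative only: the function names are hypothetical, the temperature of 298 K is an assumption, and the product of pressure and area is reported merely as a convenient scale of energy, not as the work of compression, which would require integrating the full isotherm.

```python
# Unit conversions for the monolayer measurements quoted in the text.
# Function names and the temperature (298 K) are assumptions of this sketch.

BOLTZMANN = 1.380649e-23   # J K^-1 (exact SI value)
AVOGADRO = 6.02214076e23   # mol^-1 (exact SI value)


def dyn_per_cm_to_newton_per_m(pressure_dyn_cm: float) -> float:
    """Convert dyn cm^-1 to N m^-1 (1 dyn cm^-1 = 1e-3 N m^-1 = 1 mN m^-1)."""
    return pressure_dyn_cm * 1e-3


def pressure_area_product(pressure_dyn_cm: float, area_nm2: float) -> float:
    """Return (surface pressure) x (area per molecule) in joules."""
    area_m2 = area_nm2 * 1e-18  # 1 nm^2 = 1e-18 m^2
    return dyn_per_cm_to_newton_per_m(pressure_dyn_cm) * area_m2


if __name__ == "__main__":
    temperature = 298.0  # K, assumed
    for label, pressure in (("air-water", 37.0), ("hexadecane-water", 44.0)):
        energy = pressure_area_product(pressure, 0.7)
        print(
            f"{label}: {energy:.2e} J per molecule, "
            f"{energy * AVOGADRO / 1000:.1f} kJ per mole, "
            f"{energy / (BOLTZMANN * temperature):.1f} kT"
        )
```

Both products come out at several multiples of kT for each molecule, which gives a feeling for the magnitude of the lateral forces that hold a fluid bilayer at its observed area.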

A droplet of alkane in water has a surface tension of
about 50 dyn cm⁻¹, which reflects the free energy of the
hydrophobic effect. As amphipathic lipid is added to a
droplet of alkane, its surface tension rapidly decreases
and reaches zero before the mole fraction of the amphi-
pathic lipid reaches 1. When its surface tension
reaches zero, the surface area of the droplet will begin to
expand indefinitely, and at a mole fraction of amphi-
pathic lipid equal to 1, it will be a bilayer. That the sur-
face tension of a bilayer of amphipathic lipid is zero has
been verified experimentally. The initial surface ten-
sion of the droplet of alkane is a direct measurement of
the cohesive force of the hydrophobic effect. In the 
bilayer, this cohesive force is still in operation, trying to 
minimize its surface area, but it is counterbalanced by 
the hydration of the head groups. 

From these various considerations, it follows that a 
bilayer of amphipathic lipids immersed in an aqueous 
solution represents a compromise among a number of 
forces. The hydrocarbon of the fatty acyl groups is 
hydrophobic and is withdrawn as successfully as possi- 
ble from contact with the water by the cohesive force of 
the hydrophobic effect. The most successful stereo- 
chemical solution to this problem would be all-trans 
alkane in the regions proximal to the water oriented 
normal to the surface in hexagonal array. The presence of 
cis double bonds, the steric effects of hydration, and the 
entropy of the liquid state defeat this solution (Figures 
14-10 and 14-11) and cause the cross-sectional area for 
each amphipathic lipid to be greater than the value of 
0.40 nm² for hexagonal packing. This stereochemically
enforced spreading of the bilayer must expose some of
the hydrocarbon in the regions proximal to the water.
The hydrophilic head groups of the amphipathic lipids 
are facing the aqueous phase. If they were buried or ster- 
ically excluded from contact with water, their free ener- 
gies of hydration would be lost, which would be 
unfavorable (Figures 5-8 and 5-18). In the compromise 
among the various forces, a bilayer of unadulterated nat- 
ural phosphatidylcholine ends up with about half the 
surface area (0.7 nm²) for each molecule in one of its two
monolayers as that for a molecule in a monolayer of 
phosphatidylcholine at an air-water interface (Figure 
14-11) at zero surface pressure. Presumably, this differ- 
ence between a bilayer and a monolayer arises from the 
greater cohesion, due to van der Waals forces, that can be 
established within an interior of liquid hydrocarbon as 
opposed to a thin layer of hydrocarbon at an interface 
with air and from the fact that there are two interfaces 
with the water, one on each side of the bilayer. 
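
The areas quoted in this section can be checked against one another with simple arithmetic. The following sketch assumes that "about half" means a factor of exactly 0.5, which is a rounding introduced here, and its constant and function names are hypothetical.

```python
# Consistency check on the cross-sectional areas quoted in the text.
# "About half" is taken as exactly 0.5; that rounding is an assumption.

HEXAGONAL_AREA_NM2 = 0.40   # per lipid, all-trans chains in hexagonal array
BILAYER_AREA_NM2 = 0.7      # per phosphatidylcholine in a fluid bilayer


def zero_pressure_area(bilayer_area_nm2: float, fraction: float = 0.5) -> float:
    """Monolayer area at zero pressure implied by bilayer = fraction x monolayer."""
    return bilayer_area_nm2 / fraction


def expansion_over_hexagonal(area_nm2: float) -> float:
    """Ratio of an observed area per lipid to the hexagonally packed value."""
    return area_nm2 / HEXAGONAL_AREA_NM2


if __name__ == "__main__":
    monolayer = zero_pressure_area(BILAYER_AREA_NM2)
    print(f"implied zero-pressure monolayer area: {monolayer:.1f} nm^2")
    print(f"bilayer / hexagonal:   {expansion_over_hexagonal(BILAYER_AREA_NM2):.2f}")
    print(f"monolayer / hexagonal: {expansion_over_hexagonal(monolayer):.2f}")
```

Under these assumptions the monolayer at zero pressure occupies roughly 3.5 times the area of hexagonally packed alkane, in line with the earlier statement that the area at low pressure is about 3-fold greater.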

A bilayer represents one example from a spectrum 
of different structures that can be formed by amphi- 
pathic compounds such as amphipathic lipids, soaps, 
and detergents. An amphipathic compound usually con- 
tains one or more hydrocarbons that are each covalently 
attached at one of their ends to the others and to one or 
more hydrophilic functional groups. When an amphi- 
pathic compound is added to an aqueous solution, it 
forms noncovalent, multimolecular complexes referred 
to as either micelles or bilayers. In these complexes, all of 
the hydrophilic functional groups of the constituent mol- 
ecules reside on the surface at the interface or interfaces 
with the aqueous phase so that they can be hydrated by 
the water. The hydrocarbon occupies the interior of the
complex sequestered from the aqueous phase by the
hydrophobic effect. The molar volume of the hydro- 
carbon in the complex is determined simply by the par- 
tial molar volume of the hydrocarbon from which it is 
composed. 

The final molar surface area at an interface of the 
complex with the aqueous phase, however, is deter-
mined by the balance between two opposing forces.
The hydrophilic functional groups have an inescapable 
atomic cross-sectional area for their covalent structure 
that forces them to be spaced at least a minimum dis- 
tance apart on the surface of the complex. This spacing is 
increased by the layers of hydration that are noncova- 
lently associated with each hydrophilic functional group 
and any mutual electrostatic repulsion driving them 
apart. The farther apart the hydrophilic functional 
groups are spaced to relieve these repulsive forces, the 
more of the hydrocarbon to which they are covalently 
attached is drawn out to the surface to come in contact 
with water. This exposure of the hydrocarbon to water is 
resisted by the hydrophobic effect. The balance between 
the intermolecular repulsion among the hydrophilic 
functional groups and the hydrophobic effect deter- 
mines the ultimate molar surface area of the complex. 

There is a third geometric constraint on the com- 
plex. Because every hydrocarbon is covalently attached 
to one or more hydrophilic functional groups, and every 
hydrophilic functional group must remain in contact with 
the aqueous phase, no carbon in the interior of the com- 
plex can be located more than a maximum distance from 
the aqueous phase. If the hydrocarbon were fully 
extended linear alkane with all of its carbon-carbon 
bonds in the all-trans conformation, that maximum dis- 
tance would be the maximum length of the amphipathic 
molecule. A certain amount of the hydrocarbon, however, 
having been dragged out of the interior by the repulsive 
forces among the hydrophilic functional groups, is 
required to occupy the surface of the complex, and the 
hydrocarbon in the interior is rarely fully extended 
because it is fluid and because it must mix to fill the inte- 
rior. Therefore, the maximum distance any carbon can be 
from the aqueous phase is significantly less than the max- 
imum length of the fully extended hydrocarbon found in 
the amphipathic molecule. One dimension of the com- 
plex formed from molecules of the amphipathic com- 
pound must always be less than or equal to twice this 
maximum distance. If it were not, the complex would 
contain a region farther from the aqueous phase than any 
matter can be located. 

The dimensions of the complex that an amphi- 
pathic compound can form are dictated by this maxi- 
mum dimension. The shapes available are a sphere the 
radius of which is less than or equal to the maximum 
dimension; an ellipsoid of revolution, prolate or oblate, 
the minor axis of which is less than or equal to the maxi- 
mum dimension; a cylindrical rod of indefinite length 
the diameter of which is less than or equal to the maxi-
mum dimension; a bilayer of indefinite area the width of
which is less than or equal to the maximum dimension; 
or cylindrical rods, ellipsoids of revolution, or spheres of 
water embedded uniformly in a volume otherwise filled 
with the amphipathic compound and spaced such 
that no distance between two adjacent surfaces of 
these aqueous inclusions is greater than the maximum 
dimension. 

The choice among these different geometric alter- 
natives in a given situation is determined by the ratio 
between the molar surface area, which is determined 
independently by the repulsion among the hydrophilic 
functional groups and the hydrophobic effect, and the 
molar volume, which is determined by the molecular 
structure of the particular amphipathic compound. The 
hydrocarbon of the compound must also be able steri- 
cally to fill the volume allotted to it by a particular 
shape; certain volumes are too anisotropic to be filled
by real hydrocarbon, which is made from atoms of 
carbon and hydrogen joined by covalent bonds of pre- 
cise bond angle and bond length and around which only 
particular rotations are permitted, even though these 
same volumes can be filled with imaginary hydrocarbon, 
which is a uniform continuum that can fill any shape 
drawn on a sheet of paper. 
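
The geometric argument of the last few paragraphs can be turned into a small calculation. For a complex in which each molecule contributes a volume v of hydrocarbon and an area a of surface, a sphere must have a radius of 3v/a, a long cylinder a radius of 2v/a, and a planar bilayer a half-width of v/a; each shape is allowed only if that length does not exceed the maximum distance l that a carbon can lie from the aqueous phase. The sketch below is a hedged illustration of this reasoning, in the spirit of the familiar packing-parameter treatment. It ignores ellipsoids and the steric requirement that real hydrocarbon be able to fill the volume, the rule that the most highly curved allowed shape is preferred is an assumption of the sketch, and the numerical inputs are hypothetical values chosen only for illustration.

```python
# Which simple shapes satisfy the distance constraint for given v, a, and l?
# v: hydrocarbon volume per molecule (nm^3); a: surface area per molecule
# (nm^2); l: maximum distance of any carbon from water (nm). All numerical
# values used below are hypothetical.

def characteristic_lengths(v: float, a: float) -> dict:
    """Radius (sphere, cylinder) or half-width (bilayer) fixed by v and a."""
    return {
        "sphere": 3.0 * v / a,     # volume/area = R/3 for a sphere
        "cylinder": 2.0 * v / a,   # volume/area = R/2 for a long cylinder
        "bilayer": 1.0 * v / a,    # volume/area = half-width for a planar sheet
    }


def allowed_shapes(v: float, a: float, l: float) -> list:
    """Shapes whose characteristic length does not exceed l."""
    lengths = characteristic_lengths(v, a)
    return [shape for shape, length in lengths.items() if length <= l]


def preferred_shape(v: float, a: float, l: float) -> str:
    """Most highly curved allowed shape; 'reversed' if none is allowed.

    Preferring the most curved shape is an assumption of this sketch.
    """
    allowed = allowed_shapes(v, a, l)
    for shape in ("sphere", "cylinder", "bilayer"):
        if shape in allowed:
            return shape
    return "reversed"


if __name__ == "__main__":
    examples = {
        "single chain, large head group": (0.35, 0.65, 1.67),
        "two chains, hydrated head group": (1.06, 0.70, 1.75),
        "two chains, poorly hydrated head group": (1.06, 0.50, 1.75),
    }
    for name, (v, a, l) in examples.items():
        print(f"{name}: {preferred_shape(v, a, l)}")
```

The label "reversed" stands for the last alternative in the list of shapes, in which inclusions of water are embedded in a volume otherwise filled with the amphipathic compound.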

Several examples will illustrate the outcome of the 
competition among the various free energies. When one 
of the fatty acyl chains is removed from phosphatidyl- 
choline to form lysophosphatidylcholine, the product 
forms ellipsoidal micelles rather than bilayers because
the internal volume of a bilayer about half the width of 
that formed by phosphatidylcholine that would be dic- 
tated by the appropriate molar surface area and molar 
volume cannot be filled uniformly by the hydrocarbon 
available, but an ellipsoidal micelle, with its smaller and 
more isotropic volume for each unit of surface area, can 
be filled readily. When suspended in 6% ethanol in water 
at 40 °C, distearoylphosphatidylcholine forms a phase in 
which the alkyl chains of its fatty acids interdigitate, 
rather than butting up against each other, to produce a 
narrower bilayer. The ethanol promotes the
increased exposure of the hydrocarbon to the water 
required by the increased cross-sectional area for each 
phospholipid. Dodecyl sulfate in 0.3 M lithium chloride 
forms spherical micelles because the electrostatic repul- 
sion among the sulfates is sufficient to produce the 
largest possible ratio of molar surface area to molar 
volume and the linear, fully saturated hydrocarbon is 
flexible enough to fill the appropriate spherical volume 
uniformly.

A heterogeneous mixture of phospholipids 
extracted from mammalian brain, when hydrated at 
37 °C to a low content of water (<20%), forms a reversed
hexagonal phase in which parallel cylinders of water
spaced 4.5 nm apart are embedded in a volume other-
wise filled with phospholipid. In this case, the
poorly hydrated hydrophilic functional groups of the
phospholipids produce such a small molar surface area
that when combined with the unavoidable molar volume 
of the acyl groups, it would produce a bilayer wider than 
the maximum dimension permitted these phospho- 
lipids. Pure phosphatidylethanolamine, presumably 
because its head group is more compact than that of 
phosphatidylcholine, forms bilayers under certain cir- 
cumstances and reversed hexagonal phases under other 
circumstances. The reversed hexagonal phase of phos-
phatidylethanolamine becomes more stable relative to
the bilayer as the temperature is raised, the length of the
fatty acyl groups is increased, the unsaturation of the
fatty acyl groups is increased, or the fatty acyl
groups are branched. All of these alterations increase
the cross-sectional area of the hydrocarbon and favor the 
reversed hexagonal phase with its lower ratio of surface 
area in contact with water to mean cross-sectional area 
for hydrocarbon. 

The fact that the particular phospholipids synthe- 
sized by living organisms form bilayers spontaneously 
rather than one of these other structures is as much a 
result of evolution by natural selection as the fact that the 
polypeptides synthesized by living organisms happen to 
fold.

The Proteins


Even a homogeneous suspension of biological mem- 
branes, highly purified by a series of centrifugations so 
that only identical membranes from the same source 
within the cells are present, contains a diverse collection 
of proteins. These proteins fall into several categories. All 
membranes when they are present in the cell are closed 
continuous sacs, containing solutions of soluble pro- 
teins. Even if the final suspension of purified membranes 
has been submitted to lysis and centrifugation, 
entrapped soluble proteins may still be enclosed in 
small vesicles of membrane and contaminate the prepa- 
ration. For example, it is difficult to obtain from erythro- 
cytes a suspension of plasma membranes completely 
devoid of hemoglobin. 

Peripheral membrane-bound proteins are pro-
teins that are not physically embedded in the bilayer of
phospholipid but are associated with the membrane 
either through interfaces with proteins that are embed- 
ded in the bilayer or through superficial interactions with 
the bilayer of phospholipid of the membrane. Peripheral
membrane-bound proteins can be dissociated by treat-
ments that do not dissolve the bilayer of the membrane.
If they are associated with more firmly attached proteins,
they can often be removed from the membrane by mild
treatments that are normally used to dissociate the sub-
units of multimeric proteins. Examples of such treatments
are increasing or decreasing either the pH or the ionic
strength, removing divalent cations, using mild denatu-
rants, or some combination of these treatments. For
example, the cytoskeletal proteins spectrin and actin can
be released from the plasma membranes of erythrocytes
by chelating divalent cations.

Some peripheral membrane-bound proteins asso-
ciate directly with the head groups of the phospholipids
in a bilayer. For example, in the presence of Ca²⁺, isoform
V of annexin associates at diffusion-controlled rates (1 ×
10¹⁰ M⁻¹ s⁻¹) with large unilamellar vesicles of phospho-
lipid and releases from these vesicles rapidly upon
chelation of the Ca²⁺. In contrast to annexin, which dis-
plays no preference for the head group of the phospho-
lipid, the peripheral association of protein kinase C
with vesicles of phospholipid requires that they contain
significant concentrations of phosphatidylserine, and
the dissociation constant between the enzyme and the
phospholipid bilayer decreases as the concentration of
1,2-diacylglycerol in the membranes is increased.
That the interaction of protein kinase C is mainly with
the hydrophilic surface of the bilayer is suggested by the
fact that the interaction requires the naturally occurring
enantiomers of both the phosphatidylserine and the di-
acylglycerol. That no association occurs with bilayers
formed from the unnatural enantiomers demonstrates
that it is not the surface charge of the bilayer that is being
recognized by protein kinase C but the head groups
themselves. The pleckstrin domain of phospholipase C
binds specifically to one molecule of phosphatidylinosi-
tol 4,5-bisphosphate within a bilayer of phospholipid,
and prothrombin and protein Z both also seem to bind to
the head group of only one of the phospholipids in a
bilayer.

In contrast to these proteins that recognize the 
head groups of the phospholipids specifically, the 
peripheral association of choline-phosphate cytidylyl- 
transferase with a bilayer of phospholipid requires only 
that its surface charge be negative because any negative 
phospholipid promotes its binding.

An anchored membrane-bound protein is a pro- 
tein a portion of whose primary covalent structure, unin- 
volved in its function, is immersed within the 
hydrocarbon of the bilayer of phospholipid and serves 
only to anchor the protein to the membrane. The portion 
of the protein embedded in the bilayer is not engaged in 
the native structure of the globular domain or globular 
domains to which it is covalently joined and can often be 
removed endopeptidolytically or by genetic manipula- 
tion to produce a protein that is soluble and that still dis- 
plays all of the functions of the membrane-bound form. 


Often the embedded portion in an anchored mem- 
brane-bound protein is a short segment of polypeptide at 
its amino terminus or carboxy terminus that appears to 
have been tacked on to an otherwise soluble protein to 
confine it to the surface of the membrane. 
Carboxypeptidase E is attached to the membranes of
secretory granules from the adrenal medulla through an
amphipathic α helix (see Figure 6-8) about 20 aa in
length at its carboxy terminus. The hydrophobic sur-
face of this α helix is submerged in the hydrocarbon
of the bilayer of phospholipid. A more common type of
anchor, however, is a segment of polypeptide at one of
the termini that spans the bilayer of the membrane. For
example, bovine polypeptide N-acetylgalactosaminyl-
transferase has the amino-terminal sequence
MRKFAYCKVVLATSLIWVLLDMFLLLYFSECNKCDEKKER-... The
segment from Valine 9 to Phenylalanine 28 spans the
bilayer of phospholipid that forms one of the Golgi mem-
branes to which the protein is anchored, and the rest of
the protein is a normal water-soluble, globular struc-
ture. This enzyme is a member of a large group of gly-
cosyltransferases, each of which is anchored to the Golgi
membranes. Cytochrome c₁ is a cytochrome that is
anchored in the mitochondrial membrane by a mem-
brane-spanning segment at its carboxy terminus. When
this segment is removed, a completely soluble form of
the cytochrome is produced that is functionally intact.
Cytochrome c₁, however, in addition to possessing the
carboxy-terminal anchor, is also, under normal circum- 
stances, a subunit of ubiquinol-cytochrome-c reductase, 
a large, heterooligomeric, membrane-spanning com- 
plex. 
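
A membrane-spanning anchor such as the one in polypeptide N-acetylgalactosaminyltransferase is usually recognizable from the amino acid sequence alone. The sketch below, which is not a procedure described in this section, averages hydropathy values over the segment from Valine 9 to Phenylalanine 28 of the sequence quoted above and over the residues that follow it. The use of the hydropathy scale of Kyte and Doolittle, the function names, and the comparison itself are assumptions made for illustration.

```python
# Mean hydropathy of the membrane-spanning segment quoted in the text.
# The Kyte-Doolittle scale is used here as an assumed, illustrative choice.

HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Amino-terminal sequence as printed in the text (the first 40 residues).
SEQUENCE = "MRKFAYCKVVLATSLIWVLLDMFLLLYFSECNKCDEKKER"


def segment(sequence: str, first: int, last: int) -> str:
    """Return residues first..last, numbered from 1 and inclusive."""
    return sequence[first - 1:last]


def mean_hydropathy(peptide: str) -> float:
    """Average hydropathy of the residues in a peptide."""
    return sum(HYDROPATHY[residue] for residue in peptide) / len(peptide)


if __name__ == "__main__":
    spanning = segment(SEQUENCE, 9, 28)     # Valine 9 to Phenylalanine 28
    following = segment(SEQUENCE, 29, 40)   # residues after the segment
    print(spanning, len(spanning), round(mean_hydropathy(spanning), 2))
    print(following, len(following), round(mean_hydropathy(following), 2))
```

The segment of 20 aa has a strongly positive mean hydropathy, while the residues that follow it, which begin the water-soluble portion of the protein, have a strongly negative one.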

Many proteins are directed to certain organelles in 
the cell by amino-terminal signal sequences. The signal 
sequence directing bovine dopamine-β-monooxygenase
to secretory granules of the adrenal medulla is a 
hydrophobic segment 20 aa in length, and if the signal 
sequence is not removed as it normally is, it becomes a 
membrane-spanning segment anchoring this protein in 
the bilayer of phospholipid.

Sometimes a separate domain at one of the termini 
of a protein is responsible for anchoring it within the 
bilayer of phospholipid. For example, the domain of 
60 aa at the amino terminus of the receptor Tom 20 is an 
anchor embedded in the outer mitochondrial mem-
brane, and the domain of 103 aa at the carboxy ter-
minus of 3-hydroxybutyrate dehydrogenase is an
anchor embedded in the inner mitochondrial mem- 
brane; each of these domains anchors the otherwise 
soluble protein into its respective membrane. 
Hydroxymethylglutaryl-CoA reductase, which is an 
anchored membrane-bound protein by virtue of the fact 
that a detachable domain with full catalytic activity can 
be removed from the membrane by endopeptidolytic 
cleavage, has an embedded anchor almost 400 amino 
acids in length containing seven hydrophobic segments 
of greater than 20 aa each.


The segment of the polypeptide anchoring a pro-
tein in a bilayer of phospholipid can also be in its inte-
rior. The amino acid sequence between Methionine
177 and Alanine 229 in (S)-mandelate dehydrogenase
from Pseudomonas putida, a protein 393 aa in length,
anchors it in the plasma membrane of the bacterium.
When this segment is replaced by the sequence 20 aa in
length that occupies the homologous location in
(S)-2-hydroxy-acid oxidase, a closely related but com-
pletely soluble protein, the resulting chimera no longer
associates with the membrane, is fully active, and can be
readily crystallized. The three respective interior
segments of coagulation factor VIII and coagulation
factor V responsible for anchoring each of them in a
bilayer of phospholipid form loops directing hydropho-
bic side chains into the hydrocarbon of the membrane.

Proteins that have been posttranslationally modi- 
fied with glycosylphosphatidylinositol (Figure 3-17) are 
anchored in their respective membranes by the covalently 
attached phosphatidylinositol, which spontaneously 
takes its place within the bilayer of phospholipid. The
set of glycosylphosphatidylinositol-linked (GPI-linked) 
proteins is a heterogeneous collection. Their respective 
functions are unrelated to each other, so their common 
mode of attachment is probably fortuitous. They are, how-
ever, all inserted into the extracytoplasmic surfaces of the
plasma membranes of the respective cells in which they 
are found. Immediately after they have been synthesized, 
these proteins are anchored temporarily in the membrane 
by a carboxy-terminal segment of their polypeptide that 
is rich in hydrophobic amino acids. The ultimate carboxy 
terminus of the posttranslationally modified protein is a 
glycine, alanine, cysteine, serine, or asparagine 15-30 
amino acids in from the initial carboxy terminus. Through 
an enzymatically catalyzed transamidation, the carboxy 
terminus of this amino acid is transferred from the car- 
boxy-terminal segment of 15-30 amino acids to which it 
was originally attached to the amine of the ethanolamine
phosphate connected through the oligosaccharide to the 
phosphatidylinositol (Figure 3-17). 

A glycosylphosphatidylinositol-linked protein can
be identified by its ability to be released from the surface
of the cell by glycosylphosphatidylinositol diacylglyc-
erol-lyase or by site-directed mutation. Once
released, the resulting globular, soluble protein can be
purified and crystallized. In fact, glycosylphos-
phatidylinositol-linked proteins are often isoforms from
species of proteins the other members of which are
water-soluble and make no contact with membranes.
Examples of such glycosylphosphatidylinositol-linked
proteins are the variant surface glycoprotein on the exte-
rior of cells of Trypanosoma brucei, the receptor for the
Fc domain of immunoglobulin G on the exterior surface
of neutrophilic lymphocytes, one isoform of acetyl-
cholinesterase on the exterior surfaces of several types of
animal cells, isoform IV of carbonate dehydratase
on the exterior surfaces of cells from lung and
kidney, and the cell surface glycoprotein a-2 on the
exterior surfaces of hematopoietic cells. Sometimes
two isoforms of one of these proteins, the glycosylphos-
phatidylinositol-linked isoform and an isoform anchored
to the membrane by a carboxy-terminal membrane-
spanning segment of polypeptide, are produced in the
same tissue.

Each of the various types of anchor is more or less
firmly embedded in the bilayer of phospholipid. The
segments of polypeptide at the amino or carboxy termini
that span the bilayer, either once or several times, are
permanently affixed to it. Amphipathic α helices, either
at one of the termini or in the middle of the protein,
are much less firmly embedded and can be dissociated
under appropriate circumstances. Proteins that dip
loops of their polypeptide in the bilayer of phospholipid
are also less firmly embedded, display requirements such
as negative surface charge for competent association,
and have measurable dissociation constants. A protein
with an anchor of glycosylphosphatidylinositol is perma-
nently embedded in the membrane because its cova-
lently attached phosphatidylinositol, with two fatty acids
of the normal length, is so hydrophobic that its dissocia-
tion constant from the bilayer of phospholipid is immea-
surably small. Proteins posttranslationally modified by
isoprenylation, however, because they have only one
covalently attached hydrocarbon (Figure 3-16), are less
firmly anchored in the membrane. Those with a geranyl-
geranyl modification (C₂₀H₃₃) have dissociation con-
stants for a bilayer of phospholipid of 0.1-40 µM;
and those with a farnesyl modification (C₁₅H₂₅),
1-150 µM. The particular value of the dissociation con-
stant depends on the number of basic amino acids in the
carboxy-terminal sequence of the protein and the sur-
face potential of the bilayer of phospholipid.
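
To see what dissociation constants in this range imply, consider the fraction of an isoprenylated protein that would be bound at a given concentration of accessible phospholipid. The sketch assumes simple one-to-one binding with the phospholipid in large excess over the protein, so that the bound fraction is [L]/(Kd + [L]). This model, the function name, and the concentration of phospholipid of 100 µM are assumptions for illustration rather than values from the text.

```python
# Fraction of a lipid-modified protein bound to a bilayer, assuming simple
# one-to-one binding with phospholipid in large excess (an assumed model).

def fraction_bound(kd_micromolar: float, lipid_micromolar: float) -> float:
    """Bound fraction = [L] / (Kd + [L]); both concentrations in micromolar."""
    if kd_micromolar <= 0 or lipid_micromolar < 0:
        raise ValueError("Kd must be positive and [L] non-negative")
    return lipid_micromolar / (kd_micromolar + lipid_micromolar)


if __name__ == "__main__":
    lipid = 100.0  # micromolar accessible phospholipid, hypothetical
    limits = (
        ("geranylgeranyl, lower limit of Kd", 0.1),
        ("geranylgeranyl, upper limit of Kd", 40.0),
        ("farnesyl, lower limit of Kd", 1.0),
        ("farnesyl, upper limit of Kd", 150.0),
    )
    for label, kd in limits:
        print(f"{label} ({kd} uM): {fraction_bound(kd, lipid):.2f} bound")
```

Under these assumptions a protein at the weak end of the range for farnesylation is less than half bound at 100 µM phospholipid, which illustrates why the basic amino acids and the surface potential mentioned above can decide whether such a protein resides on the membrane.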

Proteins that have been posttranslationally modi- 
fied by acylation with fatty acids such as palmitic acid, 
oleic acid, and stearic acid (Table 3-1) are usually
bound to the cytoplasmic surface of the plasma mem-
brane. They are bound tightly because these proteins
usually have several sites of fatty acylation. In some 
cases, however, it is unclear whether such fatty acylation 
is directed almost exclusively to an otherwise mem- 
brane-bound protein after it has associated irreversibly 
with the membrane or the fatty acylation itself promotes 
the association of the protein with the membrane. For 
example, proteolipid protein from myelin is a mem- 
brane-bound protein with six cysteines in its amino acid 
sequence that are fatty acylated, but it also has several 
membrane-spanning segments. The cytochrome sub-
unit of the photosynthetic reaction center from
Rhodopseudomonas viridis, however, has a diglyceride in 
ether linkage with its amino-terminal cysteine that does 
seem to be the only portion of the protein anchoring it in 
the membrane.

Proteins that are posttranslationally modified by
myristoylation of their amino termini (tetradecanoyl in
Figure 3-16) may or may not associate with membranes.
Whether or not they do seems to depend on their amino-
terminal sequence and the accessibility of the myristate.
The amino-terminal segments of several normally
N-myristoylated proteins associate with bilayers of phos-
pholipids even when they are not myristoylated
either by forming an amphipathic α helix or by accumu-
lating at membranes with negative surface potential.
Other N-myristoylated proteins, however, lose their abil-
ity to associate with the membrane when they are not
myristoylated. Yet other proteins either associate with
membranes or fail to associate depending on whether
their myristoyl groups are fully exposed on their sur-
face or buried in their interior. The variability in
behavior is probably the result of the fact that the disso-
ciation constant for just a myristoylated amino terminus
from a bilayer of phospholipid is only about 0.1 µM.

Many enzymes catalyze reactions in which phos- 
pholipids, steroids, other membrane-bound proteins, or 
other molecules embedded in a membrane are sub- 
strates. To find their substrates, these enzymes must 
associate with the membrane in which those substrates 
are located. Such enzymes often bind tightly to those 
membranes and process their substrates by scooting 
across their surfaces. They can be firmly anchored
in the membranes with which they associate by embed- 
ding one or more segments of their polypeptides in the 
bilayer of phospholipid. For example, the polypeptide
between Isoleucine 50 and Asparagine 98 of prosta- 
glandin-endoperoxide synthase forms four short 
α helices that are embedded in one of the monolayers of
the bilayer of amphipathic lipids forming the mem-
brane. The membrane to which it is anchored contains
the phospholipids to which are acylated the arachi- 
donoyl groups that are the substrates for this enzyme. 
Enzymes that must move from one membrane to 
another to perform their function, however, are only 
loosely associated with the respective bilayers of phos- 
pholipids. For example, isoform 2 of sterol carrier protein 
associates desultorily with membranes by an amino-ter-
minal segment forming two amphipathic α helices.
Enzymes that catalyze reactions with membrane-bound 
substrates are often anchored in the membrane only by 
a portion of their structure uninvolved in the actual 
catalysis. Unlike anchored membrane-bound proteins, 
however, they would be unable to perform their function 
were the anchor to be removed and they were no longer 
able to associate with a membrane, because they would 
be unable to find their substrates. 

An integral membrane-bound protein is a protein 
a portion of the polypeptide of which is permanently 
embedded in the bilayer of phospholipid constituting its 
membrane, and that portion of its polypeptide within 
that membrane is essential for its function. There is a 
family of proteins, exemplified by the receptor for epidermal
growth factor, the members of which contain
only one short hydrophobic segment of their
polypeptide embedded within the membrane. This segment
is in the middle of the amino acid sequence and
spans the bilayer of phospholipid once. On the two ends 
of this membrane-spanning segment, there are globular 
domains on the cytoplasmic and extracytoplasmic sides 
of the membrane. The role of these proteins is to trans- 
mit across the membrane to their cytoplasmic domains 
the information that a circulating hormone is bound to 
their extracytoplasmic domains. Upon receipt of the
information, the protein tyrosine kinase activity catalyzed by the
cytoplasmic domain is activated. Neither domain by
itself is capable of displaying hormone-dependent protein
tyrosine kinase activity, so in this case, the single
membrane-spanning segment performs a much greater 
role than merely anchoring the protein in the membrane. 

Usually, however, a significant portion of the native 
structure of the polypeptide or polypeptides of an integral
membrane-bound protein is within the hydrocarbon
of the bilayer of amphipathic lipids forming the
membrane. Integral membrane-bound proteins can 
never be detached in a functional form from the bilayer 
by endopeptidolytic cleavage or site-directed mutation 
and often lose their native structure or precipitate from 
solution or both when the bilayer is completely dissolved 
by detergents. As with any protein, the native structure of 
the portion of an integral membrane-bound protein that 
is within the membrane is determined by the solvent in 
which it is dissolved. This solvent is a sheet of liquid 
paraffin about 3.6 nm wide possessing covalently
attached oligosaccharides, anions, and zwitterions at 
both of its surfaces and in contact on each of its surfaces 
with an aqueous solution. 
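
As a rough illustration of what a sheet of hydrocarbon 3.6 nm wide demands of a polypeptide, the following sketch estimates the smallest number of residues needed to cross it. The axial rise per residue (0.15 nm for an α helix and about 0.33 nm for an extended β strand) is an assumption taken from standard polypeptide geometry rather than from this section, and the segment is assumed to run perpendicular to the plane of the membrane; a tilted segment would need more residues.

```python
def residues_to_span(width_pm: int, rise_pm: int) -> int:
    """Smallest whole number of residues whose combined axial rise covers width_pm."""
    # Integer ceiling division in picometers avoids floating-point
    # round-off that could appear in a division such as 3.6 / 0.15.
    return -(-width_pm // rise_pm)


HYDROCARBON_WIDTH_PM = 3600  # 3.6 nm, the width quoted in the text
ALPHA_HELIX_RISE_PM = 150    # assumed: 0.15 nm of axial rise per residue
BETA_STRAND_RISE_PM = 330    # assumed: about 0.33 nm of axial rise per residue

helix_residues = residues_to_span(HYDROCARBON_WIDTH_PM, ALPHA_HELIX_RISE_PM)
strand_residues = residues_to_span(HYDROCARBON_WIDTH_PM, BETA_STRAND_RISE_PM)
print(helix_residues, strand_residues)
```

Under these assumptions an α helix needs roughly twice as many residues as a β strand to cross the same hydrocarbon.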

Examples of integral membrane-bound proteins 
that are required by their function to have significant 
portions of their mass within the hydrocarbon itself are 
proteins that form channels through the membrane for 
the transport of polar, water-soluble metabolites across 
the membrane, proteins that catalyze the active trans- 
port of metallic cations and metabolites against their 
gradients of concentration across the membrane, large 
complexes of subunits that catalyze electron transport 
and the concomitant active transport of protons across 
the membrane, and proteins that change their confor- 
mation or oligomeric state upon the reception of infor- 
mation to transfer that information across the 
membrane. Integral membrane-bound proteins can also 
be enzymes the substrates for which are dissolved in and 
confined to the membrane.

Integral membrane-bound proteins vary in size 
from diacylglycerol kinase of E. coli, with a single folded 
polypeptide 121 aa in length, to the ryanodine receptor
of animal sarcoplasmic reticulum, with a single
folded polypeptide about 5000 aa in length, or NADH
dehydrogenase (ubiquinone), a complex of 45 different
subunits for a total of about 9000 aa.

The distinction between an anchored membrane-bound
protein and an integral membrane-bound protein
is not a clean one. For example, a membrane-bound protein
known as glycophorin is found in the plasma membrane
of erythrocytes. The protein is 131 amino acids
long, and its embedded anchor is located to the
carboxy-terminal side of Glutamate 72. This embedded
anchor spans the plasma membrane. The 35 carboxy-terminal
amino acids on the cytoplasmic side of the
membrane are rich in proline and seem to be structureless
and functionless, simply acting as a barb that cannot
be pulled across the membrane. The protein is about
60% carbohydrate by weight, and this carbohydrate is
entirely linked as oligosaccharides to the extracytoplasmic
amino-terminal portion of the protein through
15 O-glycosidic linkages to threonines and serines and
one N-glycosidic linkage. The function of glycophorin
is to serve as the source of most of the oligosaccharide on 
the extracytoplasmic surface of the erythrocyte, so its 
function would be lost if its anchor were removed. 

Membrane-bound proteins are inserted into natu- 
ral membranes such that every copy of the same protein 
is oriented in the same direction relative to the cyto- 
plasm. The earliest observations addressing this point 
explicitly confirmed this assumption of vectorial insertion.
For example, nucleophilic amino acids in 10
of the thermolytic peptides on the peptide map of a
digest of band 3 anion transport protein from human
erythrocytes could not be modified with
N-formyl-[³⁵S]sulfinylmethionyl methylphosphate, a polar reagent
that cannot pass through an intact membrane, when the
native protein was in sealed, intact erythrocytes, even
though they could be readily modified when the erythrocytes
were broken open. The explanation of this observation
is that every copy of the anion carrier is oriented the
same way in the membrane, each presenting the same 
unique surface to the cytoplasmic space of the cell as well 
as a different, also unique face to the extracytoplasmic 
space, and that the cytoplasmic surface is inaccessible to 
the impermeant reagent in an intact cell. Since these 
early studies, many examples of vectorial insertion have 
been verified, and no example of a membrane-bound 
protein the copies of which are oriented at random in a 
natural membrane has been verified. 

A property related to the vectorial insertion of every 
protein in a biological membrane is the asymmetric dis- 
tribution of the oligosaccharides on the glycoproteins 
and glycolipids embedded in the plasma membranes of 
cells. Almost all of the oligosaccharide bound to the
plasma membrane of an animal cell is located upon its
extracytoplasmic surface. This feature is a corollary
of the fact that almost no glycoproteins are found in
the cytoplasm, only in extracytoplasmic spaces, and a 
direct result of the fact that the glycosyltransferases that 
synthesize the oligosaccharides on these glycoproteins 
are in the extracytoplasmic lumens of the Golgi mem- 
branes. Membrane-bound proteins are synthesized on 
ribosomes bound to the endoplasmic reticulum and 
incorporated in their proper orientation by the machinery
responsible for their insertion into the membranes of
the endoplasmic reticulum. The membrane in which 
they have been incorporated is then sent to the Golgi 
membranes where the oligosaccharides are added. Then 
vesicles, which bud off the Golgi membranes and which 
maintain the vectorial orientation of the membrane- 
bound proteins and their attached oligosaccharide, 
transport them to the plasma membrane. These vesicles 
then fuse with the plasma membrane so that their extra- 
cytoplasmic surfaces, on which the membrane-bound 
oligosaccharides reside, and their extracytoplasmic 
lumens, in which the soluble glycoproteins are located, 
remain extracytoplasmic. 

In addition to glycosylation, membrane-bound 
proteins are posttranslationally modified as often as are 
water-soluble proteins. For example, they can be modi- 
fied to contain covalently attached coenzymes or they 
can be phosphorylated. 

When a suspension of purified membranes is 
examined by electrophoresis in solutions of dodecyl sul- 
fate, a large collection of different polypeptides, each 
present at its own characteristic concentration, is 
observed (Figure 14-12). Each of these polypeptides is
a component of one of the many native proteins bound 
to the membranes, and the protein it constitutes is 
responsible for a particular biochemical function. 
Therefore, a biological membrane, although often much 
less complex, resembles cytoplasm in being a heteroge- 
neous solution of a large number of different proteins, 
each present at a different concentration and each with a
specific function. Many of the functions performed by 
membrane-bound proteins have been identified, and 
biochemical assays have been developed for determin- 
ing their presence and their concentration. In many 
instances, the protein responsible for one of these func- 
tions has been identified and purified, and its cDNA has 
been cloned and sequenced. 

Because membrane-bound proteins are often more 
difficult to purify than soluble proteins, indirect proce- 
dures are often used to identify the gene encoding them 
to assist in their purification. Classical genetics can be 
used with bacteria and fungi to identify the gene encoding
a membrane-bound protein responsible for a particular
function. It is also possible to select a cDNA
encoding a membrane-bound protein that is responsible
for a particular function, such as the transport of a particular
metabolite or a change in the conductance of
the membrane in response to a neurotransmitter, by
screening the expression of a library of cDNA in oocytes
of Xenopus laevis. If a cDNA has been identified before a
membrane-bound protein has been purified, that cDNA
can be modified to assist in its purification. For example,
the protein can be expressed with a sequence of histidines
at its carboxy terminus to permit its purification
by affinity adsorption. If the protein is an anchored
membrane-bound protein, a site for endopeptidolytic 
cleavage can be inserted to ensure that the protein can be 

Figure 14-12: Polyacrylamide gels displaying the collection of
polypeptides found in the plasma membranes of erythrocytes from
R. norvegicus (A), the plasma membranes of cells from liver of
R. norvegicus (B), and the portion of the plasma membrane of
kidney cells from R. norvegicus referred to as the brush border
(C). Purified membranes from these various tissues were dissolved
in a solution of sodium dodecyl sulfate, which unfolded and
coated each polypeptide with a layer of dodecyl sulfate. The
polypeptides were then separated by electrophoresis on gels of
polyacrylamide cast in a solution of sodium dodecyl sulfate. Each
band represents a different polypeptide. The dashes indicate
polypeptides that are glycosylated. Reprinted with permission
from ref 197. Copyright 1971 Journal of Biological Chemistry.


released from the membrane by digestion. In most
cases, however, a membrane-bound protein of interest is 
purified before the cDNA for it is available. 

The purification of the particular membrane- 
bound protein identified by a biochemical assay pro- 
ceeds in two stages. In the first stage, a biological source 
is chosen that contains the highest possible concentra- 
tion of the protein. This involves assaying the biochemi- 
cal activity in different tissues from different species or 
trying to increase the concentration of the protein in its 
active form by genetic manipulation of a microorganism 
or cultured eukaryotic cells. Membranes that contain the 
protein in high concentration are then separated by the 
procedures of cell fractionation from other membranes 
in the biological source that contain little or none of the 
protein. These purified membranes are lysed to release 
the entrapped soluble proteins and submitted to treat- 
ments that release peripheral membrane-bound pro- 
teins without inactivating or releasing the protein of 


interest. The product of these manipulations is a suspen- 
sion of membranes in which are embedded the protein 
of interest in the highest possible concentration. At the 
completion of this first stage, the membrane-bound pro- 
tein being purified may be essentially homogeneous. For 
example, fragments of membrane the only protein of
which (90%) is Na⁺/K⁺-exchanging ATPase can be purified
from a region of the mammalian kidney, the only
function of which is to transport sodium and potassium.
In the kidney these membranes are paved with this pro- 
tein. Many of the membrane-bound proteins that have 
been purified to homogeneity are those that are already 
present at high density in such suspensions of appropri- 
ately purified and extracted membranes. 

Once membranes enriched in a particular protein 
have been obtained, the next step in its purification is to 
release that protein from them. The few proteins that are 
linked to the membrane by glycosylphosphatidylinositol 
can be released by glycosylphosphatidylinositol diacyl- 
glycerol-lyase and then purified as water-soluble pro- 
teins. Anchored membrane-bound proteins can often be 
released from a membrane by mild endopeptidolytic 
digestion while retaining their full biological activity. The 
majority of membrane-bound proteins, however, cannot 
be released from the membrane so easily, and the second 
stage in their purification is usually to dissolve the mem- 
branes without unfolding the protein and then purify the 
dissolved protein as if it were a soluble protein. A non- 
ionic or zwitterionic detergent is used to dissolve the 
membranes, because nonionic or zwitterionic deter- 
gents bind only to the hydrophobic membrane-spanning 
portions of integral membrane-bound proteins and are 
unable to bind tightly all along a polypeptide and unfold 
it as does dodecyl sulfate. Ionic detergents, such as
those used to wash laundry, are much harsher than non- 
ionic detergents, which are used for washing dishes by 
hand or for shampoos. 

Nonionic or zwitterionic detergents are amphi- 
pathic compounds that have neutral or zwitterionic 
hydrophilic functional groups attached to one end of 
their hydrocarbon. A common class of nonionic deter- 
gents, the alkyl oligo(ethylene oxide) ethers (Brij series), 
is synthesized from linear, saturated primary alcohols 
produced commercially by the reduction of linear fatty 
acids 12-20 carbons in length. Because these fatty acids 
are usually from biological sources, the alcohols usually 
have an even number of carbons. Ethylene oxide is 
polymerized at random to the hydroxyl of one of these 
alcohols to form the detergent CH₃(CH₂)ₘCH₂O(CH₂CH₂O)ₙH.
The length of the alcohol (m + 2) is defined by
the synthesis, but when the detergent is prepared for 
commercial use, the ethylene oxide is simply polymer- 
ized at random to produce a random mixture of 
hydrophilic extensions of different lengths within the 
same batch of synthetic detergent. As demand grew for 
structurally homogeneous detergents, the mixtures of 
these random polymers were separated chromatograph- 


ically into their pure components to produce detergents 
such as n-Cy.H»,;0(CH»CH2O)3H (abbreviated Cube) and 
n-CgH,7O(CH,CH,0),H (abbreviated C,E,). Another 
series of structurally heterogeneous oligo(ethyleneoxide) 
detergents are the Tritons: 

A broad class of nonionic detergents each member
of which can be synthesized directly in pure form are the 
alkyl glycosides. A pure saccharide such as glucose or 
maltose is coupled synthetically to a pure long-chain 
alcohol such as octanol, decanol, or dodecanol in a gly- 
cosidic linkage at the carbonyl carbon of the saccharide. 
An example of such a structurally homogeneous detergent
would be decyl β-D-maltoside:

The exclusive coupling of the alcohol to the only carbonyl
carbon of the saccharide in acetal linkage permits
the direct synthesis of a detergent that is structurally 
homogeneous except at the anomeric carbon, and the 
two anomers can then be separated chromatographi- 
cally. Either glucose or maltose can be chosen as the 
hydrophilic group and an alcohol of length between 8 
and 14 carbons can be chosen as the hydrophobic group 
to generate a wide selection of different detergents. A set 
of related, naturally occurring detergents are the 
saponins, which are glycosides of triterpenes such as 
oleanolic acid. In the saponins, monosaccharides, disac- 
charides, and trisaccharides are coupled in acetal linkage 
to hydroxyls and carboxylates on the triterpene to pro- 
duce biosynthetically a dramatically heterogeneous mixture.

A structurally homogeneous class of zwitterionic 
detergents that can be synthesized directly are the 
N-oxides of linear dimethylalkyl amines such as 
N,N-dimethyldodecylamine N-oxide: 

3-[(3-Cholamidopropyl)dimethylammonio]-1-propanesulfonate (CHAPS)

[Structure 14-15: CHAPS]


is a zwitterionic detergent that is synthesized from cholic 
acid, which itself is a mild ionic detergent, as is 7-deoxy- 
cholic acid. 

Each of these detergents, the pure and the impure, 
because it contains only a single amphipathic compound 
or is a mixture of several related amphipathic com- 
pounds, forms elliptical micelles when it is dissolved in 
water. If the detergent is pure, however, the micelles that 
it forms in aqueous solution are of a uniform size (Table 
14-5). Each detergent has a critical micelle concentration
(Table 14-5). When its concentration is below the
critical micelle concentration, the detergent is present in 
solution as free, independent molecules; and when its 
concentration is above the critical micelle concentration, 
there is a mixture of free molecules of detergent at the 
critical micelle concentration and micelles of detergent 
accounting for the excess over the critical micelle con- 
centration. As the concentration of detergent is 
increased above the critical micelle concentration, the 
concentration of free detergent remains constant while 
the concentration of micelles increases. 
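
The behavior just described can be sketched as a small calculation. The function below is only an idealized model: it assumes a perfectly sharp transition at the critical micelle concentration and micelles of a single aggregation number. The function name and the total concentration of 2.1 mM are illustrative choices; the critical micelle concentration and aggregation number are those listed for n-dodecyl β-D-maltoside in Table 14-5.

```python
def partition_detergent(total_mM: float, cmc_mM: float, aggregation_number: float):
    """Split total detergent into free monomer and micellar detergent.

    Returns (free monomer in mM, detergent in micelles in mM,
    concentration of micelles in mM), assuming an ideally sharp
    critical micelle concentration.
    """
    # Below the critical micelle concentration everything is free monomer;
    # above it the free concentration stays fixed at that value.
    free_mM = min(total_mM, cmc_mM)
    # Only the excess over the critical micelle concentration forms micelles.
    micellar_mM = max(total_mM - cmc_mM, 0.0)
    # Each micelle contains aggregation_number molecules of detergent.
    micelles_mM = micellar_mM / aggregation_number
    return free_mM, micellar_mM, micelles_mM


# n-dodecyl beta-D-maltoside: critical micelle concentration 0.14 mM,
# mean aggregation number 98 (Table 14-5); 2.1 mM total is a hypothetical choice.
free, micellar, micelles = partition_detergent(2.1, 0.14, 98)
print(f"free {free:.2f} mM, in micelles {micellar:.2f} mM, micelles {micelles:.3f} mM")
```

Doubling the total concentration in this model leaves the free concentration unchanged and raises only the concentration of micelles.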

At high enough concentrations, a nonionic deter- 
gent in aqueous solution is able to form mixed micelles 
with phospholipids and cholesterol and thereby dissolve
membranes. There are several stages in this
dissolution of a bilayer of phospholipid by a nonionic
detergent. At low concentrations of free detergent,
below its critical micelle concentration, there is a parti- 
tion coefficient governing the distribution of detergent 
between the bilayer of phospholipid and the water, the 
membranes remain intact, and the detergent incorpo- 
rates into the bilayer just as if it were an amphipathic 
lipid. As the concentration of detergent is increased, its
incorporation into the intact membrane begins to satu- 
rate, the apparent partition coefficient begins to drop, 
and the permeability of the membranes abruptly 
increases, as if fissures or pores were opening, but the 
bilayer of phospholipid still remains intact. As the
concentration of the detergent is increased even further, 
the amount of bound detergent suddenly increases dra- 
matically and the membranes dissolve and are replaced 
by mixed micelles of detergent and phospholipid. 

These mixed micelles are completely formed before 
the free concentration of detergent reaches its critical 
micelle concentration in the absence of phospholipid 

Table 14-5: Micelles of Nonionic and Zwitterionic Detergents

detergent  mean aggregation numberᵃ  molar mass of micelle (g mol⁻¹)  critical micelle concentration (mM)
n-CgH,70(CH,CH,0);H 32 11,000 6.0 
n-C9H»,0(CH»CH,O),H 76 32,000 0.46 
n-C 9H2,;0(CH,CH,O)3H 0.28 
n-C)H>;O(CH;CH3,0);H 105 50,000 0.065 
n-C)H>;O(CH;CH3,0O);H 120 65,000 0.056 
n-C 4H>90(CH,CH3;0);H 0.0052 
n-C gH330(CH»CH,O)3H 0.00047 
Triton X-100 140 90,000 0.21 
n-C₈H₁₇N(CH₃)₂O 200
n-C₁₀H₂₁N(CH₃)₂O 7.5
n-C₁₂H₂₅N(CH₃)₂O 76 17,300 0.4
n-decyl β-D-maltoside 1.4
n-dodecyl β-D-maltoside 98 50,000 0.14
n-octyl β-D-glucoside 84 25,000 25
n-decyl β-D-glucoside 4.2
n-dodecyl β-D-glucoside 0.14
CHAPS (14-15) 4-14 6000 6.2


ᵃMean aggregation number is the average number of molecules of detergent in a micelle.


(Table 14-5), an observation suggesting that the critical 
micelle concentration of the mixed micelles is lower than
that of pure micelles of detergent. The abrupt dissolu- 
tion of the bilayer of phospholipid to form these mixed 
micelles occurs at a fixed ratio of detergent to lipid rather 
than at a particular concentration of detergent, an
observation suggesting that there is an optimal ratio of 
detergent to phospholipid for the formation of these 
mixed micelles. An intermediate stage in some instances 
between a suspension of bilayers and a solution of ellip- 
tical mixed micelles seems to be the formation of long 
tubular micelles.
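
The molar masses of micelles listed in Table 14-5 can be checked against the mean aggregation numbers, because the molar mass of a pure micelle should be roughly the aggregation number multiplied by the molar mass of one molecule of detergent. The sketch below makes that comparison for three entries. The molecular formulas and atomic masses are standard values supplied here as assumptions, not figures from the table, and the helper names are illustrative.

```python
# Standard atomic masses (g/mol); supplied here, not taken from the text.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def molar_mass(formula: dict) -> float:
    """Molar mass (g/mol) of a molecule given as {element: count}."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def micelle_molar_mass(formula: dict, aggregation_number: float) -> float:
    """Estimated molar mass of a micelle of aggregation_number monomers."""
    return aggregation_number * molar_mass(formula)


# name: (assumed molecular formula, aggregation number and micelle molar
# mass as listed in Table 14-5)
DETERGENTS = {
    "n-dodecyl beta-D-maltoside": ({"C": 24, "H": 46, "O": 11}, 98, 50_000),
    "n-octyl beta-D-glucoside": ({"C": 14, "H": 28, "O": 6}, 84, 25_000),
    "N,N-dimethyldodecylamine N-oxide": ({"C": 14, "H": 31, "N": 1, "O": 1}, 76, 17_300),
}

for name, (formula, n_agg, tabulated) in DETERGENTS.items():
    estimate = micelle_molar_mass(formula, n_agg)
    deviation = 100 * (estimate - tabulated) / tabulated
    print(f"{name}: estimated {estimate:,.0f}, tabulated {tabulated:,} ({deviation:+.1f}%)")
```

For these three detergents the estimate and the tabulated value agree to within a few percent, which is about what the rounding in the table allows.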

When natural membranes containing membrane- 
bound proteins are dissolved with a nonionic detergent, 
the same stages are passed through, and ideally, if the 
removal of the bilayer of phospholipid by this process 
does not unfold the protein being purified, the nonionic 
detergent forms a toroidal micelle surrounding the seg- 
ments of the protein formerly embedded in the mem- 
brane and replacing the hydrocarbon of the bilayer of 
phospholipid with the hydrocarbon of the detergent. 
This toroidal micelle presents the hydrophilic functional 
groups of the detergent to the aqueous phase while its 
hydrocarbon surrounds and supports the hydrophobic 
portions of the protein formerly embedded in the mem- 
brane. The previously membrane-spanning segment or 
segments of polypeptide end up in the center of the 
toroid; the inner surface of the toroid is formed from the 
hydrocarbon of the detergent flush against the mem- 
brane-spanning segments of the protein, and the outer 
surface of the toroid is the hydrophilic portion of the 
detergent directed outward into the water (Figure 14-13). 
Such toroidal micelles have been crystallographically 


observed surrounding molecules of integral membrane- 
bound proteins crystallized from solutions of the protein 
produced with detergent.

As with phospholipids, there seems to be a required 
ratio between the detergent and the protein and lipid 
present in the original membrane for the protein to be 
dissolved completely. This minimum ratio of concentrations
is that required for there to be at least one




Figure 14-13: Diagrammatic representations of the mechanism 
by which a nonionic detergent dissolves an integral membrane- 
bound protein. A toroidal micelle is formed within which the 
bilayer of phospholipids is replaced by the hydrocarbon of the 
detergent. Figure courtesy of Steven Clarke, Department of 
Chemistry and Biochemistry, University of California at Los 
Angeles. 


micelle of detergent, as measured in the absence of the 
membranes (Table 14-5), for each molecule of protein.
Furthermore, there must be sufficient detergent in the 
solution to maintain its free concentration at a level 
equal to its critical micelle concentration in the absence 
of membranes (Table 14-5). 
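
The two requirements just stated can be combined into a lower bound on the total concentration of detergent. The sketch below is a simplification: it ignores the detergent consumed by the phospholipid of the membrane, so the true requirement is higher. The function name and the concentration of protein (10 µM) are hypothetical; the critical micelle concentration and aggregation number are those of n-dodecyl β-D-maltoside in Table 14-5.

```python
def minimum_total_detergent_mM(protein_mM: float, cmc_mM: float,
                               aggregation_number: float) -> float:
    """Lower bound on total detergent (mM) needed to dissolve a protein.

    Assumes one micelle's worth of detergent per molecule of protein plus
    free detergent held at the critical micelle concentration. Detergent
    bound by lipid is NOT included, so this is only a lower bound.
    """
    # One micelle per molecule of protein.
    micellar_mM = protein_mM * aggregation_number
    # The free concentration must stay at the critical micelle concentration.
    return cmc_mM + micellar_mM


# Hypothetical sample: 10 uM protein (0.010 mM) in n-dodecyl beta-D-maltoside.
needed = minimum_total_detergent_mM(protein_mM=0.010, cmc_mM=0.14, aggregation_number=98)
print(f"at least {needed:.2f} mM detergent")
```

For a detergent with a low critical micelle concentration, the micelles bound to protein account for most of this requirement, so the bound scales almost linearly with the concentration of protein.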

One peculiarity of these mixed micelles of protein 
and detergent is that at ratios of detergent where the pro- 
tein is not completely dissolved and the solution con- 
tains micelles with one molecule of protein, micelles 
with two molecules of protein, micelles with three mole- 
cules of protein and so forth, these micelles neither fuse 
with each other nor dissociate into smaller aggregates.
Once the ratio of these states of aggregation
is established in the initial rapid dissolution, it remains 
fixed as long as the total concentration of detergent 
remains fixed. 

There are several drawbacks to the necessity of dis- 
solving the membranes in detergent to dissolve the pro- 
teins within them. The actual dissolution of the 
membrane occurs over a narrow range of free concentration
of detergent, so it is not possible to control the
process very successfully. For reasons that are unknown, 
when ratios of detergent to protein increase beyond 
those needed to dissolve the membranes, the biological 
function of the protein is often impaired or disappears 
entirely. Consequently, changes in the concentration
of detergent or the ratio of detergent to protein that auto- 
matically occur during chromatography often lead to 
loss of activity. For this reason, it is often necessary to 
screen a large number of different detergents to dis- 
cover by trial and error the one that preserves the biolog- 
ical activity of the protein over the broadest possible 
range of concentrations of both protein and detergent. 
Fortunately, there is a large set of homogeneous nonionic
detergents to explore with a wide range of critical
micelle concentrations (Table 14-5). 

Often it is difficult to find a detergent and the 
proper conditions to produce a monodisperse solution 
of the membrane-bound protein of interest without producing
its inactivation, yet a monodisperse solution of
the protein is essential if chromatography by molecular 
exclusion is to be used to purify it. The large sizes of the 
micelles of the detergents limit the range of molecular 
sizes that can be separated by chromatography by 
molecular exclusion because no complex between a 
micelle and a molecule of protein can be smaller than the 
micelle itself. Many membrane-bound proteins are sialo- 
glycoproteins so their charge is microheterogeneous, a 
situation causing difficulties for chromatography by ion 
exchange. For all of these reasons, the purification of a 
membrane-bound protein is more difficult than that of a 
soluble protein. 

As with a soluble protein, a biochemical assay is 
used to follow the purification of a membrane-bound protein.
If the protein is an enzyme, its enzymatic activity can
be measured in a standard assay. If one of the substrates
for the enzyme is a phospholipid or another lipid
incorporated in a membrane, the enzyme is reassociated 
with vesicles of phospholipid containing that substrate 
before assay. If the membrane-bound protein is 
responsible for transmitting across the membrane the 
information that a hormone is bound at its extracyto- 
plasmic surface, the binding of that hormone or of a syn- 
thetic agonist or antagonist that binds tightly to the site 
at which the hormone binds can be used as an assay for 
that protein. If the membrane-bound protein is responsible
for transporting a metabolite across the membrane,
the binding of an inhibitor of that transport with high
affinity for the protein can be used as an assay.

If a membrane-bound protein catalyzes the trans- 
port of a particular metabolite across its native mem- 
brane, it is also possible to reconstitute that protein into 
sealed vesicles of phospholipid and then assay the accumulation
of that metabolite into the vesicles. The
membrane-bound protein, detergent, and phospholipid 
are mixed together at concentrations necessary to pro- 
duce a disperse solution. The detergent is then removed 
from that solution; and if the reconstitution has been 
successfully performed, sealed vesicles of phospholipid 
will form in the bilayers of which the protein of interest is 
inserted in a functional state. The ability of that protein 
to transport its specific substrate across the membrane 
into the vesicles is then assayed. Rapid methods for per- 
forming the separate reconstitutions of multiple samples 
can be used to assay the fractions from chromatographic 
separations for the protein responsible for the particular 
transport of interest. For example, proteins responsible 
for the transport of citrate, oxalate, and ornithine,
respectively, could be purified to homogeneity by assaying
them with reconstitution. There is even one instance
in which an integral membrane-bound protein can be
renatured and then reconstituted from a completely
denatured state.

When the membranes have been dissolved in a 
solution of detergent in such a way that the activity of the 
membrane-bound protein of interest is preserved and an 
assay for that protein has been developed, the protein 
can often be purified chromatographically. For example, 
(R)-pantolactone dehydrogenase (flavin) was purified 
80-fold from membranes of Nocardia asteroides by 
ammonium sulfate precipitation and six chromato- 
graphic steps after it was dissolved in a solution of 0.5% 
Brij 35. Both chromatography by molecular exclusion 
and chromatography by ion exchange, run in solutions of
nonionic detergent, are used to purify membrane-bound
proteins dissolved with detergent, as well as chromatography
on hydroxyapatite and chromatography
by adsorption on solid phases modified with specific
functional groups (Table 1-2) such as phenylboronate
or particular dyes. If one has been able to produce
immunoglobulins specific for a particular membrane- 
bound protein before that protein has been purified, 
those immunoglobulins can be used to purify a mem- 


Table 14-6: Secondary Structure Spanning the Bilayer in Crystallographic Molecular Models of Integral Membrane-Bound Proteins 
protein  detergent or phase used during crystallization  nₐₐᵃ  subunitsᵇ  secondary structure spanning membrane  number of membrane-spanning segmentsᵃ
bacterial photosynthetic reaction center” > N,N-dimethyldodecylamine N-oxide?” 1200 aßyö a helices 11 
rhodopsins 
bacteriorhodopsin | bicontinuous cubic phase of 1-oleoyl-rac-glycerol and water | 250 | α3 | α helices | 7
halorhodopsin | bicontinuous cubic phase of 1-oleoyl-rac-glycerol and water | 250 | α3 | α helices | 7
mammalian rhodopsin | nonyl β-D-glucoside and 1,2,3-heptanetriol | 350 | α | α helices | 7

complexes for electron transport
mammalian cytochrome-c oxidase | n-C12H25O(CH2CH2O)23H (Brij 35) or decyl β-D-maltoside | 1800 | (αβγδεζηθικλμν)2 | α helices | 28
cytochrome-c oxidase from Paracoccus denitrificans | undecyl β-D-maltoside | 811 | αβ | α helices | 14
mammalian ubiquinol-cytochrome-c reductase | dodecyl β-D-maltoside, decanoyl-N-methylglucamide, diheptanoylphosphatidylcholine, or octyl β-D-glucoside | 2200 | (αβγδεζηθικλ)2 | α helices | 13
bacterial cytochrome o ubiquinol oxidase | octyl β-D-glucoside | 1290 | αβγδ | α helices | 25
bacterial succinate dehydrogenase | n-C12H25O(CH2CH2O)9H (C12E9) | 1100 | (αβγδ)3 | α helices | 6
bacterial lipid A export ATP-binding/permease protein MsbA | dodecyl α-D-maltoside | 580 | α2 | α helices | 6

ion channels
bacterial potassium channel KcsA from Streptomyces lividans | decyl β-D-maltoside | 160 | α4 | α helices | 8(d)
acetylcholine receptor from Torpedo marmorata | image reconstruction from tubular helical surface lattice | 2333 | α2βγδ | α helices | 20
bacterial large-conductance mechanosensitive channel | dodecyl β-D-maltoside | 150 | α5 | α helices | 10(d)
aquaporin | octyl β-D-glucoside or nonyl β-D-glucoside | 270 | α4 | α helices | 7
mammalian endoplasmic reticulum Ca2+-transporting ATPase350,351 | n-C12H25O(CH2CH2O)8H (C12E8) | 1000 | α | α helices | 10

bacterial outer membrane porins
porin | n-C8H17O(CH2CH2O)4H (C8E4) and N,N-dimethyldodecylamine N-oxide | 300 | α3 | β barrel(c) | 16
maltoporin | decyl β-D-maltoside | 420 | α3 | β barrel(c) | 18(f)
outer membrane protein F | n-C8H17O(CH2CH2O)nH | 340 | α3 | β barrel(c) | 16
sucrose porin | octyl β-D-glucoside | 480 | α3 | β barrel(c) | 18(f)
bacterial ferrichrome-iron receptor | N,N-dimethyldodecylamine N-oxide | 720 | α | β barrel(c) | 22(f)
bacterial outer membrane protein A | n-C8H17O(CH2CH2O)nH | 325 | α | β barrel | 8(f)
bacterial outer membrane protein TolC | mixture of hexyl, heptyl, octyl, and dodecyl β-D-glucosides | 470 | α3 | β barrel | 12(g)
bacterial α-hemolysin | octyl β-D-glucoside | 290 | α7 | β barrel | 14(g)

(a) Total number of amino acids and total number of membrane-spanning segments of the noted secondary structure in each protomer of the protein unless otherwise noted.
(b) Composition of subunits in complete oligomer.
(c) All β barrels are antiparallel.
(d) Number of α helices in the continuous cylinder of α helices formed by the complete oligomer.
(e) One of the membrane-spanning α helices is formed from two smaller α helices that butt against each other, each of which passes only halfway through the membrane.
(f) Number of strands in the continuous β barrel formed by each subunit.
(g) Number of strands in the continuous β barrel formed by the complete oligomer.


brane-bound protein by immunoadsorption
because nonionic detergents usually cannot denature
immunoglobulins.

After a membrane-bound protein has been purified
to homogeneity so that only one protein remains in the
sample, reconstitution is often used to prove that the
purified protein is responsible for the biological activity
of interest. Examples of purified proteins that have been
shown by reconstitution to be responsible for a specific
function are ones that catalyze, respectively, the passive
transport of water, the passive transport of glucose, the
passive transport of melibiose, the passive transport of
halide ions, the voltage-activated passive transport of
sodium ions, the calcium-activated passive transport of
potassium ions, the inositol 1,4,5-trisphosphate-activated
passive transport of calcium ions, the ATP-driven active
transport of sodium and potassium ions, and the
ATP-driven active transport of covalent conjugates between
glutathione and other molecules across the membrane, as
well as ones that form large, nonspecific pores. It is also
possible to demonstrate that a particular protein is
responsible for a particular type of transport by expressing
mRNA encoding that protein in oocytes of X. laevis and
then demonstrating that the oocytes have become able
to display the particular type of transport.

In addition to reconstitution into sealed vesicles of
phospholipid, purified membrane-bound proteins can
be transferred into bicelles. Bicelles are small flat
disks, each a bilayer of dimyristoylphosphatidylcholine.
The rim of each disk is a continuous ring of detergent,
either 3-[(3-cholamidopropyl)dimethylammonio]-
2-hydroxy-1-propanesulfonate or
dihexanoylphosphatidylcholine. The diameter of these
circular disks can be varied by changing the ratio of
detergent to phospholipid. It is also possible to replace
the micelle of detergent surrounding the membrane-spanning
portion of a purified membrane-bound protein with a
copolymer of acrylate, N-octylacrylamide, and
N-isopropylacrylamide, which is an amphipathic, polymeric
detergent.

Once an integral membrane-bound protein has
been purified, amino acid sequences from peptides can
be used to design probes for screening libraries of cDNA.
Once their cDNAs are available, integral membrane-bound
proteins are often expressed from their cDNA so
that, even though they are produced at low levels, they
can be modified by site-directed mutation and other
genetic manipulations to identify amino acids critical for
one of their functions or for other studies. The
cDNA for a membrane-bound glycoprotein from an
animal must be expressed in animal cells to produce the
properly glycosylated protein. The expression of a
particular cDNA that has been identified indirectly with a
particular function is often used to prove that the protein
that it encodes actually is responsible for that
function.272,273

Many anchored membrane-bound proteins have
been released from the membrane endopeptidolytically
or have been dissolved with nonionic detergents as
biologically active proteins and purified to homogeneity
by the normal methods of chromatography or affinity
adsorption. A few examples of such purified anchored
membrane-bound proteins are cytochrome b5, HLA
histocompatibility antigens, HLA-linked B-cell antigen,
dipeptidyl-peptidase IV, membrane alanyl aminopeptidase,
sucrose α-glucosidase/oligo-1,6-glucosidase,
dolichyl-phosphate β-D-mannosyltransferase, unspecific
monooxygenase, ATP diphosphatase, and the hemagglutinin
of influenza virus.

When anchored membrane-bound proteins are 
purified intact in the presence of nonionic detergent, 
they will recombine with bilayers of phospholipid when 
the detergent is removed and become anchored
again, but when they are removed from the membrane 
by endopeptidolytic cleavage, the detached biochemi- 
cally active, globular domains have no affinity for bilay- 
ers of phospholipids. Many of the endopeptidolytically 
released detachable domains have been crystallized, 
and crystallographic molecular models have been
constructed from the maps of electron density. The
portions of anchored membrane-bound proteins that 
reside outside the membrane have also been expressed 
by themselves, after being genetically detached from 
their anchors, and crystallized, and crystallographic 
molecular models have been constructed for these 
genetically released detachable domains.

The crystallographic molecular models of the 
detached domains of anchored membrane-bound pro- 
teins are indistinguishable from those of normal water- 
soluble proteins. The terminal region of the polypeptide 
at which the cleavage releasing the detachable domain 
from the membrane occurred is usually disordered and 
featureless in the map of electron density. From all of 
these observations, it can be concluded that such an 
anchored membrane-bound protein is simply a water- 
soluble protein that is leashed to the bilayer of phospho- 
lipid by a flexible segment of its polypeptide attached in 
turn to the embedded anchor. An anchored membrane- 
bound protein may be attached to the membrane not 
only by a transmembrane anchor at one of its termini but 
also by adsorption to the bilayer through additional 
interactions on its surface. In such an instance the 
anchor must be removed and site-directed mutations 
within this region on its surface must be performed to 
produce a fully water-soluble protein capable of being 
crystallized.

The embedded anchor left behind in the bilayer of
phospholipid when a membrane-bound protein
anchored at one of its termini is released by
endopeptidolytic digestion almost always has at least one
segment of sequence composed almost exclusively of the
most hydrophobic amino acids. The entire stretch of
polypeptide left behind in the membrane may be quite
long, but each of the hydrophobic segments is usually
only about 20-25 aa long. The amino acid sequence of
the hydrophobic segment from equine cytochrome b5 is
-WWTNWVIPAISAVVVALMY-, that from one of the
human HLA histocompatibility antigens is
-VPIVGIVAGLVLLVAVVTGAVVAAVMW-, that from sucrose
α-glucosidase/oligo-1,6-glucosidase from O. cuniculus is
-LIVLFVIVFIIAIALIAVLA-, that from the hemagglutinin
of an influenza virus is
-WILWISFAISCFLLCVVLGFIMWAS-, and that from human
glycophorin A is -ITLIIFGVMAGVIGTILLISYGI-. Such
hydrophobic segments from anchored membrane-bound
proteins are the most hydrophobic sequences of their
length found in any protein. These sequences are usually
flanked at both ends by regions containing normal or
above-normal frequencies of polar and charged hydrophilic
amino acids.
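
The hydrophobicity of the five anchor segments quoted above can be compared numerically. The sketch below averages a hydropathy index over each segment. The Kyte-Doolittle scale is an assumption introduced here for illustration, since the text does not say which scale underlies the comparison, and the names `KYTE_DOOLITTLE`, `ANCHOR_SEGMENTS`, and `mean_hydropathy` are hypothetical.

```python
# Kyte-Doolittle hydropathy index (assumed scale; not specified in the text).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Hydrophobic anchor segments exactly as quoted in the text.
ANCHOR_SEGMENTS = {
    "equine cytochrome b5": "WWTNWVIPAISAVVVALMY",
    "human HLA antigen": "VPIVGIVAGLVLLVAVVTGAVVAAVMW",
    "sucrose alpha-glucosidase": "LIVLFVIVFIIAIALIAVLA",
    "influenza hemagglutinin": "WILWISFAISCFLLCVVLGFIMWAS",
    "human glycophorin A": "ITLIIFGVMAGVIGTILLISYGI",
}


def mean_hydropathy(sequence: str) -> float:
    """Average hydropathy index over every residue in the sequence."""
    return sum(KYTE_DOOLITTLE[residue] for residue in sequence) / len(sequence)


if __name__ == "__main__":
    for name, segment in ANCHOR_SEGMENTS.items():
        print(f"{name}: {len(segment)} aa, "
              f"mean hydropathy {mean_hydropathy(segment):.2f}")
```

On this scale a positive mean indicates a hydrophobic segment; a water-soluble sequence of the same length would be expected to score much lower.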

It is these hydrophobic segments of anchored 
membrane-bound proteins that span the hydrocarbon of
the bilayer of phospholipid. That each of these single, 
isolated membrane-spanning segments of amino acid 
sequence is completely surrounded by liquid hydro- 
carbon explains their extreme hydrophobicity. The short 
hydrophilic segments that end up in most of these pro- 
teins on the opposite side of the membrane from the 
large globular, detachable domains probably act simply 
as barbs that cannot be pulled through the bilayer of 
phospholipid, but other roles, such as the relay of infor- 
mation across the membrane, have been proposed for 
some of them.

The hydrophobic segment of polypeptide that
spans the membrane and serves to attach an anchored
membrane-bound protein to the bilayer of phospholipid
will spontaneously assume an α helix over most or all of
its length, as judged by circular dichroic spectra, when it
is incorporated into micelles of detergent or bilayers
of phospholipid. It is believed that these hydrophobic
anchors, when they are attached to the native protein, are
also uninterrupted α helices spanning the hydrocarbon
of the biological membranes in which they are normally
found. An α helix is the logical structure for a segment of
polypeptide to assume when it is immersed in liquid
hydrocarbon in the total absence of water or any other
donor or acceptor of hydrogen bonds. Within itself, an
α helix satisfies all of the hydrogen-bond donors on the
polypeptide (Figure 4-16A). The width of the
hydrocarbon in a bilayer of naturally occurring amphipathic
lipids and cholesterol is about 3.6 nm (Figure 14-4D).
As the rise for each amino acid in an α helix is 0.15 nm, it
should require about 24 aa to span the hydrocarbon. The
fact that the lengths of the hydrophobic segments of
anchored membrane-bound proteins are usually greater
than 20 aa is further support for the proposal that these
hydrophobic segments are α-helical in their normal
situation.
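
The estimate of 24 aa follows from dividing the width of the hydrocarbon by the rise per residue. A minimal sketch of that arithmetic is given below, assuming the helix crosses the bilayer perpendicular to its plane; the function names are hypothetical.

```python
RISE_PER_RESIDUE_NM = 0.15  # rise per amino acid along an alpha helix (from the text)


def residues_to_span(width_nm: float, rise_nm: float = RISE_PER_RESIDUE_NM) -> int:
    """Residues needed for an untilted alpha helix to cross a slab of the
    given width, rounded to the nearest whole residue."""
    return round(width_nm / rise_nm)


def helix_length_nm(n_residues: int, rise_nm: float = RISE_PER_RESIDUE_NM) -> float:
    """Axial length of an alpha helix containing n_residues amino acids."""
    return n_residues * rise_nm


if __name__ == "__main__":
    print("residues to span 3.6 nm:", residues_to_span(3.6))
    for n in (20, 25):
        print(f"{n} aa helix: {helix_length_nm(n):.2f} nm")
```

A tilted helix would need more residues to cover the same width, so the figure should be read as a lower bound for a helix normal to the membrane.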

The distribution of amino acids in the hydrophobic
segments of the naturally occurring anchors that
span the membrane in one isolated α helix elucidates the
hydropathic imperatives of a bilayer of phospholipid.
As one might expect, isoleucines, leucines, valines,
alanines, and phenylalanines represent 74% of the amino
acids in these segments, a percentage 2.5 times greater
than their percentage in water-soluble proteins.
Cysteine, glycine, and methionine have frequencies
equal to those for these amino acids in water-soluble
proteins. Serines and threonines, although they occur
half as frequently as they do in water-soluble proteins,
are present within these fully engulfed membrane-spanning
segments, but the donors on their hydroxyls can be
automatically satisfied by the intramolecular hydrogen
bonds in which these two amino acids often participate
with the empty lone pairs on the acyl oxygens of the
amino acids three or four positions ahead of them in an
α helix (Figure 6-7).
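
The kind of tally behind the figure of 74% can be sketched for a single segment. The example below counts isoleucine, leucine, valine, alanine, and phenylalanine in two of the anchor sequences quoted earlier in this section. It is only an illustration of the bookkeeping: the 74% in the text is an average over many natural anchors, not over these two, and the names used are hypothetical.

```python
from collections import Counter

# The five residues the text groups together: Ile, Leu, Val, Ala, Phe.
ALIPHATIC_AND_PHE = frozenset("ILVAF")


def fraction_in_set(sequence: str, residues=ALIPHATIC_AND_PHE) -> float:
    """Fraction of positions in the sequence occupied by the given residues."""
    counts = Counter(sequence)
    return sum(counts[residue] for residue in residues) / len(sequence)


if __name__ == "__main__":
    segments = {
        "sucrose alpha-glucosidase anchor": "LIVLFVIVFIIAIALIAVLA",
        "glycophorin A anchor": "ITLIIFGVMAGVIGTILLISYGI",
    }
    for name, segment in segments.items():
        print(f"{name}: {100 * fraction_in_set(segment):.0f}% I/L/V/A/F")
```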

Tryptophans and tyrosines are present in these
membrane-spanning segments at about two-thirds the
frequency with which they are present in water-soluble
proteins, but they occur exclusively at the ends of the
segments where their lone hydrogen-bond donors can
remain in contact with water as they do almost always in
a crystallographic molecular model of a soluble protein.
If, however, a membrane-spanning segment is
hydrophobic enough, it can drag a tryptophan into the
middle of a bilayer of phospholipid. Prolines are
also confined to the ends of such membrane-spanning
segments, usually at the amino-terminal end, so that the
α helix can cross the bilayer unbroken. The remaining
polar amino acids, histidine, lysine, glutamate, aspartate,
asparagine, glutamine, and arginine, constitute less than
1% of the amino acids in these segments and are always
found at the ends, so that the hydrophilic portions of
their side chains can remain in contact with the water or
the polar headgroups of the phospholipids.

A number of peptides incorporating such
hydrophobic sequences have been synthesized. For
example, the peptide acetyl-KK(LA)12KK-α-amide forms
a stable α helix that spans the bilayers of phospholipid in
vesicles of dipalmitoylphosphatidylcholine. The central
portion of such a synthetic hydrophobic peptide is a
rigid α helix from which the α-amido protons cannot
exchange with protons in the water on the two sides of
the bilayer, but the α-amido protons in the portions of
the peptide exposed on the two sides of the bilayer
exchange rapidly. There is a symbiotic relationship
between the length of the hydrophobic sequence and the
width of the bilayer. The peptide forms a fully miscible
solution with the bilayer only when the length of the
hydrophobic α helix matches the width of the
bilayer, and the hydrophobic α helix can, to a certain
extent, adjust the width of the bilayer to match its
length. Cholesterol exerts an influence on this symbiosis
to the extent that it increases the width of the
bilayer.


If a single lysine, aspartate, asparagine, glutamate,
glutamine, or histidine is positioned during the synthesis
in the center of an α-helical, polyleucyl, membrane-spanning
peptide, the leucines will drag that side chain
into the center of the bilayer because there is more
than enough standard free energy in the hydrophobic
effect to do so. The side chain of the lone lysine, the lone
histidine, the lone aspartate, or the lone glutamate,
however, enters the hydrocarbon in its neutral form
(unprotonated for lysine and histidine, protonated for
aspartate and glutamate), so that the debit
of standard free energy is only for its neutralization
(Equation 5-66). That each enters as the neutral form,
which is the only form of its acid-base that contains both
a donor and an acceptor for hydrogen bonding, is
supported by the fact that the peptides containing a single
histidine, aspartic acid, or glutamic acid, as well as those
containing a single asparagine or glutamine, readily form
dimers and higher oligomers when they are incorporated
into micelles or bilayers. These oligomers result
from hydrogen bonding within the hydrocarbon
between the polar side chains.
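
The size of the debit for neutralization can be estimated from the fraction of a side chain that is already neutral at a given pH. The sketch below uses the standard relation between that fraction and free energy; it is not necessarily the form in which Equation 5-66 is written. The pKa values and the pH of 7 are assumed, typical values chosen for illustration and are not taken from this section.

```python
import math

GAS_CONSTANT_KJ = 8.314e-3  # kJ mol^-1 K^-1


def neutral_fraction(pka: float, ph: float, kind: str) -> float:
    """Fraction of a side chain in its neutral form at the given pH.

    kind="acid": neutral when protonated (aspartate, glutamate).
    kind="base": neutral when unprotonated (lysine, histidine).
    """
    if kind == "acid":
        return 1.0 / (1.0 + 10.0 ** (ph - pka))
    if kind == "base":
        return 1.0 / (1.0 + 10.0 ** (pka - ph))
    raise ValueError("kind must be 'acid' or 'base'")


def neutralization_cost_kj(pka: float, ph: float, kind: str,
                           temperature_k: float = 298.15) -> float:
    """Free energy (kJ/mol) needed to select the neutral form: -RT ln(fraction)."""
    return -GAS_CONSTANT_KJ * temperature_k * math.log(neutral_fraction(pka, ph, kind))


if __name__ == "__main__":
    # Assumed, illustrative pKa values for side chains in water.
    side_chains = {
        "lysine": (10.4, "base"),
        "histidine": (6.4, "base"),
        "aspartate": (3.9, "acid"),
        "glutamate": (4.3, "acid"),
    }
    for name, (pka, kind) in side_chains.items():
        print(f"{name}: {neutralization_cost_kj(pka, 7.0, kind):.1f} kJ/mol at pH 7")
```

With these assumed values the cost for lysine or aspartate is somewhat less than 20 kJ/mol, which is small compared with the cost of burying a full charge in hydrocarbon.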

These results clearly demonstrate that hydrogen
bonds, which will not form in water because of
competition from donors and acceptors on the molecules of
water, are as stable within a phase of hydrocarbon,
removed from contact with water, as they are in organic
solvents (Table 5-3). When a hydrogen-bond donor and
acceptor enter a hydrogen bond during the folding of a
polypeptide in aqueous solution, the standard enthalpy
change for the reaction is zero because the reaction
proceeds with no net change in the number of hydrogen
bonds (Equation 5-40) and little change in their net
intrinsic stability (Equation 5-48). A hydrogen-bond
donor or acceptor in the middle of an otherwise
hydrophobic segment spanning a membrane, however,
is held within the hydrocarbon by the hydrophobic
α helix that was formed by the segment when it entered
the membrane. The price of withdrawing the
hydrogen-bond donors and acceptors from the water and stripping
them of their hydration has already been paid by the
hydrophobic effect that immersed the membrane-spanning
segment in the first place. When a hydrogen-bond
donor and acceptor form a hydrogen bond between these
α helices within the hydrocarbon, the standard enthalpy
change for the reaction is −12 to −20 kJ mol⁻¹ (Table 5-2).

When a single tryptophan is positioned in a
sequence of leucines longer than is necessary to span a
membrane, that tryptophan shifts the membrane-spanning
polyleucyl α helix across the membrane until that
tryptophan ends up close enough to the surface of the
bilayer of phospholipid to thrust the hydrogen-bond
donor in its side chain out of the hydrocarbon, but a
tyrosine does not display the same compulsion.

Many integral membrane-bound proteins have also
been purified,316-320 some of these have been
crystallized, and those crystals have provided
crystallographic molecular models (Table 14-6).

It is generally assumed that, as for its purification,
the most effective strategy for crystallizing an integral
membrane-bound protein is to begin with the protein
dissolved in a solution of a nonionic or zwitterionic
detergent that is structurally homogeneous, such as one
of the alkyl glycosides, one of the alkyl oligo(ethylene
oxide) ethers, or N,N-dimethyldodecylamine N-oxide
(Table 14-6). There are examples, however, of integral
membrane-bound proteins crystallizing from solutions
of structurally heterogeneous mixtures of detergents,
such as the crystallization of mammalian cytochrome-c
oxidase from a random mixture of dodecyl poly(ethylene
oxide) ethers (Brij 35; average number of ethylene oxides
equals 23) or the outer membrane protein F from
E. coli from a random mixture of octyl poly(ethylene
oxide) ethers. Sometimes a mixture of several
structurally homogeneous nonionic detergents is
intentionally prepared, as for the crystallization of the bacterial
outer membrane protein TolC. Mammalian
ubiquinol-cytochrome-c reductase will crystallize from a
solution of methyl 6-O-(N-heptylcarbamoyl)-α-D-glucopyranoside,
octyl β-D-glucoside, octanoyl-N-methylglucamide,
octanoylsucrose, or octyl β-D-maltoside
but also from a solution of dodecyl β-D-maltoside,
decanoyl-N-methylglucamide, or
diheptanoylphosphatidylcholine (Table 14-6). All of these
observations seem to suggest that any one of several
detergents could readily be used to obtain crystals of a
particular integral membrane-bound protein, but the
efforts of many investigators over many years that have
been expended to produce crystals of only a few integral
membrane-bound proteins belie this suggestion.

Molecules of a particular protein must be dissolved
in a continuous isotropic phase so that they can diffuse
over the full extent of that phase to associate with each
other and form a macroscopic crystal. Molecules of an
integral membrane-bound protein dissolved in a
solution of nonionic detergent are in an isotropic solution
and each can encounter all of the others. A bicontinuous
cubic phase of lipid and water is an isotropic phase that
forms from particular aqueous suspensions of lipid. In
such a phase, a three-dimensional network formed from
a single, continuous bilayer of amphipathic lipid
encloses a single continuous three-dimensional network
of aqueous channels. Such a bicontinuous cubic phase
forms spontaneously under the proper conditions from a
mixture of water and 1-oleoylglycerol. If
bacteriorhodopsin from Halobacterium salinarium is
incorporated into the lipid of such a bicontinuous cubic phase,
the molecules of protein can diffuse over the full extent
of the lipid phase to find each other and crystallize.

When integral membrane-bound proteins are in
vesicles of phospholipid that are suspended in an
aqueous solution, each molecule of protein can associate
productively only with other molecules of protein within
that vesicle. Under certain circumstances, however, it is
possible to produce stacked planar bilayers by fusion of
vesicles containing a particular integral membrane-bound
protein. The molecules of protein may then be able
to crystallize within each bilayer, and these
two-dimensional arrays may be able to stack regularly upon
each other to produce a macroscopic crystal.
Macroscopic crystals of bacteriorhodopsin suitable for
crystallography have also been prepared in this way.

The membrane-bound proteins from bacteria that
have been crystallized successfully are often purified
from bacteria that have been genetically altered to
overexpress that one particular protein.353,357,359,365
For example, succinate dehydrogenase from E. coli was
produced by the overexpression of a gene that had been
introduced on a plasmid; and in the resulting bacteria,
50% of the protein in the plasma membranes was
succinate dehydrogenase. The protein responsible for
glycerol transport in E. coli was overexpressed with a
sequence of six consecutive histidines on its carboxy
terminus so that it could be purified by affinity
adsorption. In order to discover a large-conductance
mechanosensitive channel that would crystallize, the
proteins from nine different prokaryotes were each
overexpressed and purified. To obtain a bacterial outer
membrane containing only one porin, the gene for outer
membrane protein F was expressed from a plasmid in a
strain of E. coli lacking all of its porins.

So far, however, integral membrane-bound
proteins from plants and animals can seldom be
expressed in a functional form at high levels. For
example, when isoform 4A4 of unspecific monooxygenase
from lung of Oryctolagus cuniculus was expressed in
E. coli, less than 0.1 mg of the monooxygenase that had
been incorporated into the bacterial plasma membrane
could be purified from each liter of culture. When
human glucose transporter was expressed in E. coli, the
most sensitive methods of immunoblotting had to be
used to detect its presence in membranes isolated from
the cells. When bovine opsin was expressed in the
mammalian cell line HEK293, only about 2 mg of the protein
that had been incorporated into the plasma membrane
of the cells could be isolated from each liter of culture.
And when porcine Na+/K+-exchanging ATPase was
expressed in the yeast Pichia pastoris, only about 1 mg of
the protein was present in the unfractionated plasma
membranes obtained from each liter of culture. It
seems that bacteria are unable to insert animal proteins
into their plasma membranes efficiently and that
eukaryotic expression systems such as yeast or animal cells have
trouble inserting extra protein into their membranes,
perhaps because the membranes are already crowded or
perhaps because the systems for inserting proteins have
only limited capacity. Consequently, most of the integral
membrane-bound proteins from animals that have been
crystallized have been purified from naturally occurring
membranes that normally contain high concentrations of
those proteins and can be obtained in large quantities
from whole tissues.


From an examination of the crystallographic
molecular models of integral membrane-bound
proteins, there seem to be only two successful strategies that
have been discovered by evolution through natural
selection for immersing a protein in a membrane so
completely that a major portion of its structure spans the
bilayer of phospholipid, exposing respective portions of
its surface to each side. Either the portion of the protein
within the hydrocarbon of the bilayer is a bundle of
α helices or it is a β barrel. The reason that these two
arrangements are exclusive is that they seem to be the only two
ways to provide acceptors for most if not all of the amido
nitrogen-hydrogens in the backbone of those segments
of polypeptide spanning the hydrocarbon of the bilayer.
This conclusion is reinforced by the fact that bacterial
outer membrane protein A spans the membrane with an
eight-stranded β barrel that is almost a perfect cylinder
so that every amido nitrogen-hydrogen participates in a
hydrogen bond. Eight-stranded β barrels in soluble
proteins are usually flattened so that the hydrogen bonds of
the backbone in the two flattened regions can be
straighter, but the α-amido nitrogen-hydrogens in the
creases at the two edges of the flattened cylinder are able
to form hydrogen bonds with water. Such an
arrangement would be impossible within a bilayer of
phospholipid, so the cylinder cannot be flattened.

At the moment, it appears that the integral
membrane-bound proteins spanning the membrane with
bundles of α helices are confined exclusively to the
plasma membranes and intracellular membranes of
cells, be they eukaryotic or bacterial, and the integral
membrane-bound proteins spanning the membrane
with β barrels are confined almost exclusively to the
bacterial outer membrane, which is a bilayer formed from
an outer monolayer of lipopolysaccharide and an inner
monolayer of phospholipid. One of the few exceptions to
this rule is bacterial α-hemolysin (Table 14-6), which is a
toxin excreted from Staphylococcus aureus that forms a
β barrel of 14 strands, two from each of its seven
subunits, within the plasma membrane of a foreign cell to
punch a hole in it and kill that cell.

Bacteriorhodopsin from H. salinarium (Figure
14-14), Ca2+-transporting ATPase from endoplasmic
reticulum of O. cuniculus (Figure 14-15), the
membrane-spanning domain of potassium channel KcsA
from Streptomyces lividans (Figure 14-16),
photosynthetic reaction center from R. viridis (Figure 14-17),
and ubiquinol-cytochrome-c reductase from
mitochondria of S. cerevisiae (Figure 14-18) are paradigms of
the α-helical class of integral membrane-bound
proteins; and porin OmpF from E. coli (Figure 14-19) and
ferrichrome-iron receptor from E. coli (Figure 14-20)
are paradigms of the β-barrel class of bacterial integral
membrane-bound proteins. These crystallographic
molecular models illustrate the variety of the structures
assumed by integral membrane-bound proteins.

The extant crystallographic molecular models of
integral membrane-bound proteins demonstrate nicely
the range of structures that are possible (Table 14-6). 
They vary in size from the monomer of mammalian 
rhodopsin (350 aa) to the dimer of heteroundecamers 
of mammalian ubiquinol-cytochrome-c reductase 
(4400 aa). In some of these proteins, such as the bacteri- 
orhodopsin (Figure 14-14), the bacterial porins (Figure 
14-19), or the aquaporins, most of the protein is within 
the membrane; in others, such as bacterial succinate 
dehydrogenase, mitochondrial ubiquinol-cytochrome-c 
reductase (Figure 14-18), or bacterial a-hemolysin, less 
than 15% of the protein is within the membrane. Some of 
the proteins are monomers (Figures 14-15 and 14-20), 
some are rotationally symmetric homooligomers 
(Figures 14-16 and 14-19), some are heterooligomers 
(Figures 14-17 and 14-18), and some are homooligomers 
of heterooligomers. The heterooligomeric protomer of 
mammalian cytochrome-c oxidase, with 13 different 
unrelated subunits with lengths ranging from 45 to 
510 aa, is one of the most complex heterooligomeric pro- 
teins in existence, if matrices of polymeric proteins are 
not counted. 
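
The sizes quoted above follow from multiplying the length of a protomer by the number of protomers in the complete oligomer. The sketch below records that bookkeeping for two of the proteins, using the protomer lengths listed in Table 14-6, and applies the same multiplication to the 14-strand barrel of α-hemolysin described earlier. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Oligomer:
    """A protein described by the length of its protomer and the copy number."""
    name: str
    protomer_length_aa: int
    protomers: int

    def total_length_aa(self) -> int:
        """Total number of amino acids in the complete oligomer."""
        return self.protomer_length_aa * self.protomers


def strands_in_shared_barrel(subunits: int, strands_per_subunit: int) -> int:
    """Strands in a beta barrel to which every subunit contributes equally."""
    return subunits * strands_per_subunit


# Values as given in the text and in Table 14-6.
rhodopsin = Oligomer("mammalian rhodopsin", 350, 1)
reductase = Oligomer("mammalian ubiquinol-cytochrome-c reductase", 2200, 2)

if __name__ == "__main__":
    for protein in (rhodopsin, reductase):
        print(f"{protein.name}: {protein.total_length_aa()} aa")
    print("alpha-hemolysin barrel strands:", strands_in_shared_barrel(7, 2))
```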

In some of the heterooligomers, such as, ironically, 
cytochrome-c oxidase, every one of the subunits has at 
least one membrane-spanning segment; in others, such 
as ubiquinol-cytochrome-c reductase (Figure 14-18) and 
bacterial succinate dehydrogenase, only about half the 
subunits have membrane-spanning segments. In these 
latter proteins, the subunits without membrane-span- 
ning segments are globular structures associated by typ- 
ical heterologous protein-protein interfaces with the 
subunits that do have them. When dissociated from such 
a complex, these subunits with no contact to the bilayer 
can sometimes be crystallized as water-soluble
proteins. Two of the four nonidentical subunits of
heterooligomeric photosynthetic reaction center from
R. viridis (Figure 14-17) are homologous in sequence and
have superposable arrangements of their five
membrane-spanning α helices arranged around a 2-fold
rotational axis of pseudosymmetry; the third subunit has a
single membrane-spanning anchor; and the fourth
subunit has no membrane-associated segments of
polypeptide but does have a 1,2-diacyl-3-deoxyglyceryl group
that is attached by a thioether linkage to its
amino-terminal cysteine and that is embedded in the
membrane.

The eukaryotic complexes for electron transport 
often contain one or more short subunits that each con- 
tain one membrane-spanning segment and that appear 
to perform only a structural role in the complex. For 
example, ubiquinol-cytochrome-c reductase from 
S. cerevisiae (Figure 14-18) has two such
membrane-spanning subunits of 94 and 65 aa, while bovine
cytochrome-c oxidase (Table 14-6) has six of 84, 73, 56, 56, 47,
and 46 aa, respectively. Beyond the 30 aa needed to span 
the membrane, the smaller of these subunits have little 
else left. 
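
The remark that the smaller subunits have little else left can be checked by subtracting the 30 aa allotted to the membrane-spanning segment from each of the lengths quoted above. This is a minimal sketch of that subtraction; the names are hypothetical, and the 30 aa allowance is the value used in the text.

```python
SPAN_AA = 30  # amino acids needed to span the membrane, as stated in the text

# Lengths (aa) of the short single-span subunits quoted in the text.
SMALL_SUBUNITS = {
    "ubiquinol-cytochrome-c reductase, S. cerevisiae": [94, 65],
    "cytochrome-c oxidase, bovine": [84, 73, 56, 56, 47, 46],
}


def residues_outside_span(length_aa: int, span_aa: int = SPAN_AA) -> int:
    """Residues remaining once the membrane-spanning segment is set aside."""
    return max(length_aa - span_aa, 0)


if __name__ == "__main__":
    for complex_name, lengths in SMALL_SUBUNITS.items():
        remaining = [residues_outside_span(n) for n in lengths]
        print(f"{complex_name}: {remaining} aa outside the membrane")
```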


Figure 14-14: Skeletal drawing of the crystallographic
molecular model of an individual subunit of the
α3 homotrimer of bacteriorhodopsin from H. salinarium.
The protein used for the crystallography was
overexpressed in H. salinarium. Membranes were
dissolved in octyl β-glucoside, and the protein was purified
chromatographically. The purified protein in a solution
of octyl β-glucoside was mixed with
1-monooleoyl-rac-glycerol (monoolein) and a concentrated aqueous
solution of sodium phosphate at pH 5.6 to incorporate
the protein into a bicontinuous cubic phase of
monoolein and water in which it crystallized. The
complete crystallographic molecular model is drawn
with the polypeptide backbone in thick line segments
and side chains in thin line segments. The numbering is
that of the mature posttranslationally modified protein.
The crystallographic molecular model is oriented so
that the plane of the membrane is horizontal and the
cytoplasmic surface of the protein is at the bottom. A
bar 3.6 nm high is placed next to the protein to
represent the hydrocarbon of the bilayer. The exact location
of the membrane is unknown because the protein was
crystallized from monoolein. This drawing was
produced with MolScript.

The membrane-spanning portion of the integral
membrane-bound proteins represented by the crystallo- 
graphic molecular models can be formed entirely by seg- 
ments of secondary structure from one polypeptide, 
such as those in Ca2+-transporting ATPase (Figure 14-15),
mammalian rhodopsin, or bacterial outer membrane 
protein A; from segments of secondary structure from 
several different subunits, both heterologous and homol- 
ogous, as in photosynthetic reaction center (Figure 
14-17); from segments of secondary structure from many 
different heterologous subunits, as in mammalian 
cytochrome-c oxidase; or from identical segments of sec- 
ondary structure from the identical subunits in a 
homooligomer, as in the membrane-spanning domain of 
potassium channel KcsA (Figure 14-16). In the 
homooligomers responsible for transport of specific 
molecules or inorganic ions across the membrane, each 
subunit can form within the membrane its own channel 
for the substrate, as in the bacterial porins and the aqua- 
porins; or each subunit in the oligomer can contribute an 
identical set of secondary structures that are arrayed 
around an n-fold rotational axis of symmetry normal to 
the membrane to form together the sole channel in the 
complete oligomer, as in the membrane-spanning 
domain of potassium channel KcsA (Figure 14-16), the 
bacterial outer membrane protein TolC, and bacterial 
a-hemolysin. 

Many of the integral membrane-bound proteins are
responsible for transporting metabolites or inorganic
ions, either actively or passively, or for transporting
information across the membrane. Those responsible for
passively transporting metabolites nonspecifically, such as
the porin OmpF (Figure 14-19), have a fairly wide
hydrophilic water-filled channel passing through their
center; in the porin OmpF the channel at its narrowest is
only 0.7 nm x 1.1 nm. Those responsible for passively
transporting particular molecules or ions usually have an
obvious channel passing through them, within which
there is a region in which the selection for those
molecules is performed. For example, potassium channel
KcsA (Figure 14-16) has a water-filled, cylindrically
symmetric channel passing through it, at one end of which is
a constriction in which there is a symmetrically displayed
set of acyl oxygens from the polypeptide backbone that
select the potassium ions. In proteins that actively
transport cations, such as Ca²⁺-transporting ATPase
(Figure 14-15), cytochrome-c oxidase, and
cytochrome o ubiquinol oxidase, the passageway
through which those cations pass is much narrower, less
obvious, and convoluted so that the entrance into the
center of the channel and the exit from the center of the
channel by those ions can be controlled and coupled to
conformational transitions in the protein. Those proteins
responsible only for the passage of information, such as
mammalian rhodopsin, present no passageway for any
metabolites, water, or cations by filling the membrane
with solid protein; otherwise they would produce leaks.

Many integral membrane-bound proteins have
large globular domains on one side of the membrane or
the other. For example, proteins that act as receptors that
sense the presence of molecules outside the cell, such as
acetylcholine receptor, have globular domains on the
extracytoplasmic surface of the membrane that are
responsible for this function. Proteins that catalyze the
active transport of cations, such as mammalian
endoplasmic reticulum Ca²⁺-transporting ATPase (Figure
14-15), have globular domains on the cytoplasmic
surface of the membrane that convert the binding and
hydrolysis of cytoplasmic MgATP into the movement of
those cations against their gradients of concentration.
Most of these globular domains resemble globular
soluble proteins in the details of their structure, so it is only
the structural details of the portions of these molecular
models that are immersed within the bilayer of
phospholipid and located in its immediate vicinity that are
peculiar to these proteins.

The side chains of the amino acids forming the
continuous surface of the protein in direct contact with the
bilayer of phospholipid form a sheath
that encloses the protein and forms the interface
between it and the membrane. The most dramatic
illustration of one of these sheaths is the layer of side chains
protruding from the continuous barrel of 22 strands
completely enclosing the interior of ferrichrome-iron
receptor from E. coli (Figure 14-20). Within this
particular sheath, there is a typical globular structure of β sheets
and α helices, insulated from the hydrocarbon of the
bilayer of the outer membrane by the sheath. The side
chains of leucine (23), valine (14), tryptophan (13),
phenylalanine (11), glycine (9), alanine (9), isoleucine (8),
tyrosine (7), methionine (4), and proline (2) constitute
74% of this sheath. Polar side chains of glutamine (4),
lysine (2), arginine (2), aspartate (2), glutamate (1),
asparagine (1), serine (1), and histidine (1) that are
located at the ends of the sheath in contact with the head
groups of the phospholipid make up only 10%.

Although the sheath surrounding the protein is not
so clearly visible in the regions of an integral
membrane-bound protein that spans a membrane with a bundle of
α helices, it can be delineated by scoring the exposure of
the side chains in these α helices to the bilayer of
phospholipid. For example, the sheath surrounding the
membrane-spanning α helices in the photosynthetic reaction
center (Figure 14-17) from Rhodobacter sphaeroides is
formed from side chains that have greater than 20% of
their respective surface areas exposed to the bilayer of
phospholipid. Of those side chains that have greater
than 50% of their respective surface areas exposed to the
bilayer, 91% are leucines (15), isoleucines (12),
phenylalanines (12), valines (7), alanines (6), glycines (5),
tyrosines (5), tryptophans (4), and methionines (3); of
those with 20-50% exposed, 71% are from this same set
of side chains. In keeping with the α-helical secondary
structure, only one proline, that at the end of one of the
α helices, is exposed to the bilayer. The two glutamines
and the one lysine that have more than 50% of their
surface area exposed are at the edges of the membrane, as
are the three glutamates and the single asparagine,
glutamine, and arginine that have 20-50% of their respective
surface areas exposed. The side chains of these amino
acids are in contact with the polar head groups of the
phospholipids.
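
The percentages quoted for the sheath of ferrichrome-iron receptor
can be checked with a few lines of arithmetic. The sketch below sums
the counts of side chains given in the text and, on the assumption
that the quoted 74% refers to the nonpolar set as a fraction of all
side chains in the sheath, back-calculates an approximate total. The
variable names are illustrative, and the implied total is an
inference rather than a number stated in the text.

```python
# Counts of side chains in the sheath of ferrichrome-iron receptor,
# as quoted in the text.
nonpolar = {"Leu": 23, "Val": 14, "Trp": 13, "Phe": 11, "Gly": 9,
            "Ala": 9, "Ile": 8, "Tyr": 7, "Met": 4, "Pro": 2}
polar = {"Gln": 4, "Lys": 2, "Arg": 2, "Asp": 2,
         "Glu": 1, "Asn": 1, "Ser": 1, "His": 1}

n_nonpolar = sum(nonpolar.values())
n_polar = sum(polar.values())

# Assumption: the 74% in the text is n_nonpolar divided by the total
# number of side chains in the sheath.
implied_total = round(n_nonpolar / 0.74)
polar_percent = 100 * n_polar / implied_total

print(n_nonpolar, n_polar, implied_total, round(polar_percent, 1))
```

Under that assumption the polar set comes to about 10% of the implied
total, which agrees with the figure quoted in the text.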

These observations illustrate the fact that the por- 
tion of each of the integral membrane-bound proteins 
that is immersed in the bilayer is surrounded by a sheath 
that is significantly enriched in hydrophobic amino 
acids. This sheath forms a boundary between protein 
and lipid that is compatible with the hydrocarbon in the 
middle of the membrane and the head groups at the two 
surfaces and dissolves the protein in the bilayer of phos- 
pholipid and cholesterol just as the polar surfaces of 
globular, water-soluble proteins dissolve them in water. 
The distribution of hydropathy over the surface of this 
sheath determines the depth at which the protein floats 
within the bilayer of phospholipid and its orientation relative
to the plane of the membrane.

In integral membrane-bound proteins that are situated
in plasma membranes and intracellular membranes
and that consequently span these membranes with a
bundle of α helices, the few lysines, arginines, aspartates,
asparagines, glutamates, glutamines, and tyrosines that
they contain are usually located at the ends of these
membrane-spanning α helices, and the polar or charged
nitrogens and oxygens in their side chains reach out of
the hydrocarbon of the bilayer of phospholipid into the
polar interfaces on each side, as if their side chains were
snorkels. Lysines, arginines, aspartates, and glutamines
are located almost twice as often at the amino-terminal
end as at the carboxy-terminal end of a
membrane-spanning α helix. An amino-terminal location
assists in the emergence of their side chains from the
hydrocarbon because the two rotamers (χ1 = +66° and
χ1 = -65°) at the β carbon that point their side chains in the
amino-terminal direction are about twice as populated
as the one (χ1 = -177°) that points them towards the
carboxy terminus. Lysines and arginines both in single,
membrane-spanning α-helical anchors and
membrane-spanning bundles of α helices in integral
membrane-bound proteins are about twice as likely to be located
at the cytoplasmic ends of the α helices. This tendency
is due in part to the negative surface charge on the
cytoplasmic side of naturally occurring membranes.

Within the sheath of hydrophobic side chains, the
structure of an integral membrane-bound protein
resembles the interior of a globular, water-soluble
protein except for the fact that all membrane-spanning
segments of integral membrane-bound proteins from
plasma membranes and intracellular membranes, even
those well within the sheath, are α helices. Proteins that
form channels for metabolites and inorganic ions often
have significant aqueous pores passing most of the way
through them (Figure 14-16), but proteins that do not
have such pores still have locations occupied by
molecules of water within the β barrel or within the bundle
of α helices that spans the membrane. As in
water-soluble proteins, these locations occupied by molecules of
water can be clustered, or they can be entirely
surrounded by donors and acceptors from the protein.
There are also sites occupied by inorganic cations or
anions in the membrane-spanning regions.

In those integral membrane-bound proteins that
span the membrane with a bundle of α helices that
contain no aqueous channels, the density with which their
atoms are packed is about 5% greater than that of a
water-soluble protein formed from a bundle of
α helices, but the packing of the α helices is similar.

Although the α helices of bacteriorhodopsin (Figure
14-14) are all almost parallel to each other and normal to
the plane of the membrane, in most of the proteins
of this class, the α helices are tilted with respect to
each other (Figures 14-15, 14-16, 14-17, and 14-18).
For example, in the photosynthetic reaction center from
R. sphaeroides, the average angle of tilt is +22°. About
50% of the angles Ω between two adjacent
membrane-spanning α helices (Figure 6-23) in crystallographic
molecular models of integral membrane-bound proteins
have values between +10° and +30°, but these
angles can be either positive or negative, even within the
same protein.

In a β barrel, the β strands are automatically held in
rigid orientation by the regular array of interstrand
hydrogen bonds, but when a bundle of α helices
assembles in the membrane the α helices are held together by an
irregular array of adventitious interhelical hydrogen bonds.
For example, in bacteriorhodopsin, which has a bundle
of seven α helices spanning the membrane (Figure
14-14), there are 31 hydrogen bonds that interconnect 12
pairs of these α helices and that are in part responsible
for their relative orientations and the tertiary structure
they assume. Most of these hydrogen bonds are between
donors and acceptors found at the ends of the α helices,
locations in which polar and charged side chains are
frequently found, but 10 of them are in the middle of the
membrane within the regions of the hydrocarbon of the
bilayer of phospholipid (Table 14-7). All of the ionizable
side chains should be in the neutral form of their
acid-base (Lysine 216 is the imine of a retinal) so that the
carboxy groups of the aspartic and glutamic acids have
both a donor and three acceptors. Many of these
interhelical hydrogen bonds are from a donor or acceptor on
the protein to a molecule of water and from there to a
donor or acceptor on the protein. These bridging
molecules of water are found at locations within the bilayer.

As has been noted previously, each of these hydro- 
gen bonds has a significantly negative enthalpy of for- 
mation that is released during the assembly of these 


Table 14-7: Hydrogen Bonds in the Interior of the
Membrane between Donors and Acceptors on Different
Membrane-Spanning α Helices in Bacteriorhodopsin
from H. salinarium


α helices joinedᵃ    acceptorᵇ      donor
C-D                  L87 O          D115 OD1
C-D                  D115 OD1       T90 OG1
D-E                  M118 O         S141 OG
F-G                  A215 Oᶜ        W182 NEᶜ
F-G                  D212 OD1       Y185 OH
B-F                  D212 OD1       Y57 OH
B-G                  D205 Oᶜ        Y57 OHᶜ
C-G                  K216 Oᶜ        T46 Oᶜ
C-G                  D85 OD2ᶜ       D212 OD1ᶜ
C-G                  K216 NZᶜ,ᵈ     D85 OD2ᶜ


ᵃα Helix A spans the membrane with amino acids 11 through 29; α helix B, 44
through 62; α helix C, 79 through 96; α helix D, 108 through 127; α helix E, 135
through 154; α helix F, 173 through 191; and α helix G, 204 through 223. ᵇOnly
hydrogen bonds within the hydrocarbon of the bilayer are tabulated.
ᶜHydrogen-bonded through a molecule of water. ᵈLysine 216 is posttranslationally
modified with a retinal.
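
The first column of Table 14-7 can be derived from the residue
numbers in the other two columns and the ranges given in the first
footnote. The following is a minimal sketch of that lookup; the
function names helix_of and helices_joined are hypothetical and are
introduced here only for illustration.

```python
# Residue ranges of the membrane-spanning helices of bacteriorhodopsin,
# taken from the first footnote of Table 14-7.
HELIX_RANGES = {
    "A": (11, 29), "B": (44, 62), "C": (79, 96), "D": (108, 127),
    "E": (135, 154), "F": (173, 191), "G": (204, 223),
}

def helix_of(residue_number):
    """Return the letter of the helix containing the residue, or None."""
    for letter, (first, last) in HELIX_RANGES.items():
        if first <= residue_number <= last:
            return letter
    return None

def helices_joined(acceptor_residue, donor_residue):
    """Return a label such as 'C-D' for the pair of helices joined."""
    letters = sorted(helix_of(n) or "?" for n in (acceptor_residue, donor_residue))
    return "-".join(letters)

# First and third rows of the table: L87 O to D115 OD1, M118 O to S141 OG.
print(helices_joined(87, 115))
print(helices_joined(118, 141))
```

A residue that lies in a loop between two helices returns None from
helix_of and is shown as "?" in the label.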


α helices within the membrane to produce the tertiary
structure of the protein. Once each membrane-spanning
α helix has become inserted in the bilayer of
phospholipid, however, the hydrophobic effect is no longer in
operation. Consequently, the importance of hydrogen 
bonding and the importance of the hydrophobic effect in 
the formation of tertiary structure are reversed once that 
portion of the polypeptide is within the hydrocarbon of 
the bilayer of phospholipid. During the assembly of the 
α helices, hydrogen-bond donors and acceptors can be
responsible for significant, favorable standard enthalpy 
of formation, yet no favorable change in standard free 
energy, other than that associated with the packing effi- 
ciency, occurs when two hydrophobic surfaces are juxta- 
posed. The situation, however, is similar to that observed 
in the folding of a water-soluble protein in that the 
hydrophobic effect in the one case immerses the α helix
in the bilayer and in the other produces the initial 
hydrophobic collapse of the polypeptide. After these 
hydrophobically driven events, the establishment of the 
final tertiary structure relies not at all or much less, 
respectively, on the hydrophobic effect because it has 
already been expended. 

Prolines are present in many of the α helices in
membrane-spanning bundles. As it does in an α helix
in a water-soluble protein, a proline always occupies a
kink in a membrane-spanning α helix. There are also
kinks in membrane-spanning α helices that do not
incorporate a proline, but it has been noted that wherever
there is such a kink in a membrane-spanning α helix
there will be a high frequency of homologous proteins
that have prolines at that position. In addition, it has
been demonstrated that the introduction of a proline at
an unkinked position in a membrane-spanning α helix
usually produces a protein unable to assume its native
structure while changing a proline at a kink to an
alanine has little effect on the protein. Evidently, a kink
that has been established by evolution through natural
selection in a membrane-spanning α helix prefers to
have a proline at the position of the kink, but not
exclusively, while established, unkinked segments of α helix
are too rigidly held within the structure to tolerate the
inevitable kink that results from inserting a proline. A
proline just beyond one of the ends of a
membrane-spanning α helix can promote the doubling back of the
polypeptide to form the next membrane-spanning
α helix.

The question of whether or not there are cystines 
within the membrane-spanning segments of integral 
membrane-bound proteins is of chemical interest. First, 
the interior of a bilayer of phospholipid is devoid of glu- 
tathione or any other small mercaptan. Second, oxygen 
is more soluble in the hydrocarbon of a bilayer than it is 
in water. Third, the oxidation of two thiols to a disulfide 
performed by oxygen is a free radical reaction that 
should proceed normally within the hydrocarbon. 
Fourth, cystine is one of the most hydrophobic of the side
chains. In spite of these facts, there do not appear to be
any cystines in the membrane-spanning segments of
integral membrane-bound proteins. One of the first
indications of this peculiar absence was the observation that
acetylcholine receptor from Torpedo californica,
although it has a total of 13 cysteines in its 20
membrane-spanning α helices, has no cystines in those
membrane-spanning α helices. Although there are a number of
cystines in the extracytoplasmic portions of the
crystallographic molecular models listed in Table
14-6, there is none in any of the
membrane-spanning portions of these proteins, with the
exception of a cystine in the center of the large
water-filled pore passing through maltoporin. This latter
cystine, however, is not in contact with the hydrocarbon of
the outer membrane.

Figure 14-20: Skeletal drawing of the
polypeptide backbone of the crystallographic
molecular model of ferrichrome-iron
receptor from E. coli.³⁵⁷ Membranes
from a strain of E. coli that overexpresses
ferrichrome-iron receptor were dissolved in
2% Triton X-100, and the ferrichrome-iron
receptor was purified chromatographically
from this solution. In the final chromatographic
step, which was by molecular exclusion,
the protein was transferred from 2%
Triton X-100 to 1% n-octyl β-D-glucoside. It
was crystallized from this solution by the use
of poly(ethylene glycol) as a precipitant. The
protein is a monomer, so the β sheet must be
high enough to span the outer membrane all
the way around. Only the hydrogen bonds in
the portions of the β sheet that span the
membrane are included. The extracellular
surface of the protein is on the top; the
surface facing the periplasm is on the bottom of
the drawing. This drawing was produced
with MolScript.

The photosynthetic reaction center from R. viridis
(Figure 14-17) has 11 membrane-spanning α helices.
Each traverses the hydrocarbon of the bilayer of
phospholipid in one unbroken α helix. The amino acid
sequences of these 11 α helices are
-SLGVLSLFSGLMWFFTIGIWFWYNA-,
-LKEGGLWLIASFFMFVAVWSWWGRTYLRAQA-,
-AWAFLSAIWLWMVLGFIRPILM-,
-PFHGLSIAFLYGSALLFAMHGATILAV-,
-MEGIHRWAIWMAVLVTLTGGIGILL-,
-GFFGVATFFAALGIILIAWSAVL-,
-GGLWQIITICATGAFVSWALREVEICRKL-,
-HIPFAFAILAYLTLVLFRPVM-,
-PAHMIAISFFFTNALALALHGALVLSAA-,
-GTLGIHRLGLLLSLSAVFFSALCMII-, and
-IAQLVWYAQWLVIWTVVLLYLRREDR-. In each of these
amino acid sequences there is a region of at least 20
amino acids in length that contains no amino acids that
are charged at neutral pH with the exception of the
arginines at the carboxy-terminal ends of the third and
the eighth α helices. The reason that these 11
α helices were able to be inserted into the bilayer of
phospholipid is that they are composed of hydrophobic
amino acids and the bilayer is an organic solvent into
which the side chains of these amino acids have
dissolved.
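
The observation about uncharged regions can be checked directly on
any of the sequences listed above. The sketch below scans a sequence
in one-letter code and reports the longest run that contains no
aspartate, glutamate, lysine, or arginine. Treating histidine as
neutral is an assumption made here for illustration, and the function
name longest_uncharged_run is hypothetical. The first two sequences
are used as transcribed, and any errors in transcription would change
the counts.

```python
# Residues treated as charged at neutral pH (assumption: His is neutral).
CHARGED = set("DEKR")

def longest_uncharged_run(sequence):
    """Length of the longest stretch containing no charged residue."""
    longest = 0
    current = 0
    for residue in sequence:
        if residue in CHARGED:
            current = 0          # a charged residue ends the current run
        else:
            current += 1
            longest = max(longest, current)
    return longest

# The first two membrane-spanning sequences listed in the text.
first_helix = "SLGVLSLFSGLMWFFTIGIWFWYNA"
second_helix = "LKEGGLWLIASFFMFVAVWSWWGRTYLRAQA"

for name, seq in (("first", first_helix), ("second", second_helix)):
    print(name, len(seq), longest_uncharged_run(seq))
```

The same function can be applied to the remaining sequences, or to
the hydrophobic segment of any other membrane-spanning α helix.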

These 11 hydrophobic segments of amino acid
sequence are similar to the five listed earlier for the single
membrane-spanning α helices from anchored
membrane-bound proteins but differ in flavor. The
hydrogen-bond donors, tryptophan and tyrosine, are more
uniformly distributed over the length of these segments;
the neutral but hydrophilic hydrogen-bond donors and
acceptors glutamine, asparagine, and histidine now
occasionally appear; and the frequency with which
glycine is encountered is greater. Each of these subtle
changes indicates that these amino acid sequences are
from a bundle of α helices gathered together as a protein
rather than from individual α helices spanning the
extremely hydrophobic environment of the membrane
as isolated entities. Even in α helices completely buried
in the center of one of these bundles, however, such as
the three α helices -PRMNNMSFWLLPPSFLLLLASSM-,
-ASVDLTIFSLHLAGVSSILGAINFITTN-, and
-LFVWSVMITAVLLLLSLPVLAAGITMLLTD- from bovine
mitochondrial cytochrome-c oxidase, the preponderance
of hydrophobic amino acids at the center of each
persists.

Even though most integral membrane-bound pro- 
teins are dissolved in solutions of a nonionic detergent 
before they are crystallized (Table 14-6), they carry mol- 
ecules of phospholipid from the original membranes 
along with them into the respective crystals, and if the 
maps of electron density are clear enough, these mole- 


cules can be recognized and included in the crystallo- 
graphic molecular model. For example, in a crystallo- 
graphic molecular model of photosynthetic reaction 
center from R. sphaeroides, there is a molecule of
diphosphatidylglycerol; in one for bovine mitochondrial
cytochrome-c oxidase, there are five molecules of
phosphatidylethanolamine and three molecules
of phosphatidylglycerol; and in one for
ubiquinol-cytochrome-c reductase from S. cerevisiae, there is a
molecule of phosphatidylcholine, two of
phosphatidylethanolamine, one of phosphatidylinositol, and
one of diphosphatidylglycerol. At least two diether
lipids (see 14-8) are represented by their 1-(2,6,10,14- 
tetramethylhexadecan-16-yl)-2-(2,10,14-trimethylhexa- 
decan-16-yl)glyceryl groups in the crystallographic 
molecular model of bacteriorhodopsin. 

The fatty acyl or isoprenyl chains of all of these mol- 
ecules of phospholipid or diether lipid usually lie within 
the straight crevices running between the
membrane-spanning α helices in an orientation roughly normal to
the plane of the membrane, but in one case the fatty acyl
groups penetrate sideways into spaces between the
α helices, and the head group of this particular
phospholipid is 0.8 nm away from the plane in which head groups
are normally located (Figure 14-2). The head groups of 
all of the other phospholipids in these crystallographic 
molecular models are in the expected positions, and they 
engage in many hydrogen bonds with side chains of the 
protein and molecules of water occupying fixed locations 
bridging those head groups and the protein. 

It is possible that these rigidly fixed molecules 
of phospholipid are structurally essential just as many 
of the molecules of water in the interior of a molecule of 
protein are structurally essential. If so, these observa- 
tions may explain why integral membrane-bound pro- 
teins often lose their biological activity and even their 
tertiary structure when the ratio of detergent to protein 
becomes too large. They may also be examples of mole- 
cules of phospholipid within the boundary layer that are 
most severely immobilized, just as fixed locations for 
molecules of water on the surface of a crystallographic 
molecular model are the most severely immobilized mol- 
ecules of the waters of hydration. 

The sheath of hydrophobic side chains surrounding 
an integral membrane-bound protein is immersed in the 
bilayer of phospholipid and surrounded by the hydro- 
carbon of the amphipathic and neutral lipids. The lipids 
in this boundary layer are those molecules of lipid the 
behavior of which is affected at a given instant by the 
presence of the protein. Molecules of lipid in the bound- 
ary layer can be formally distinguished from those mol- 
ecules of lipid that behave as if they were in an 
unadulterated bilayer of the same amphipathic lipids. 

This distinction resembles in its ambiguity the dis- 
tinction between water of hydration associated with a 
protein (Table 6-4) and water in the bulk solution. In the 
case of water of hydration, there is a gradual diminution 


of the influence of the protein the farther a particular 
molecule of water is from its surface, but a water mole- 
cule several shells from the protein may still be influ- 
enced by it because of the nets of hydrogen bonds that 
ensnare both water and protein. Likewise, a molecule of 
amphipathic lipid somewhat distant from the protein 
may be marginally influenced by it when one or two of its 
methylenes strike against the surface of the protein as the 
linear hydrocarbon writhes within the liquid paraffin, 
but molecules of amphipathic lipid embracing the pro- 
tein should be more severely affected. Networks of 
hydrogen bonds among the hydrophilic head groups of 
the phospholipids and sphingomyelins (Figure 14-6) 
may also spread the influence of the protein beyond its 
immediate vicinity. In this context, lipid in the boundary
layer surrounding the protein and under its influence
has been defined operationally just as water of hydration
has been defined operationally. Just as in the case of
waters of hydration, a single numerical value for moles of
lipid in the boundary layer (mole of protein)⁻¹ is
measured.

When 1-palmitoyl-2-stearoylphosphatidylcholine, 
to which a dimethyl cyclic nitroxyl radical is attached at 
the 14th carbon of the stearoyl group
(nitroxylphosphatidylcholine 14-16),
is incorporated into bilayers of phosphatidylcholine
containing an integral membrane-bound protein such as
Ca²⁺-transporting ATPase, the electron spin resonance
spectrum that is observed (Figure 14-21A) can be
decomposed into two spectra (Figure 14-21B,C) of which
it is the sum. One of the component spectra is the
same as that of nitroxylphosphatidylcholine 14-16 in
pure vesicles of liquid phosphatidylcholine (Figure
14-21F), and one is that of nitroxylphosphatidylcholine
14-16 when its motion is restricted (Figure 14-21E). It
was concluded that there are two sets of
nitroxylphosphatidylcholines 14-16 present in these bilayers, one set
constituted by molecules of restricted mobility located
in the boundary layer immediately adjacent to the
protein and the other constituted by molecules of
unrestricted mobility in the bulk bilayer of phospholipid. The
rates at which lipids exchange at positions within the
boundary layer are around 10⁷ s⁻¹, so no one molecule
of phospholipid remains for a significant amount of time
within it.

As the ratio between phosphatidylcholine from
eggs of G. gallus and Ca²⁺-transporting ATPase in these
membranes is increased, the fraction of restricted
nitroxylphosphatidylcholine 14-16 decreases. This decrease
results from a competition between unlabeled molecules
of phosphatidylcholine and molecules of
nitroxylphosphatidylcholine 14-16 for positions in the boundary
layer adjacent to the protein and the increase in the
concentration of the former. From the numerical values of
the fraction of restricted nitroxylphosphatidylcholine
14-16 as a function of the molar ratio between
phosphatidylcholine and protein, the ratio between the
affinities of nitroxylphosphatidylcholine 14-16 and
unmodified phosphatidylcholine for positions adjacent to
protein can be estimated, and the number of molecules
of phosphatidylcholine occupying positions adjacent to
the protein can be calculated.

Figure 14-21: Decomposition of the electron spin resonance
spectrum of nitroxylphosphatidylcholine 14-16 in vesicles of
phospholipid in which Ca²⁺-transporting ATPase from O. cuniculus has
been incorporated. Ca²⁺-Transporting ATPase, from which all
indigenous phospholipid had been removed, was incorporated
into vesicles of phosphatidylcholine from eggs of G. gallus into
which nitroxylphosphatidylcholine 14-16 was also incorporated.
In the final suspension of vesicles containing the protein, the molar
ratios of Ca²⁺-transporting ATPase (110,500 g mol⁻¹) to
phosphatidylcholine to phosphatidylcholine nitroxyl radical were
1:55:0.4. The nitroxyl radical was acting as a probe present in dilute
concentration within a solvent of phosphatidylcholine. The
observed electron spin resonance spectrum (A) of this probe in this
environment could be decomposed into two component spectra.
One component (C), which accounted for 54% of the spins
producing spectrum A, had the same spectrum as the probe dissolved
in the same preparation of phosphatidylcholine in the absence of
protein (F). The other component (B), which accounted for 46% of
the spins producing spectrum A, was assumed to represent
boundary lipid. It had the same spectrum as that of the probe dissolved in
a viscous bilayer composed of dipalmitoylphosphatidylcholine and
palmitoyloleoylphosphatidylcholine at a ratio of 4:1 (E). The effect
of the immobilization of the probe by the viscous bilayer of
phospholipid resembles the effect of the protein on the probe (compare
spectra B and E). A summation of reference spectra E and F, at a
molar ratio of 0.46 to 0.54, produced theoretical spectrum D, which
reproduces the observed spectrum A. All spectra are the amplitude
of the first derivative of the absorption of the microwave energy as
a function of the strength of the magnetic field (tesla). Reprinted
with permission from ref 397. Copyright 1984 American Chemical
Society.
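
One simple way to see how both quantities enter the measurement is a
model of competitive exchange. The sketch below is an illustration
under stated assumptions, not necessarily the analysis used in the
cited work: all N positions around the protein are always occupied,
the spin-labeled lipid is dilute, and its affinity for a position
relative to unlabeled lipid is K. The fraction of restricted label
then follows from the molar ratio of lipid to protein. The function
names and parameter names are hypothetical.

```python
def restricted_fraction(lipid_per_protein, n_sites, relative_affinity=1.0):
    """Fraction of dilute labeled lipid found in the boundary layer.

    Assumes every one of the n_sites positions is occupied and that the
    label competes with unlabeled lipid with the given relative affinity.
    """
    bulk_lipid = lipid_per_protein - n_sites
    if bulk_lipid < 0:
        raise ValueError("fewer lipids than positions in the boundary layer")
    bound_weight = relative_affinity * n_sites
    return bound_weight / (bound_weight + bulk_lipid)

def sites_from_fraction(fraction, lipid_per_protein, relative_affinity=1.0):
    """Invert restricted_fraction to recover the number of positions."""
    denominator = relative_affinity * (1.0 - fraction) + fraction
    return fraction * lipid_per_protein / denominator

# With equal affinities the restricted fraction is simply N / (total lipid),
# so it falls as the ratio of lipid to protein rises.
for ratio in (30, 55, 110, 220):
    print(ratio, round(restricted_fraction(ratio, n_sites=22), 3))
```

In practice the two unknowns, N and K, would be obtained together by
fitting measurements made over a range of molar ratios rather than
from a single spectrum.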

In the case of Ca²⁺-transporting ATPase at
25 °C, phosphatidylcholine and
nitroxylphosphatidylcholine 14-16 have equal affinity for positions
around the protein and the number of positions is 22 mol
(mol of protein)⁻¹. In other words, there are, on the
average, 22 molecules of phospholipid in the boundary layer
around a molecule of Ca²⁺-transporting ATPase, each of
which can exchange with no bias for a molecule of
nitroxylphosphatidylcholine 14-16, and these 22
molecules of natural, unlabeled phosphatidylcholine occupy
locations at which a molecule of
nitroxylphosphatidylcholine 14-16 would be restricted in its motion by the
protein. Similar measurements with spin-labeled
cholesterol demonstrated that cholesterol was able to occupy
all of the positions around Ca²⁺-transporting ATPase
within the boundary layer but with an affinity about
two-thirds that of natural phosphatidylcholine.

The number of lipids in the boundary layer has
been estimated to be 94 ± 10 for each homodimer of
cytochrome-c oxidase, 40 ± 7 for each heteropentamer
of acetylcholine receptor, 21 ± 3 for each monomer of
bovine rhodopsin, and 10 for each folded polypeptide
of myelin proteolipid protein. The number of lipids in
the boundary layer is roughly equal to the number of
molecules of lipid needed to cover the surface of the
sheath for that protein with one layer of linear alkane
aligned normal to the plane of the membrane.

It is not necessarily the case that when a position in 
the boundary layer is occupied by a natural phospholipid 
rather than nitroxylphosphatidylcholine 14-16, the 
motion of that natural phospholipid is noticeably 
affected. The side chains of the amino acids presented by 
the sheath surrounding the membrane-spanning bundle
of α helices to the linear hydrocarbon of the
phospholipid are themselves branched hydrocarbons that are free
to rotate fluidly, and the hydrocarbon of
phosphatidylcholine may experience little change as it enters or leaves
these positions. When Ca²⁺-transporting ATPase was
incorporated into vesicles of dioleoylphosphatidylcholine
in which either the 2nd carbons or the 9th and
10th carbons on the two fatty acids were labeled with
deuterium rather than a nitroxyl radical, the effect of the
protein on the motion of these lipids could be followed
by deuterium nuclear magnetic resonance
spectroscopy. The protein was present at a ratio of about
1 mol (100 mol of phospholipid)⁻¹. Statistically
significant increases in the anisotropy at the 9th and 10th
carbons were detected when the protein was added to the
bilayer but not at the 2nd carbon. The increase in 
anisotropy observed, even with the assumption that only 
20-30% of the lipid was in the boundary layer, was only 
about 10-20% for the lipids in these positions. This small 
increase indicates that the protein does not force the 
hydrocarbon in this region to assume conformations 
much more irregular than those it would normally 
assume. The deuterium spin-lattice relaxation times, 
which are measures of the fluidity of the hydrocarbon, 
were barely altered by the presence of the protein, and 
no evidence for two separate sets of phospholipids 
remaining distinct over time intervals greater than 5 ms 
was observed. 

Just as with soluble proteins, many integral mem- 
brane-bound proteins (Table 14-6) are homooligomers 
(Figures 14-16 and 14-19) or heterooligomers (Figures 
14-17 and 14-18). Now that synthetically pure detergents 
are available, it is possible to dissolve a membrane, and 
if an oligomeric protein of interest is stable enough, the 
dissolution can produce a monodisperse solution of that 
membrane-bound oligomer. Each complex between a 
micelle of the detergent and an oligomer of that protein 
in such a solution has the same shape, as judged by its 
frictional coefficient, and the same molar mass, as 
judged by sedimentation equilibrium. Furthermore,
it seems to be the case that the oligomer found in
solution upon dissolving the membrane is often, but not
always, the same one originally present in the
membrane.

While quantitative cross-linking gives reliable 
determinations of the number of subunits in water-soluble
oligomeric proteins and for integral membrane-bound
proteins in solutions of detergent, it fails to
do so for membrane-bound oligomers that are still 
within the membrane. The problem is that the mem- 
brane-bound proteins within a native membrane are 
always at too high a concentration, and it is technically 
difficult to dilute them sufficiently to prevent them from 
cross-linking intermolecularly. Even when the mem- 
branes contain only one protein, quantitative cross-link- 
ing usually produces covalent polymers so large that 
their complexes with dodecyl sulfate do not even enter 
an acrylamide gel upon electrophoresis. If particular
care is taken to reconstitute a membrane-bound protein 
at high ratios of phospholipid to protein so that each 
reconstituted vesicle is large and each contains only a 
few molecules of the protein, intermolecular cross-link- 
ing can be suppressed sufficiently that intramolecular 
cross-linking can give an accurate assessment of the 
oligomeric state of the protein. The ideal preparation 
for assessment of the number of subunits in the oligomer 
of an integral membrane-bound protein by quantitative 
cross-linking would be a suspension of vesicles in which 
each vesicle contained only one copy or no copies of the 
oligomer. 
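
The requirement that each vesicle contain one copy of the oligomer or none can be made semiquantitative. The following is a minimal sketch, assuming that oligomers are distributed among the vesicles at random according to a Poisson distribution, an assumption that is not part of the argument above. The function name and the mean occupancies are hypothetical and chosen only for illustration.

```python
import math

def fraction_singly_occupied(mean_per_vesicle):
    """Among vesicles that contain at least one oligomer, return the
    fraction that contain exactly one, assuming Poisson occupancy."""
    if mean_per_vesicle <= 0:
        raise ValueError("mean occupancy must be positive")
    p_one = mean_per_vesicle * math.exp(-mean_per_vesicle)   # P(exactly 1)
    p_occupied = 1.0 - math.exp(-mean_per_vesicle)           # P(1 or more)
    return p_one / p_occupied

# Hypothetical mean numbers of oligomers per vesicle.
for mean in (2.0, 0.5, 0.1):
    fraction = fraction_singly_occupied(mean)
    print(f"mean {mean:.1f} per vesicle: fraction singly occupied = {fraction:.3f}")
```

Under this assumption, the fraction of occupied vesicles that hold a single oligomer approaches 1 only as the mean occupancy becomes small, which is why high ratios of phospholipid to protein are needed.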


Oligomeric proteins constructed from identical 
subunits always incorporate rotational axes of symmetry 
into their structures. In a bilayer of amphipathic lipids, 
all of the identical subunits of either an oligomeric, 
anchored membrane-bound protein or an oligomeric, 
integral membrane-bound protein are inserted so that 
they point in the same direction. Because the same 
hydrophobic segments of their common amino acid 
sequence span the bilayer, all folded polypeptides of the 
same sequence float at the same depth in the membrane 
and have the same orientation. These inescapable 
requirements placed upon the common structure of the 
subunits of oligomeric membrane-bound proteins force 
any rotational axis of symmetry relating the individual 
subunits in the protein to be normal to the plane of the 
bilayer. Therefore, a membrane-bound homooligomeric 
protein can have only one rotational axis of symmetry, 
and that axis will be normal to the plane of the mem- 
brane. Consequently, all membrane-bound oligomeric 
proteins must have cyclic symmetry, and dihedral symmetry
is prohibited. The same argument can be made
for membrane-bound heterooligomers formed from 
homologous, superposable subunits. These het- 
erooligomers will contain rotational axes of pseudosym- 
metry normal to the plane of the membrane. 
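
The geometric core of this argument can be checked numerically. The sketch below is an illustration only: it represents the orientation of a subunit in the bilayer by a single unit vector along the normal to the membrane (an assumed simplification) and asks which rotations leave that vector unchanged. The names `NORMAL`, `IN_PLANE`, and `keeps_orientation` are hypothetical.

```python
import math

def rotate(v, axis, angle_deg):
    """Rotate vector v about a unit axis by angle_deg (Rodrigues' formula)."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    kx, ky, kz = axis
    vx, vy, vz = v
    dot = kx * vx + ky * vy + kz * vz
    cross = (ky * vz - kz * vy, kz * vx - kx * vz, kx * vy - ky * vx)
    return tuple(c * vi + s * ci + (1.0 - c) * dot * ki
                 for vi, ci, ki in zip(v, cross, axis))

NORMAL = (0.0, 0.0, 1.0)     # normal to the bilayer; also the direction in which a subunit points
IN_PLANE = (1.0, 0.0, 0.0)   # an axis lying in the plane of the bilayer

def keeps_orientation(axis, angle_deg, tol=1e-9):
    """True if the rotation leaves the direction of the subunit unchanged."""
    rotated = rotate(NORMAL, axis, angle_deg)
    return all(abs(a - b) < tol for a, b in zip(rotated, NORMAL))

print(keeps_orientation(NORMAL, 120.0))    # 3-fold axis normal to the bilayer
print(keeps_orientation(IN_PLANE, 180.0))  # 2-fold axis in the plane of the bilayer
```

A rotation about the normal leaves every subunit pointing the same way, whereas a 2-fold rotation about an axis in the plane would invert a subunit, which is the operation that dihedral symmetry would require.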

Closed structures with rotational axes of symmetry 
are even more exclusive necessities for oligomeric mem- 
brane-bound proteins than they are for soluble proteins. 
A screw axis of symmetry is incompatible with the vecto- 
rial two-dimensional distribution of identical or homol- 
ogous subunits enforced by the bilayer, and helical 
polymeric fibers are not available structures. The only 
interfaces that could propagate a linear polymer of indef- 
inite length in a membrane would have to form from 
complementary faces arrayed at precisely 180° across 
from each other on each subunit, or the row of subunits 
produced by them would eventually come around upon 
itself to form an unbroken or a broken ring. During evo- 
lution by natural selection, therefore, every time two 
complementary faces appear at random on the surface of 
one monomer of a membrane-bound protein such that 
those two faces can produce a series of interfaces joining 
several of the subunits in an oligomer, either an incom- 
plete ring or a complete ring of an integral number of 
subunits will always form. A complete ring, because no 
pair of complementary faces remains unassociated, is a 
more stable structure than an incomplete ring. If the 
angle between the two complementary faces on a single 
subunit is an integral quotient of 360°, where the integer 
is greater than or equal to 3, complete rings containing 
that number of subunits will form. Because only one 
rotational axis of symmetry normal to the bilayer is avail- 
able, magic numbers such as four or six, applicable to 
soluble oligomeric proteins having sets of perpendicular 
rotational axes of symmetry, are irrelevant to membrane-bound
oligomeric proteins. In addition, unlike a soluble
protein such as hemoglobin, a membrane-bound protein
cannot be tetrameric under one set of conditions
and dimeric under another.
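
The condition for closure of a ring can be stated as a short calculation. This is a minimal sketch, assuming that "the angle between the two complementary faces" is read as the rotation that relates one subunit to the next around the ring; that reading, the function name, and the tolerance are assumptions made for illustration.

```python
def ring_size(turn_angle_deg, tol=1e-6):
    """Return the number of subunits in the closed ring if the rotation
    between successive subunits is 360/n degrees with n >= 3; otherwise
    return None (the row of subunits cannot close into a complete ring)."""
    if turn_angle_deg <= 0:
        return None
    n = 360.0 / turn_angle_deg
    nearest = round(n)
    if nearest >= 3 and abs(n - nearest) < tol:
        return nearest
    return None

for angle in (120.0, 90.0, 72.0, 60.0, 100.0):
    print(angle, ring_size(angle))
```

No value of n is favored over any other in this calculation, which is the sense in which the magic numbers of soluble proteins do not apply.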

It is not surprising that the oligomeric membrane- 
bound proteins the structures of which have been 
directly observed are all assembled around single rota- 
tional axes of symmetry normal to the plane of the 
bilayer. In this case, the distinction between anchored 
and integral membrane-bound proteins is irrelevant 
because both are constrained by the same requirements. 
Thus both bacteriorhodopsin, an integral membrane-bound
protein, and the hemagglutinin of influenza
virus, an anchored membrane-bound protein, have
three identical subunits arrayed around a 3-fold rotational
axis of symmetry normal to the plane of the membrane.
Photosynthetic reaction center, an integral
membrane-bound protein containing two different
polypeptides with homologous amino acid
sequences, has those two folded polypeptides
arrayed around a 2-fold rotational axis of pseudosymmetry
normal to the plane of the membrane (Figure
14-17). Exo-α-sialidase of influenza virus, an anchored
membrane-bound protein, has four identical subunits
arrayed around a 4-fold rotational axis of symmetry.

Acetylcholine receptors from T. californica and 
T. marmorata are integral membrane-bound proteins. 
Each is an α₂βγδ heteropentamer constructed from
four unique polypeptides, designated α, β, γ, and δ
on the basis of their electrophoretic mobilities. All four of
these polypeptides are glycoproteins, and all four
are homologous in sequence. All four can be readily
aligned, and in the six pairwise comparisons the percent
identity averages around 40%. These four distinct
subunits were derived from a common ancestor and all
four of them assume the same unique superposable tertiary
structure upon folding. The five subunits, α₂βγδ,
in the native structure of acetylcholine receptor are
arrayed around a 5-fold rotational axis of pseudosymmetry
normal to the plane of the membrane (Figure
14-22).

Gap junction connexon, an integral membrane- 
bound protein, has six identical subunits arrayed around 
a 6-fold rotational axis of symmetry. In each gap junction
between two cells, each connexon, which is a ring of
six subunits in the plasma membrane of one of the cells,
is associated by a 2-fold rotational axis of symmetry with
another gap junction connexon in the membrane of the
other cell to produce a dodecamer that is a dimer of
hexamers with dihedral symmetry of point group 622
(D₆) with one hexamer in each plasma membrane.
Within the plasma membrane of a given cell, the connexons
are in crystalline arrays of indefinite surface
area. A different hexamer of identical subunits arrayed
around a 6-fold rotational axis of symmetry normal to the
plane of the bilayer is the major constituent of the
luminal plasma membrane of urinary bladder, and this
protein is also present in the cell in a crystalline array.

Figure 14-22: Map of electron scattering density of acetylcholine receptor embedded in a glass of amorphous ice. Membranes enriched
in acetylcholine receptor were prepared from electric organs of Torpedo marmorata by differential centrifugation. The membranes were
resuspended in 0.1 M tris(hydroxymethyl)aminomethane hydrochloride, pH 6.8, and allowed to stand at 10 °C for 1 month, at which time
long (<1 μm) cylindrical tubes 70 nm in diameter had formed. These tubes were helical, crystalline arrays of molecules of acetylcholine receptor
within tubular bilayers of the amphipathic lipid from the membranes. The asymmetric unit in the helical array was a dimer of identical
acetylcholine receptors. The asymmetric units formed the rows of a 15-stranded left-handed helical array in one dimension and the rows of
a 5-stranded right-handed helical array in the other dimension of the (-15,5) surface lattice. These tubes were embedded in a thin layer of
amorphous ice on a film of carbon on an electron microscopic grid. Digitized electron micrographs of these tubes were submitted to Fourier
transformation. The layer lines of the resulting diffraction pattern were indexed, and variations in phase and amplitude along the layer lines
were measured from these diffraction patterns. These functions were then submitted to Fourier-Bessel inversion to obtain a three-dimensional
map of electron scattering density for the tube. (A) View perpendicular to the surface of the tube of this map of scattering density. The
image was made by stacking about 20 successive sheets of clear plastic of the appropriate thickness, each with a cross section of the map
traced upon it. The successive sections chosen were 0.5 nm apart. Reprinted with permission from Nature, ref 424. Copyright 1985 Macmillan
Magazines Limited. (B) Cross section through the center of a molecule of acetylcholine receptor in a plane normal to the axis of the tube. The
blocklike structure at the bottom of the image is thought to be a protein other than acetylcholine receptor. The bilayer of amphipathic lipid
is to the right and left. There is a deep cylindrical depression on the upper, extracytoplasmic surface of the protein and a small shallow depression
on the lower, cytoplasmic surface of the protein. The five subunits arrayed about a 5-fold rotational axis of pseudosymmetry produce a
thick cylindrical pipe about 7 nm in diameter and 5 nm in height with a wall 2.5 nm thick extending out from the extracytoplasmic surface
of the membrane. Reprinted with permission from ref 425. Copyright 1990 Rockefeller University Press.


There is a group of proteins the crystallographic
molecular models of which at first glance seem to contradict
the axiom that all rotational axes of symmetry in
integral membrane-bound homooligomers must be
normal to the plane of the membrane. Aquaporin serves
as a paradigm for the proteins in this group. Aquaporin is
an α₄ homotetramer, which as expected has cyclic symmetry
of point group 4 (C₄). There is no doubt, however,
that the subunit of the aquaporin tetramer (Figure
14-23) is the product of an internal duplication. This
conclusion follows from the fact that the amino acid
sequence from Valine 51 to Cysteine 88 in the first half of
bovine aquaporin can be readily aligned (30% identity
with no gaps) with the amino acid sequence from Glycine
167 to Threonine 204 in the second half of the molecule.
This alignment brings into register the amino acids forming
the second and third membrane-incorporated
α helices (Figure 14-23) from the first half of the folded
polypeptide with those forming the membrane-incorporated
sixth and seventh α helices, respectively, from the
second half of the folded polypeptide. There are eight
membrane-incorporated α helices in all, so these pairings
are the ones expected from an internal duplication.
Outside this central region of the protein, the two amino
acid sequences cannot be aligned with statistical significance,
but when the crystallographic molecular model of
the subunit is viewed from the proper angle (Figure 
14-23), it can be seen that there is an obvious 2-fold rota- 
tional axis of pseudosymmetry that superposes the first 
half of the folded polypeptide onto the second half. This 
superposition is made all the more compelling by the fact 
that the peculiar structure of the third membrane-incorporated
α helix, which passes only halfway across the
membrane before the polypeptide doubles back, is
exactly mimicked by that of the seventh membrane-incorporated
α helix.
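
The percent identity of an alignment with no gaps, as quoted for the two halves of aquaporin, is a simple count of matching positions. The sketch below illustrates the calculation; the two short sequences are invented placeholders, not the sequences of aquaporin, and the function names are hypothetical.

```python
def segment_length(first, last):
    """Number of residues in a segment numbered first..last inclusive."""
    return last - first + 1

def percent_identity(seq_a, seq_b):
    """Percent of positions that are identical in a gapless alignment
    of two segments of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("a gapless alignment needs segments of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)

# The two segments named in the text have the same length, as a gapless
# alignment requires: Val 51 to Cys 88 and Gly 167 to Thr 204.
print(segment_length(51, 88), segment_length(167, 204))

# Invented placeholder sequences, used only to exercise the function.
print(percent_identity("GAVLIGAVLI", "GAVKKGTVLE"))
```

Each of the two segments contains 38 positions, so an identity of 30% corresponds to roughly 11 identical positions.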

The puzzling aspect of this 2-fold rotational axis of 
pseudosymmetry within the subunit of aquaporin is that 
it is parallel to the plane of the membrane. Most soluble 
proteins or subunits of soluble proteins that are the 
products of duplications of the genes encoding their 
ancestors have 2-fold rotational axes of pseudosymmetry 
relating the duplicated halves (Figure 9-18). It has usu- 
ally been assumed that these are the vestiges of 2-fold 
rotational axes of symmetry that related the two halves 
when they were identical subunits in a homodimer 
before the carboxy terminus of one subunit was joined to 
the amino terminus of the other subunit by the duplication
of the ancestral gene. If this were the case in aquaporin,
then the 2-fold rotational axis of symmetry relating
the two identical subunits of the homodimer that was the 
ancestral protein would have had to be parallel to the 
plane of the membrane. 

Because this symmetry is not possible, it follows 
that there was no ancestral homodimer of aquaporin and 
that the gene that was duplicated was one encoding a 
monomeric integral membrane-bound protein. Because 
the amino terminus of that ancestral monomer was on 
the cytoplasmic side of the membrane and its carboxy 
terminus was on the extracytoplasmic side, once the 
fourth membrane-incorporated α helix of the new internally
duplicated protein had been inserted into the
membrane, the machinery responsible for inserting the
ancestral monomer into the membrane came upon a
segment of polypeptide that formed the first membrane-spanning
α helix in the ancestral protein but was now on
the wrong side of the membrane. It was simply incorporated
in the opposite direction along with the other three
α helices of the duplicated half. When the two halves of
the internally duplicated protein cleaved to each other, 
as they were bound to do, the usual 2-fold rotational axis 
relating the two halves of a dimer was created. Because 
the two halves of the protein were inserted in the mem- 
brane in opposite directions, that 2-fold rotational axis of 
pseudosymmetry had to be parallel to the plane of the 
membrane. 

There are a number of other integral membrane- 
bound proteins the subunits or monomers of which have 
2-fold rotational axes of pseudosymmetry parallel to the 
plane of the membrane relating the two halves of what 
are assumed to be the products of internal duplications.
Because the membrane-spanning portions of
these proteins are all bundles of membrane-spanning
α helices, a situation that makes any superposition more
likely, and because most of these proteins have evolved
to the extent that obvious alignments of amino acid
sequence can no longer be made, the conclusion that the
two halves result from an internal duplication usually
relies almost entirely on the order in which the superposed
α helices occur in the sequence of the protein. The
order in which the α helices paired by the superposition
around the rotational axis of pseudosymmetry occur in
the second half of the amino acid sequence is the same
as the order in which they occur in the first half of the
amino acid sequence.


Figure 14-23: Skeletal drawing of the
polypeptide backbone of the crystallographic
molecular model (Table 14-6) of one
of the subunits of isoform 1 of aquaporin
purified from bovine erythrocytes. The
view is down the 2-fold rotational axis of
pseudosymmetry relating the two halves of
the internal duplication in this protein. This
drawing was produced with MolScript.

In each of these other proteins, as is the case with 
aquaporin, the junction between the carboxy terminus of 
the first half of the protein and the amino terminus of the 
second half of the protein is located on the opposite side 
of the membrane from the amino terminus of the first 
half. In all of these other cases, this topography results 
from the fact that the two halves have an odd number of 
membrane-spanning α helices. Consequently, following
the gene duplication, the first segment of polypeptide
encoding a membrane-spanning α helix at the amino
terminus of the second half of each of these proteins 
found itself on the wrong side of the membrane during 
the insertion of the complete protein into the membrane 
and was simply incorporated in the opposite direction 
from the direction in which its twin at the amino termi- 
nus of the first half of the protein had been incorporated. 
The existence of these proteins reiterates the fact that 
larger proteins are created by the fusions and duplica- 
tions of genes encoding their smaller ancestors. 
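
The topological bookkeeping behind this argument can be written out explicitly. The sketch below is an illustration under stated assumptions: each helix is labeled either "spanning" (it crosses the bilayer) or "half" (it enters and doubles back, as described above for the third and seventh helices of aquaporin), and the pattern assigned to one half of aquaporin is a reading of the text rather than data. All names are hypothetical.

```python
SIDES = ("cytoplasmic", "extracytoplasmic")

def side_after(start_side, helices):
    """Return the side of the membrane reached after threading the given
    helices, starting from start_side. A 'spanning' helix crosses the
    bilayer; a 'half' helix doubles back and produces no net crossing."""
    if start_side not in SIDES:
        raise ValueError("unknown side: " + start_side)
    side = start_side
    for kind in helices:
        if kind == "spanning":
            side = SIDES[1] if side == SIDES[0] else SIDES[0]
        elif kind != "half":
            raise ValueError("unknown kind of helix: " + kind)
    return side

# One half of aquaporin as read from the text: three complete crossings
# and one helix that passes only halfway across the membrane.
half_of_aquaporin = ["spanning", "spanning", "half", "spanning"]

junction = side_after("cytoplasmic", half_of_aquaporin)
carboxy_terminus = side_after(junction, half_of_aquaporin)
print(junction, carboxy_terminus)
```

An odd number of complete crossings in the first half places the junction on the side opposite the amino terminus, so the second half must be threaded in the opposite direction; an even number would leave the second half in the same orientation as the first.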

The interfaces within the membrane among the 
subunits of an oligomeric membrane-bound protein 
should be stabilized from dissociation by different non- 
covalent forces from those stabilizing interfaces within 
soluble proteins. Because the faces forming an interface
between subunits become surrounded by hydrocarbon
upon dissociation rather than by water, the
hydrophobic effect should be irrelevant. On the other
hand, hydrogen bonding, which provides little favorable
free energy to the formation of an interface within a water-soluble
protein, should exert its full effect on stabilizing
the interface of an oligomeric membrane-bound protein
within the bilayer because when an interface dissociates,
the hydrogen-bond donors and acceptors within it lose
their acceptors and donors, respectively. These considerations
were validated in a study of the effect of site-directed
mutation on the interface between the two identical
membrane-spanning α helices that forms
the α₂ dimer of glycophorin. When the threonine
within the interface was mutated to an alanine, the
increase in the dissociation constant of the dimer was much greater
than when that threonine was mutated to a serine.
Changes in hydrophobicity produced by other mutations,
however, did not correlate with the respective changes in
dissociation constant, but changes in the detailed stereochemical
fit of one face to the other did correlate with
changes in dissociation constant. These results suggest
that both hydrogen bonding and stereochemical fit, but
not hydrophobicity, determine the strength of an interface
within the hydrocarbon of a membrane.
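
A change in dissociation constant produced by a mutation can be converted to a change in standard free energy of dissociation with the usual thermodynamic relation, ΔΔG° = RT ln (Kd,mutant / Kd,wild type). The sketch below applies that relation; the temperature of 298.15 K and the 10-fold change in dissociation constant are assumptions chosen for illustration and are not values from the study cited.

```python
import math

GAS_CONSTANT = 8.314462618  # J mol^-1 K^-1

def delta_delta_g_kj(kd_mutant, kd_wild_type, temperature_k=298.15):
    """Change in standard free energy of dissociation (kJ mol^-1) that
    corresponds to a change in dissociation constant. A positive value
    means that the mutant interface is weaker than the wild type."""
    if kd_mutant <= 0 or kd_wild_type <= 0:
        raise ValueError("dissociation constants must be positive")
    return GAS_CONSTANT * temperature_k * math.log(kd_mutant / kd_wild_type) / 1000.0

# Hypothetical example: a mutation that increases Kd 10-fold.
print(round(delta_delta_g_kj(10.0, 1.0), 2))
```

Only the ratio of the two dissociation constants enters the calculation, so the units in which they are expressed cancel.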

For those integral membrane-bound proteins that 
have not yet been crystallized in three dimensions, 
molecular models of lower resolution are often obtained 
from image reconstruction of two-dimensional crys- 
talline arrays of these proteins still within the membrane. 
The protein most suited for such an approach so far has
been bacteriorhodopsin from H. salinarium because it is
already in a two-dimensional crystalline array within the
plasma membrane of the bacterium. The space group
of this two-dimensional array is P3, and the asymmetric
units related by the 3-fold rotational axes of symmetry
within the unit cell are individual subunits of bacteriorhodopsin
(Figure 14-14). Although bacteriorhodopsin is
one of the few membrane-bound proteins that is naturally
crystalline, most integral membrane-bound proteins
can be induced to crystallize in two dimensions.
Such crystals are then embedded in a glass of amorphous
ice and are examined in a cryo-electron microscope,
an electron microscope in which the embedded
specimen is maintained at a temperature of 4 K while
images are prepared.

A two-dimensional crystalline array of an integral 
membrane-bound protein within a bilayer of lipids 
(Figure 14-24) is a three-dimensional distribution of
electron scattering density, σ(x,y,z), that is periodic in the
two dimensions of the plane of the membrane. The electrons
that it scatters in an electron microscope will form
an electron diffraction pattern (Figure 14-25). This diffraction
pattern does not arise from reflections generated
by sets of parallel planes running through a three-dimen- 
sional lattice (Figure 4-8) but from reflections generated 
by sets of parallel lines running through the two-dimen- 
sional lattice (Figure 4-4) of the projection of the three- 
dimensional array on a plane normal to the beam of 
electrons. Each reflection has an amplitude, an index, 
and a phase, but as always, the phases cannot be meas- 
ured directly. 

The indexed set of amplitudes and phases of the 
electron diffraction pattern from such an array are the 
amplitudes and phases of the Fourier transform of the 
projection of the three-dimensional distribution of elec- 
tron scattering density of the array upon the plane 
normal to the axis of the beam (Equation 9-5). Therefore, 
they are the amplitudes and phases of a central section 
through the three-dimensional Fourier transform of the 
three-dimensional distribution of electron scattering 
density (Equation 9-4). 
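
The relation between a projection and a central section can be verified numerically on a small example. The sketch below is an illustration, not the procedure used in practice: it builds an arbitrary density on a 4 × 4 × 4 grid, projects it along z, and compares the two-dimensional discrete Fourier transform of the projection with the l = 0 plane of the three-dimensional discrete Fourier transform. The grid, the density, and all names are hypothetical.

```python
import cmath

N = 4  # edge of a small cubic grid, chosen only to keep the example fast

def transform_3d(density, h, k, l):
    """One term (h, k, l) of the three-dimensional discrete Fourier transform."""
    total = 0j
    for x in range(N):
        for y in range(N):
            for z in range(N):
                angle = -2j * cmath.pi * (h * x + k * y + l * z) / N
                total += density[x][y][z] * cmath.exp(angle)
    return total

def transform_2d(image, h, k):
    """One term (h, k) of the two-dimensional discrete Fourier transform."""
    total = 0j
    for x in range(N):
        for y in range(N):
            total += image[x][y] * cmath.exp(-2j * cmath.pi * (h * x + k * y) / N)
    return total

# Arbitrary test density; the values carry no physical meaning.
density = [[[float((7 * x + 3 * y + 5 * z) % 4) for z in range(N)]
            for y in range(N)] for x in range(N)]

# Projection of the density along z, the direction of the beam.
projection = [[sum(density[x][y]) for y in range(N)] for x in range(N)]

# Compare the transform of the projection with the central section l = 0.
largest_difference = max(
    abs(transform_2d(projection, h, k) - transform_3d(density, h, k, 0))
    for h in range(N) for k in range(N))
print(largest_difference)
```

The two agree to within rounding error, because summing the density along z before transformation is the same operation as setting l = 0 in the three-dimensional transform.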

An electron micrograph of a two-dimensional crys- 
talline array of a membrane-bound protein (Figure 
14-24) is a projection of the three-dimensional electron 
scattering density σ(x,y,z) of the array upon a plane
normal to the axis of the beam of electrons (Equation 
9-6). From the digitized distribution of contrast on the 
electron micrograph, the amplitudes and phases of the 
central section of the Fourier transform of the three- 
dimensional electron scattering density can be calcu- 
lated by a computer (Equation 9-5). Because the array is 
of a two-dimensional crystal, the central section through 
its Fourier transform is a lattice of spots. Each spot in the 
transform calculated by the computer from the digitized 
micrograph corresponds to one of the reflections in the 
electron diffraction pattern (Figure 14-25) and has the 
same phase and the same relative amplitude as it does. 


Figure 14-24: Electron micrograph of a two-dimensional crystalline
array of bovine cytochrome-c oxidase. Mitochondria from
bovine heart were sonicated, and the resulting fragments of mem- 
brane were separated by differential centrifugation. Fragments rich 
in cytochrome-c oxidase were extracted with Triton X-114 and then 
Triton X-100 to dissolve away other proteins. The purified particu- 
late material was composed of fragments of membrane in which 
the only protein was cytochrome-c oxidase. These fragments of 
membrane were attached to carbon films and embedded in a glass 
of the negative stain uranyl acetate. Upon examination in the elec- 
tron microscope, it was found that in many of the fragments the 
cytochrome-c oxidase had crystallized into two-dimensional 
arrays. The upper electron micrograph is of one of these arrays 
viewed normal to the electron beam. The lower electron micro- 
graph is the same array tilted on a horizontal axis so that its plane 
was at an angle of 36° to the beam of electrons. Reprinted with per- 
mission from ref 439. Copyright 1977 Academic Press. 


This permits the phases to be estimated from the micro- 
graph and the amplitudes to be measured from the elec- 
tron diffraction pattern. 
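
The way in which amplitudes from one source and phases from another are combined can be illustrated in one dimension. In this minimal sketch, a toy density stands in for the specimen, the square roots of the intensities of its transform stand in for the amplitudes measured from a diffraction pattern, and the phases of its transform stand in for those estimated from a micrograph. The density and all names are hypothetical.

```python
import cmath
import math

def structure_factors(density):
    """Discrete Fourier transform of a one-dimensional periodic density."""
    n = len(density)
    return [sum(density[x] * cmath.exp(-2j * math.pi * h * x / n) for x in range(n))
            for h in range(n)]

def synthesize(amplitudes, phases):
    """Inverse transform of terms rebuilt from separate amplitudes and phases."""
    n = len(amplitudes)
    factors = [a * cmath.exp(1j * p) for a, p in zip(amplitudes, phases)]
    return [sum(factors[h] * cmath.exp(2j * math.pi * h * x / n) for h in range(n)).real / n
            for x in range(n)]

density = [0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 2.0, 0.0]   # toy density
factors = structure_factors(density)

intensities = [abs(f) ** 2 for f in factors]        # what a diffraction pattern records
amplitudes = [math.sqrt(i) for i in intensities]    # amplitudes recovered from intensities
phases = [cmath.phase(f) for f in factors]          # what the transform of an image supplies

recovered = synthesize(amplitudes, phases)
without_phases = synthesize(amplitudes, [0.0] * len(amplitudes))
print([round(value, 6) for value in recovered])
print([round(value, 6) for value in without_phases])
```

With the correct phases the density is recovered; with the same amplitudes and every phase set to zero it is not, which is why the micrographs are indispensable even when the amplitudes from diffraction are more accurate.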

The Fourier transform of the periodic three-dimen- 
sional distribution of electron density that is a three- 
dimensional crystal of a protein is a three-dimensional 
lattice of peaks in reciprocal space. Each peak has an 
amplitude and a phase. Each reflection in the diffraction 
of X-radiation from the crystal represents one of these 
peaks. The Fourier transform of the three-dimensional 
distribution of scattering density in a crystalline array of 
a membrane-bound protein, which is periodic in only 
two dimensions, is a lattice of parallel lines in reciprocal 
space. Each of these parallel lattice lines has an ampli- 
tude and a phase that vary periodically along its length. If 
the variations of the amplitudes and phases along each of 
these lattice lines could be measured, the three-dimen- 
sional distribution of electron scattering density in the 
unit cell of the crystalline array could be calculated by 
Fourier transformation of this set of functions.

Figure 14-25: Electron diffraction pattern produced by a two-dimensional
crystalline array of bacteriorhodopsin from H. salinarium.
Fragments of membrane containing crystalline arrays of
bacteriorhodopsin were attached to a film of carbon and embed- 
ded in a glass of glucose. The specimen was centered in the beam 
of electrons of an electron microscope and a photographic plate 
was used to record the reflections of the diffraction pattern. The 
reflections emerge from the specimen at characteristic angles 
determined by the lattice, and they are recorded on a piece of film 
at a known distance from the specimen. The dark central peak is 
the majority of the electrons that passed through the specimen 
undeflected. The sharp dots of varying intensity are the reflections 
themselves. Reprinted with permission from ref 412. Copyright 
1975 Academic Press. 


If a crystalline array is tilted in the electron beam 
(Figure 14-24, lower panel), a projection along an axis 
tilted relative to the axis normal to the plane of the array 
is recorded on the micrograph, and the amplitudes and 
phases of the electron diffraction pattern, which are now 
the amplitudes and phases of the Fourier transform of 
this new projection, have changed. Each of the micro- 
graphs and electron diffraction patterns in a series in 
which the specimen is systematically tilted represents a 
different central section through the lattice of lines in the 
Fourier transform of the three-dimensional array of scattering
density that is periodic in two dimensions. If
enough of these central sections are gathered, the amplitudes
and phases of the Fourier transform within certain
ranges along the lattice lines can be gathered (Figure
14-26). The amplitudes can be gathered from either
electron diffraction patterns or Fourier transforms of the 
digitized contrast on electron micrographs, but the 
phases can be obtained only from Fourier transforms of 
the digitized distributions of contrast on the electron 
micrographs.

Figure 14-26: Variation of intensity and phase along two of the lattice lines [(3,0) and (4,2)] in the Fourier transform of the three-dimensional
distribution of electron scattering density within the two-dimensional crystal of bacteriorhodopsin from H. salinarium. The intensities
are the intensities of the reflections on electron diffraction patterns of the array (Figure 14-25) tilted at various angles. The phases were
determined from Fourier transforms of the distribution of contrast on electron micrographs of the specimens tilted at the same angles. The
lattice lines that occur in the Fourier transform of the electron micrograph are the same as the lattice lines upon which the reflections of the
electron diffraction lie because the electron diffraction pattern is the same Fourier transform of the same array. Each data point on each graph
represents a measurement from a different electron diffraction pattern or a different electron micrograph, respectively, from a specimen at
a different angle of tilt. As the angle of tilt is varied, different positions along the lattice lines are sampled. The phase (degrees) or intensity
(in arbitrary units) is presented as a function of the distance along the lattice line (nanometer). Reprinted with permission from Nature,

In such reconstructions, the details of the
scattering density fade away at the two ends of the molecule
above and below the bilayer of phospholipid
because the specimen can be tilted only between -60°
and +60°, and the amplitudes and phases on the lattice
lines arising from the unseen portions of the protein are
outside the regions that can be sampled with these tilts.
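
The extent of each lattice line that a series of tilts can reach follows from simple geometry. The sketch below assumes the standard geometry of a tilt series, which is not stated explicitly in the text: a central section tilted by an angle θ crosses a lattice line at a height z* = d* tan θ, where d* is the distance in reciprocal space of that lattice line from the axis of tilt. The function names and the value of d* are hypothetical.

```python
import math

def z_star(d_perp_per_nm, tilt_deg):
    """Height (nm^-1) at which a central section tilted by tilt_deg crosses
    a lattice line lying d_perp_per_nm from the axis of tilt."""
    return d_perp_per_nm * math.tan(math.radians(tilt_deg))

def sampled_range(d_perp_per_nm, max_tilt_deg=60.0):
    """Range of heights along a lattice line reachable with tilts between
    -max_tilt_deg and +max_tilt_deg."""
    top = z_star(d_perp_per_nm, max_tilt_deg)
    return (-top, top)

# Hypothetical lattice line 1.0 nm^-1 from the axis of tilt.
low, high = sampled_range(1.0)
print(round(low, 3), round(high, 3))
```

Under this assumption, lattice lines close to the origin are sampled over only a short range of heights, so the finest detail in the direction normal to the membrane is never recorded.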
When the amplitudes of the electron diffraction pat- 
terns of crystalline arrays of bacteriorhodopsin embed- 
ded in a glass of glucose and tilted at various angles were 
combined with the phases from the digitized distribu- 
tions of contrast in electron micrographs taken at the 
same angles of tilt, a map of the three-dimensional elec- 
tron scattering density within the asymmetric unit in the 
unit cell of the two-dimensional crystal of the protein 
could be calculated. At low resolution, when only
reflections from lattice lines with Bragg spacing out to
0.7 nm were included in the calculation, seven rods of
scattering density aligned roughly perpendicular to the
plane of the array within the membrane were
observed. The seven rods are the seven α helices that
span the membrane in bacteriorhodopsin (Figure 14-14). 


When reflections from lattice lines with Bragg spac- 
ing out to 0.35 nm were included in the calculation of the 
map of electron scattering density, the same seven
rods of electron scattering density were observed, but the 
rods became more detailed at their ends so that some of 
the connections between the rods could be observed 
and, more importantly, protrusions of electron scatter- 
ing density appeared along the rods. These protrusions 
represent the side chains of the amino acids in the 
sequence of the protein. From the few connections 
observed and the pattern of the largest protrusions along 
each rod (representing the aromatic amino acids), a 
molecular model of the polypeptide, built with the 
known amino acid sequence, could be unambiguously 
placed into the map of electron scattering density. The 
map of electron density could be further improved by 
eliminating the contribution of diffuse electron scatter- 
ing to the amplitudes of the electron diffraction, and the 
crystallographic molecular model was submitted to
refinement against the observed amplitudes of the electron
diffraction. The final, refined molecular model
duplicated the X-ray crystallographic molecular model of
the protein that was subsequently reported in the
arrangement of the α helices and their relative positions
and orientations but lacked many of the atomic details of 
the latter molecular model. In particular, the details of 
the structure outside the membrane were ill-defined in 
the molecular model from electron diffraction, because 
the range of tilt that could be performed did not permit 
reflections from these regions to be gathered to suffi- 
ciently small Bragg spacing. 

It has also been possible to obtain a map of scatter- 
ing density of acetylcholine receptor from Torpedo mar- 
morata from electron scattering and image 
reconstruction (Figure 14-22). A map of scattering den- 
sity calculated from data sets with small Bragg spacing 
has sufficient detail within the bilayer to observe the protrusions
of the side chains from the four membrane-spanning
α helices of each of its five homologous subunits.
From the pattern of these protrusions and the necessary
structural homology of the five subunits, it was possible
to insert the amino acid sequences of the various membrane-spanning
segments into the map of electron scattering
density. The structure outside the membrane
was, again, ill-defined, but serendipitously, there already
was a crystallographic molecular model of a water-solu- 
ble protein homologous to the globular extracytoplasmic 
portions of acetylcholine receptor. This crystallographic 
molecular model could be positioned in the map of 
scattering density for the portions of acetylcholine recep- 
tor on the extracytoplasmic surface of the bilayer of phos- 
pholipids, and the amino acid sequences of acetylcholine 
receptor could be substituted into the folded polypeptide 
that had been positioned. The molecular model that 
resulted could be submitted to refinement against the 
amplitudes of the electron diffraction to obtain a crystal- 
lographic molecular model of acetylcholine receptor both 
within the membrane and on its extracytoplasmic sur-
face.

Maps of electron-scattering density have been cal- 
culated from electron diffraction and Fourier transfor- 
mation of digitized images of tilted specimens in 
cryo-electron microscopes for light-harvesting chloro-
phyll a/b-protein complex (Bragg spacing > 0.34 nm),
isoform 1 of human aquaporin (Bragg spacing =
0.38 nm), sodium/proton antiporter NhaA (Bragg
spacing > 0.7 nm), gap-junction channel (Bragg spac-
ing > 0.75 nm), Na⁺/K⁺-exchanging ATPase (Bragg
spacing > 0.95 nm), and Ca²⁺-transporting ATPase
(Bragg spacing > 1.4 nm). In the first two instances, the
maps of electron scattering density were detailed enough
to identify the membrane-spanning α helices and posi-
tion them in their proper orientations and relative posi-
tions; but in the case of isoform 1 of aquaporin, an X-ray
crystallographic molecular model (Bragg spacing = 
0.22 nm) was reported shortly afterward.

Nuclear magnetic resonance has also been used to
obtain structural information about either peptides
forming single membrane-spanning α helices or small
integral membrane-bound proteins with one or two
membrane-spanning α helices. If the peptide or the
small integral membrane-bound protein can be inserted
into small, uniform micelles of detergent to form a
monodisperse solution or if it can be monodispersely
dissolved in a mixture of miscible organic solvents in
water in such a way that its normal structure is
retained, the usual two- and three-dimensional nuclear
magnetic resonance spectra can be gathered from these
solutions, and the chemical shifts of the nuclei of ¹hydro-
gens, carbons, and nitrogens in the peptide or protein
can be assigned, just as though it were a peptide or pro-
tein that normally dissolves unassisted in water. The
nuclear Overhauser effects among the nuclei of the
¹hydrogens can then be used to define the structures of
these peptides or proteins. 

It is also possible to obtain two- and three-dimen- 
sional nuclear magnetic resonance spectra from 
peptides or small proteins that span a membrane in a
single α helix when they are inserted into oriented multi-
bilayers of phospholipid such as those described in
Figure 14-4. The chemical shifts of the nuclei of ¹hydro-
gens and ¹⁵nitrogens can be assigned, and information
about the tilt of the α helices in the bilayers of phospho-
lipid can be obtained from the fluctuations of the cou-
pling constants between nuclei of the respective
α-¹hydrogens and adjacent amido ¹⁵nitrogens.

Electron spin resonance from probes attached to 
membrane-bound proteins has been used to obtain 
information about membrane-spanning α helices in
larger integral membrane-bound proteins formed from
bundles of such α helices. All of the naturally occurring
cysteines in an integral membrane-bound protein are
mutated to other amino acids. A segment of the amino
acid sequence that is thought to be a membrane-span-
ning α helix in the native structure of the resulting cys-
teineless protein is identified from its hydropathy. Each
of the consecutive amino acids in that segment is 
mutated in turn to a cysteine to produce a set of single 
point mutants. Each mutant is covalently modified at the 
respective cysteine with (1-oxyl-2,2,5,5-tetramethyl- 
pyrroline-3-methyl)methanethiosulfonate (see 12-11). 
The frequency with which molecular oxygen is able to 
collide with the nitroxyl radical in each of the covalently 
labeled native proteins is then estimated from the effect 
that changes in the concentration of molecular oxygen in 
the sample have on the rates of relaxation of the respec- 
tive unpaired electron. Molecular oxygen, because it is 
paramagnetic, catalyzes the relaxation of the unpaired 
electron when it encounters the nitroxyl radical. 

When the consecutively modified positions in the
amino acid sequence of the protein are in a membrane-
spanning α helix on the outer surface of the native struc-
ture of the integral membrane-bound protein, the
accessibility of the respective nitroxyl radicals to oxygen
varies periodically along that α helix as it directs them
periodically into the hydrocarbon of the phospholipid in
which the molecular oxygen is dissolved or into the
center of the protein in which it is not (Figure
14-27A).

When the modified positions in the amino acid 
sequence of the modified protein are in a loop between 
two membrane-spanning α helices, the rates of relax-
ation of the respective unpaired electrons are accelerated
by chelated paramagnetic ions such as chromium
oxalate (Figure 14-27B) or nickel N,N-dicar-
boxymethyl-1,2-diaminoethane dissolved in the aque-
ous phase to which the loop is exposed. The increase
in relaxation by oxygen is less pronounced for nitroxyl 
radicals in these locations because oxygen is less soluble 
in water than in hydrocarbon. 

A segment of amino acid sequence in the protein is 
identified as a membrane-spanning α helix in the native
integral membrane-bound protein if the excited state of
nitroxyl radicals attached to its amino acids is not relaxed
by the chelated paramagnetic ions but does display a
pattern of exposure to molecular oxygen that fluctuates
with a period of 3.6 aa (Figure 14-27). In the case of bac-
teriorhodopsin from H. salinarium, the prediction that
the polypeptide between Threonine 128 and Tyrosine
131 would be a loop between two α helices while the
polypeptide between Serine 132 and Threonine 142
would be within an α helix spanning the membrane was
validated when a crystallographic molecular model of
the protein (Figure 14-14) became available. That the
periodicity observed in the catalysis of the relaxation of 
the unpaired electron coincided with periodicity of the 
exposure of the respective side chain to the hydrocarbon 
of the bilayer of phospholipid was also validated by the 
model. 
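
The criterion of a 3.6 aa period can be illustrated numerically. What follows is only a minimal sketch, not the analysis used in the studies cited: it assumes a hypothetical list of accessibilities (for example, values of the change in saturation power in the presence of oxygen) measured at consecutive positions, and it scans trial periods to find the one at which the Fourier power of the list is greatest. The function names and the synthetic values are assumptions made for illustration.

```python
import math

def fourier_power(values, period):
    """Power of the sinusoidal component with the given period (in residues)."""
    mean = sum(values) / len(values)
    omega = 2.0 * math.pi / period
    re = sum((v - mean) * math.cos(omega * i) for i, v in enumerate(values))
    im = sum((v - mean) * math.sin(omega * i) for i, v in enumerate(values))
    return re * re + im * im

def dominant_period(values, lo=2.0, hi=8.0, step=0.05):
    """Scan trial periods from lo to hi and return the one with the greatest power."""
    n = int(round((hi - lo) / step))
    periods = [lo + k * step for k in range(n + 1)]
    return max(periods, key=lambda p: fourier_power(values, p))

# Synthetic accessibilities for 18 consecutive positions on an ideal alpha helix
# (3.6 aa per turn). These are made-up values, not data from Figure 14-27.
synthetic = [1.0 + math.cos(2.0 * math.pi * i / 3.6) for i in range(18)]
best = dominant_period(synthetic)
print(round(best, 2))
```

With real measurements, which are noisy and cover only a few turns, the maximum is broader, and a period near 3.6 aa is suggestive rather than conclusive.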

Similar periodicities of exposure of covalently 
attached nitroxyl radicals have been observed for sets of 
consecutive mutants to cysteine in segments of amino 
acid sequence from lactose permease of E. coli, diph-
theria toxin from corynephage β, and citrate transport
protein from S. cerevisiae. It is assumed that these seg-
ments from these integral membrane-bound proteins
are membrane-spanning α helices even though their
crystallographic molecular models are not yet available. 

Fluctuations, also with a period of 3.6 aa, in levels of 
expression in oocytes of X. laevis and apparent dissocia- 
tion constant for acetylcholine upon consecutive muta- 
tion to tryptophan of amino acids in a segment of the 
amino acid sequence of acetylcholine receptor from 
T. californica were presented as evidence that this seg-
ment spanned the membrane as an α helix. Likewise,


Figure 14-27: Periodic variation in exposure of side chains to the
hydrocarbon of the bilayer of phospholipids along an α helix that
spans the membrane in bacteriorhodopsin from H. salinarium.
The gene for the protein was expressed in E. coli. All of the cys- 
teines in the wild-type protein were replaced by site-directed 
mutation. A set of mutants of this cysteineless protein was then 
constructed and expressed. Each member of this set had one of the 
amino acids in the sequence from Glycine 125 to Threonine 142 
replaced, respectively, by a cysteine, and the complete set con- 
tained each of the consecutive single point mutants. Each mutant 
protein was overexpressed in E. coli, and the product accumulated 
in the bacteria as a denatured precipitate. The respective polypep-
tides were purified and modified with (1-oxyl-2,2,5,5-tetra-
methylpyrroline-3-methyl)methanethiosulfonate at their single
cysteines while in their unfolded states. The covalently modified,
unfolded polypeptides were reconstituted to their native state by
treatment with detergent, phospholipid, and retinal, and the
refolded states of each protein were shown to be fully functional by 
kinetic and spectroscopic analysis, and by their ability to transport 
protons upon absorption of light. The spin-lattice relaxation time 
of the unpaired electron in each of the nitroxyl radicals was moni-
tored indirectly by determining P₁/₂, the continuous wave power at
which the amplitude of the signal from the central absorption in
the spectrum (Figure 12-35) was 50% of its absorption in the
absence of saturation. The change in P₁/₂ (ΔP₁/₂) for the respective
nitroxyl radical at the noted position in the sequence in the pres-
ence, relative to the absence, of (A) molecular oxygen (O₂) and
(B) chromium oxalate is presented. Measurements were made of 
native protein in the original reconstituted vesicles (@) or the vesi- 
cles dissolved in aqueous solutions 1% (A) or 10% (©) in octyl glu- 
coside. Compare the patterns of exposure to molecular oxygen (A) 
and chelated paramagnetic ion (B) to the crystallographic molec- 
ular model of the protein (Figure 14-14). Reprinted with permis- 
sion from ref 457. Copyright 1990 American Association for the 
Advancement of Science. 


consecutive site-directed mutation of a membrane-
spanning α helix in lactose permease from E. coli has
also suggested that the surface of the α helix facing the
lipid is more tolerant to changes in the size of its side
chains than the surface facing the interior of the protein,
as long as the side chains remain hydrophobic.
Consequently, when the frequency at which amino acids 
are substituted is examined as a function of their position 
in a set of aligned amino acid sequences of the same or a 
set of related integral membrane-bound proteins from 
different species of organisms, that frequency has also 
been observed to vary along segments that are mem-
brane-spanning α helices with a period of about
3.6 aa, just as the exposure of a spin label to molecular
oxygen varies (Figure 14-27). Those positions facing the 
interior of the protein are positions in which substitu- 
tions over time are less tolerated. 

It is also possible to use spin-labeled phospho- 
lipids to determine whether or not a portion of mem- 
brane-bound protein is inserted into the bilayer of 
phospholipid and how deeply it sits within it. 
Phospholipids have been synthesized with a 3-oxa- 
1-oxyl-2,2,5,5-tetramethylpyrrolinyl group covalently 
attached to their head group and to carbon 5, 10, 12, and 
14, respectively, of one of their fatty acyl chains (see 
14-16). It has been well established that the fatty acyl 
groups bearing these nitroxyl radicals remain fully 
extended so each sits at a characteristic depth in the 
membrane, and the nitroxyl radical on the head group 
sits at the surface of the membrane. A 3-oxa-1-oxyl- 
2,2,5,5-tetramethylpyrrolinyl group will quench a nearby 
fluorescent chromophore attached to a protein. By com- 
paring the degree of quenching of the fluorophore as a 
function of the mole fraction of the nitroxylphospholipid 
in the membrane for nitroxyl radicals covalently 
attached at different positions on the labeled phospho- 
lipid, the depth at which the fluorophore sits in the 
bilayer of phospholipid can be estimated. For exam-
ple, it was estimated that a 3-(2’,2’-dimethylnaphth-
7’-yl)-3-oxopropyl group covalently attached as a fluo-
rophore to the sulfur of a cysteine placed by site-directed
mutation at position 81 in the amino acid sequence of
cysteineless cholesterol oxidase from Brevibacterium
sterolicum sits 0.8 ± 0.3 nm from the center of the bilayer
when the native protein is inserted in the membrane.

Although these spectroscopic methods have been 
applied to several integral membrane-bound proteins, 
the majority of the assignments of membrane-spanning 
α helices in proteins for which crystallographic molecu-
lar models are unavailable have been made from genetic
and chemical observations. 

The general problem of identifying within the 
amino acid sequence of an integral membrane-bound 
protein, for which a crystallographic molecular model is 
unavailable, those segments of greater than 20 aa in 
length that span the bilayer of phospholipid as α helices,
rather than simply spanning the interior of the globular
protein on either side of the bilayer of phospholipid, is
supposed to have both a computational solution and an
experimental solution. The computational solu-
tion is based on one or the other of the scales of
numerical values for the hydropathies of the amino 
acids. The values from one of these scales or differences 
between the values from two of the scales are averaged
over the amino acid sequence of a given segment. If the 
mean numerical value of the average for the 21 aa in a 
given segment is greater than a certain magnitude, where 
hydrophobic amino acids have positive values and 
hydrophilic amino acids have negative values on the 
scale chosen, then there is a high probability that that 
segment spans the membrane as an α helix within the
native structure of the folded polypeptide. For example,
for the scale of Kyte and Doolittle, if the average numer-
ical value of the hydropathy is greater than +1.6, the seg-
ment of 21 aa probably spans the membrane. In the
case of the criterion of White and Wimley, it is the dif-
ference between the hydropathy displayed by an amino
acid in a peptide during its partition into an isotropic 
organic phase (octanol) and the hydropathy displayed 
during partition onto the surface as opposed to the inte- 
rior of a bilayer of phospholipid. 
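
The averaging described above can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than any published implementation: it uses the hydropathy values of Kyte and Doolittle, a window of 21 aa, and the threshold of +1.6 mentioned in the text, and it reports only the positions at which a window begins. The function names and the example sequence are hypothetical.

```python
# Hydropathy values of Kyte and Doolittle
# (hydrophobic amino acids positive, hydrophilic amino acids negative).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def window_means(sequence, window=21):
    """Mean hydropathy of every window of the given length along the sequence."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

def candidate_segments(sequence, window=21, threshold=1.6):
    """0-based starts of windows whose mean hydropathy exceeds the threshold."""
    return [i for i, m in enumerate(window_means(sequence, window))
            if m > threshold]

# Hypothetical sequence: 21 hydrophobic amino acids flanked by polar ones.
example = "KDESRTNQ" + "LLIVAALLGVIALLVFALLIV" + "RKDESTNQ"
hits = candidate_segments(example)
print(hits)
```

Overlapping windows that exceed the threshold would normally be merged into one candidate segment; that step is omitted here for brevity.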

These computational approaches to designating 
the membrane-spanning α helices are based on the
assumptions that the change in free energy for the inser-
tion of an α helix into a 3.6 nm layer of hydrocarbon from
an aqueous phase is directly related to the partition of
model solutes for its amino acid side chains and peptide
bonds between water and an isotropic phase of hydro-
carbon; that scales of hydropathy regardless of their
origin ultimately reflect the free energy of this partition; 
that the hydrocarbon of the bilayer of phospholipid is 
more nonpolar than the interior of any protein; and that 
a longer stretch of polypeptide, 24 amino acids, is 
required to span a bilayer than is required to span the 
interior of a molecule of protein.
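
The figure of 24 amino acids follows from simple arithmetic, assuming the standard rise of about 0.15 nm for each amino acid along the axis of an α helix (a value not stated in this section) and an α helix oriented normal to the plane of the bilayer:

```latex
n \approx \frac{3.6\ \text{nm}}{0.15\ \text{nm aa}^{-1}} = 24\ \text{aa}
```

An α helix tilted away from the normal would need correspondingly more amino acids to cross the same layer of hydrocarbon.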

The 11 amino acid sequences known to span the 
membrane in photosynthetic reaction center from 
R. viridis (Figure 14-17) were designated as the only 
membrane-spanning α helices in its native structure by
one of these computational algorithms before the
crystallographic molecular model became available, and 
every hydrophobic segment of the amino acid sequence 
of this protein ultimately observed to span the mem- 
brane had been so designated. This result might be taken 
as an indication of the reliability of these predictions. In 
another instance, however, an entire set of assignments 
based on mean hydropathy seems to have failed. At least 
one of the computational algorithms designated five of 
the segments of the amino acid sequence of unspecific 
monooxygenase from mammalian liver as membrane-
spanning α helices in addition to its amino-terminal
anchor, which does span the membrane. When, how-
ever, the amino-terminal anchor is removed from the
protein and several other of its hydrophobic amino acids
are mutated to hydrophilic amino acids, the membrane-
bound protein becomes a water-soluble protein that has
been crystallized. From an examination of the
resulting crystallographic molecular model, it is clear
that, other than its amino-terminal anchor, mammalian
unspecific monooxygenase contains no membrane-
spanning α helices. In fact, the bacterial form of this
monooxygenase is a normal, water-soluble pro-
tein the crystallographic molecular model of which is
superposable on that of the mammalian protein. 

In addition to such overprediction, these algo-
rithms also miss some membrane-spanning α helices,
particularly in large integral membrane-bound proteins
in which several of the α helices that span the membrane
pass entirely through the center of the protein without
contacting the hydrocarbon of the bilayer of phospho-
lipid. For example, of the 56 membrane-spanning
α helices in the α₂ dimer of the crystallographic molecu-
lar model of bovine cytochrome-c oxidase (Table
14-6), 18 are not hydrophobic enough to have been
designated as membrane-spanning, and 12 of those were
passed over in an assignment of membrane-spanning
α helices made before the crystallographic molecular
model became available. Most of those that seem to be
too hydrophilic pass through the center of the protein 
and have few or no contacts with the hydrocarbon of the 
bilayer of phospholipid. 

Both the experience with unspecific monooxyge- 
nase, where hydrophobic segments that do not seem to 
span the bilayer of phospholipid were designated as 
hydrophobic enough to do so, and the experience with 
cytochrome-c oxidase, where hydrophobic segments 
that span the bilayer of phospholipid were not 
hydrophobic enough to be designated as doing so, sug- 
gest that there is no reliable method for making this 
designation by inspection alone. Some integral mem- 
brane-bound proteins may contain a set of membrane- 
spanning segments the hydrophobicities of which are so 
remarkable that they can be designated as traversing the 
bilayer of phospholipid without too much doubt. Other 
integral membrane-bound proteins, however, also con- 
tain another set of membrane-spanning segments that 
are hydrophobic but not so hydrophobic as to be distin- 
guishable from those segments elsewhere within the 
amino acid sequence that merely span the globular portions
of the protein on either side of the membrane. In many 
instances, it may be these less hydrophobic membrane- 
spanning segments, impossible to identify by inspection, 
that are most intimately involved with the function of the 
integral membrane-bound protein, particularly if it cat- 
alyzes the transport of a hydrophilic solute across the 
bilayer of phospholipid. This makes their identification 
even more desirable. 

The success with which these various algorithms 
predict membrane-spanning sequences has been 
assessed by comparing their assignments to the actual 
membrane-spanning sequences in the available crystal-
lographic molecular models of integral membrane-
bound proteins. The accuracy with which they predict
these direct observations varies from 91% to 99%. The
most widely used algorithm for predicting the mem-
brane-spanning α helices of a membrane-bound protein
of unknown structure, that of Kyte and Doolittle, has an
accuracy of only 93% and overpredicts membrane-span-
ning α helices by 13%. This overprediction arises from the
fact that many α helices that simply span a globular por-
tion of the protein outside the membrane are hydropho-
bic enough to be mistakenly assigned as spanning the
membrane. The most accurate algorithm for predicting
membrane-spanning α helices (99%) and for avoiding
overprediction (<1%) was that of Wimley and White.

The success with which these algorithms can desig-
nate membrane-spanning α helices was assessed with a
set of proteins already known to be anchored mem- 
brane-bound proteins or integral membrane-bound pro- 
teins. It is probably the case that the rate of success is 
much lower when one of them is asked to determine 
whether or not a protein is membrane-bound in the 
absence of any information other than its sequence, a 
purpose for which these algorithms are often used. This 
ambiguity arises because many proteins that are not 
bound to membranes have long sequences buried in 
their interior that are hydrophobic enough to appear to 
be membrane-spanning α helices. Such decisions about
whether or not a protein is membrane-bound should be 
viewed with caution. 

In spite of the success of the algorithm of Wimley 
and White, it is usually assumed that the segments iden-
tified computationally as membrane-spanning α helices
should be considered only as candidates for spanning 
the membrane and that whether or not they do span the 
membrane should be validated experimentally. 

The most direct experimental solution to the prob- 
lem of identifying a membrane-spanning segment relies 
upon covalent modification of the protein from within 
the liquid hydrocarbon of the bilayer of phospholipid. 
Because only poorly nucleophilic amino acids are found 
within membrane-spanning segments, nitrenes or car- 
benes have been universally used as reagents for their 
selective modification. The precursor of a nitrene or car- 
bene is incorporated into a hydrophobic molecule that 
partitions almost exclusively into the hydrocarbon of the 
bilayer of amphipathic lipids surrounding the mem- 
brane-spanning segments of the protein. The nitrene or 
carbene is generated from the precursor by photolysis, 
and it inserts into the membrane-spanning segments of 
the polypeptide, albeit in low yield. The intact polypep- 
tides that are susceptible to the modification can be 
identified by electrophoresis in solutions of dodecyl sul-
fate, the regions of the polypeptides that have been
modified can be identified by isolating and identifying
peptides containing them, and the particular
amino acids modified can be identified by submitting the
peptides to sequencing.


The reagents that have been used are precursors of 
carbenes or nitrenes attached to two different types of 
hydrophobic carriers. 1-Tritiospiro[adamantane-
4,3’-diazirine] (14-17), 5-[¹²⁵I]iodonaphthyl azide
(14-18), 3-(trifluoromethyl)-3-(m-[¹²⁵I]iodophenyl)diazi-
rine (14-19), and 1-azidopyrene (14-20) are examples
of hydrophobic solutes that can diffuse freely through 
the liquid hydrocarbon of the bilayer of phospholipid:

Precursors of carbenes and nitrenes have
also been incorporated covalently into phospholipids
that can then be incorporated into the bilayers of phos- 
pholipids surrounding membrane-bound proteins. An 
example would be diazirinylphospholipid 14-21: 


[Structure 14-21]


In these derivatives of phospholipids, the precursor of 
the carbene or the nitrene can be incorporated into the 
fatty acyl chains, as in diazirinylphospholipid 14-21, or it 
can be incorporated into the hydrophilic functional 
group esterified to the phosphate of the phospholipid.
In the former case, amino acids within the hydrocarbon 
are the targets of the modification; and, in the latter case, 
amino acids at the two ends of the membrane-spanning 
segments.

Originally it was thought that, by varying the posi- 
tion along the hydrocarbon of the fatty acid at which the 
carbene was located within a phospholipid, amino acids 
within the membrane-spanning segment located at dif-
ferent depths within the bilayer of phospholipid could be
distinguished. Unfortunately, carbenes show a signifi- 
cant preference for insertion into nitrogen-hydrogen, 
oxygen-hydrogen, and sulfur-hydrogen bonds over 
carbon-hydrogen bonds, and this preference usually 
directs all of the carbenes in the liquid hydrocarbon, 
regardless of their mean depth in the bilayer of phospho- 
lipid, to the same one or two most susceptible amino 
acids in each membrane-spanning segment.
Therefore, there is no obvious advantage to the deriva- 
tives of the phospholipids over the simpler hydrophobic 
precursors other than their elegance. 

Often the incorporation observed with such 
hydrophobic reagents is consistent with the identifica- 
tion of membrane-spanning segments based on their 
mean hydropathy. Both glycophorin, which spans the
membrane once, presumably with its only hydrophobic
segment, and subunit IV of cytochrome-c oxidase,
which also contains only one hydrophobic segment
greater than 20 aa in length, which was later observed to
span the membrane in the crystallographic molecular
model, have been modified by either a carbene or a
nitrene, respectively, incorporated into a phospholipid.
The large majority of the incorporation in each case was
located in a peptide 68 or 49 amino acids in length, 
respectively, that contained the hydrophobic segment of 
greater than 20 aa picked out by the computational algo- 
rithms. When (iodophenyl)diazirine (14-19) was used to 
modify bacteriorhodopsin, incorporation also was found
to occur in a region of the amino acid sequence con-
taining a segment that had been identified computation-
ally as spanning the membrane and that was later shown
to be in the center of a membrane-spanning α helix in
the crystallographic molecular model.

All five of the subunits of acetylcholine receptor 
(Figure 14-22) are homologous in amino acid sequence, 
and each contains four segments of amino acid sequence 
that were judged to be hydrophobic enough to span the 
membrane. Both 1-azidopyrene and 3-(trifluoromethyl)-
3-(m-[¹²⁵I]iodophenyl)diazirine label all of the subunits
in the native protein in its native membrane. A set of 
labeled peptides, identified by their sequences, has been 
isolated from digests of the labeled protein. Among the 
members of this set are contained the first, the third, and 
the fourth of the hydrophobic segments from one or the 
other of the subunits. In one of these peptides, two cys-
teines were identified as the sites of modification, further
evidence for the electrophilicity of nitrenes and car-
benes. In the molecular model derived from electron dif-
fraction and image reconstruction, all 20 of the
hydrophobic segments, four from each subunit of the
protein, are membrane-spanning α helices.

When adamantyldiazirine (14-17) was used to 
modify canine Na⁺/K⁺-exchanging ATPase, however, and
the long tryptic peptides of the intact polypeptide that
were modified by the reagent were isolated and identi-
fied, it was found that substantial amounts of
adamantylidene had been incorporated into three tryp-
tic peptides that did not contain hydrophobic segments
designated computationally as membrane-spanning. 
These three peptides contained segments greater than 
20 aa in length that were hydrophobic but not suffi- 
ciently hydrophobic to be picked out by their mean 
hydropathy. Their sequences are
-VNFPVENLCFVGFISMIGPP-, -QIGMIQALGGFFTYFVILAE-, and
-PTWWFCAFPYSLLIFVYDEV-. The location of these
three segments in the structure of Na⁺/K⁺-exchanging
ATPase can be inferred from the location of the homolo-
gous sequences in the crystallographic molecular model
of Ca²⁺-transporting ATPase from O. cuniculus (Figure
14-15). The first is located in the globular cytoplasmic
domain of the protein distant from the bilayer of phos-
pholipid, but the latter two are in membrane-spanning
α helices, in spite of their low mean hydropathy. Both of
these membrane-spanning α helices are on the outside
surface of the bundle of α helices within the bilayer of
phospholipid (Figure 14-15). Three of the segments of
the amino acid sequence of Na⁺/K⁺-exchanging ATPase
designated as membrane-spanning by their high mean
hydropathy were found within other tryptic peptides
modified by adamantyldiazirine, and the homologues of
these three segments do also span the membrane in the
crystallographic molecular model of Ca²⁺-transporting
ATPase. 
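
The statement that the three labeled segments are hydrophobic, but not hydrophobic enough to be picked out, can be checked with a short calculation. The following is a sketch under stated assumptions: it uses the hydropathy values of Kyte and Doolittle and averages over only the 20 amino acids quoted above for each segment, rather than over a sliding window of 21 aa in the complete sequence, which is not reproduced here. Under these assumptions each mean is positive but falls below the threshold of +1.6.

```python
# Hydropathy values of Kyte and Doolittle.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def mean_hydropathy(segment):
    """Mean hydropathy over all of the amino acids in the segment."""
    return sum(KYTE_DOOLITTLE[aa] for aa in segment) / len(segment)

# The three segments quoted in the text (20 aa each).
segments = [
    "VNFPVENLCFVGFISMIGPP",
    "QIGMIQALGGFFTYFVILAE",
    "PTWWFCAFPYSLLIFVYDEV",
]
means = [mean_hydropathy(s) for s in segments]
for s, m in zip(segments, means):
    # The last column reports whether the mean exceeds the threshold of +1.6.
    print(s, round(m, 3), m > 1.6)
```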

The topography of a membrane-spanning protein 
is a complete designation of those segments of its 
polypeptide that span the membrane and of the sides of 
the membrane, cytoplasmic or extracytoplasmic, on 
which those segments of its polypeptide that are not 
within the hydrocarbon of the bilayer of phospholipid 
are located. The topography of a membrane-spanning 
protein can be defined by identifying in turn the location 
of one or more of the amino acids within each of the seg- 
ments of its polypeptide outside the membrane. While 
assembling these individual topographic assignments, if 
one end of a segment of hydrophobic amino acids in the 
sequence of an integral membrane-bound protein can 
be shown to be located on its cytoplasmic surface and the 
other end on its extracytoplasmic surface, then it can be 
concluded that that segment spans the bilayer of phos- 
pholipid. 

The identification of the side of a membrane, cyto- 
plasmic or extracytoplasmic, upon which a particular 
amino acid or peptide from the polypeptide of an inte- 
gral membrane-bound protein is located can be made 
with oriented, sealed structures and an impermeant 
reagent. The most reliable oriented, sealed structures 
are intact cells, such as erythrocytes, or intact
organelles, such as undamaged mitochondria or lyso-
somes. Erythrocytes are ideal for this purpose because
sealed inside-out vesicles that present only the cytoplas-
mic surfaces of their membrane-bound proteins to the
solution can be prepared from intact erythrocytes.
Intact animal cells grown in tissue culture or spher-
oplasts of bacteria* have also been used in such experi-
ments. Mitochondria as they are usually prepared con-
tain both an outer membrane, which is the porous
cellular membrane isolating these organelles from direct 
contact with the cytoplasm, and an inner membrane, 
which is the tight, impermeable boundary of the func- 
tional mitochondrion. The outer membrane can be 
removed to produce sealed, unwrapped mitochondria
that present the cytoplasmic, extramitochondrial sur-
faces of their membrane-bound proteins to the external
solution. Sealed, inside-out vesicles, which present the
extracytoplasmic, intramitochondrial surfaces of their
membrane-bound proteins to the solution, can be pre-
pared from unwrapped mitochondria.

Sealed vesicles often form spontaneously from 
fragments of the constituent membranes during 
homogenization of a tissue. As these structures are 
adventitious, they are not necessarily sealed to all 
hydrophilic solutes. For example, vesicles of plasma 
membrane can be isolated from the electric organ of 
T. californica that are sealed to large solutes such as 
proteins, but only a minority of them are sealed to
small solutes such as the cations of alkali metals.
Occasionally, however, a suspension of homogeneously 
and tightly sealed vesicles, in which all of the proteins 
are oriented as they were when the membrane contain- 
ing them was in the cell, can be prepared from a 
homogenate.

It is also possible to use purified membrane- 
bound proteins reconstituted into sealed vesicles of 
phospholipid. It is usually the case that during a recon- 
stitution the membrane-bound protein inserts at 
random in either of the two possible orientations, cyto- 
plasmic surface directed outward or extracytoplasmic 
surface directed outward. If only one of these two sur- 
faces is susceptible to endopeptidolytic digestion when 
the protein is in its native structure, as is often the case, 
digestion of the reconstituted vesicles will nick only 
those molecules of protein exposing that surface, and 
intact polypeptides, derived exclusively from molecules 
of protein inserted in the opposite orientation, can be 
purified by electrophoresis or molecular exclusion chro-
matography performed in solutions of dodecyl sul-
fate.

Many impermeant reagents have been used to 
modify integral membrane-bound proteins in such 
sealed, impermeable structures. Because a bilayer of 
phospholipid contains a continuous sheet of hydro- 
carbon 3.6 nm wide, charged solutes or solutes with large 
numbers of donors and acceptors for hydrogen bonds 
cannot pass through it. An impermeant reagent for 
covalent modification is such a hydrophilic solute that 
also contains an electrophilic functional group appropriate
for the modification of proteins. Diazotized
p-[35S]sulfanilic acid (14-22), N-formyl-[35S]sulfinylmethionyl
methylphosphate (14-23), isethionyl [14C]acetimidate (14-24),
2-S-[14C]thiuroniumethanesulfonate (14-25), and pyridoxal
phosphate and sodium borohydride (Figure 10-3) are impermeant
reagents that have been used to modify only the surface of a
protein presented to the external solution in a suspension of
sealed membranes.

* A spheroplast is a bacterial cell that has been stripped of its outer
membrane.

For example, both intact bovine mitochondria and 
sealed inside-out vesicles of bovine mitochondria were 
separately modified with pyridoxal phosphate and 
sodium borohydride, and labeled ADP, ATP carrier, an 
integral membrane-bound protein, was purified from 
each sample. Thermolytic peptides containing the 
labeled lysines were isolated from the protein and iden- 
tified by their amino-terminal sequences. It was found 
that Lysine 146 of the protein was modified by pyridoxal 
phosphate and sodium borohydride in the protein from 
the sealed inside-out vesicles, while Lysines 95, 198, 205, 
259, and 267 were modified by pyridoxal phosphate and 
sodium borohydride in the protein from intact mitochondria.
Lysine 146 was assigned to the extracytoplasmic,
intramitochondrial surface of the protein; and
the other lysines, to its cytoplasmic, extramitochondrial
surface.

The tryptic peptide HLLVMKGAPER, the amino 
acid sequence containing Lysine 501 from ovine Na+/K+-exchanging
ATPase, could be isolated from digests of the
intact protein by immunoadsorption with immuno- 
globulins G directed against its carboxy terminus. Lysine 
501 would not incorporate pyridoxal phosphate when 
the protein was in sealed vesicles that presented only the 
extracytoplasmic surface of the protein to the solution 
but readily incorporated pyridoxal phosphate when the 
vesicles were opened by adding the surfactant
saponin. Saponin, by combining with the cholesterol
in the bilayer of phospholipid, is able to form large
(8.0 nm) holes in natural membranes without significantly
altering the membrane-bound proteins. The
results of these experiments demonstrated that Lysine
501 is located on the cytoplasmic surface of Na+/K+-exchanging
ATPase, a topographical assignment later
verified crystallographically.

One of the drawbacks of labeling integral mem- 
brane-bound proteins with such impermeant elec- 
trophiles is that the shorter segments between two 
candidates for spanning the membrane often lack a suit- 
able nucleophilic side chain. For example, in the MotA 
protein from E. coli, the short segment connecting the 
first and second candidates for spanning the membrane 
contains only a tyrosine, and the short segment connect- 
ing the third and the fourth contains only a glutamate, 
two nucleophiles that can be difficult to label. 
Consequently, cysteines were placed in turn within these 
segments at positions 24, 190, and 196 in the amino acid
sequence of the protein, respectively, in three separate 
site-directed mutations of the cysteineless version of 
the protein. The modification of each of these cysteines 
by the impermeant fluorescent reagent fluorescein 
5-maleimide proceeded at similar rates whether the 
mutant protein was in intact spheroplasts of the bacteria 
or osmotically disrupted spheroplasts. Six other cysteines 
placed in the much larger segments connecting the 
second and the third candidate and following the fourth 
candidate for spanning the membrane reacted slowly 
with the fluorescein 5-maleimide in the intact sphero- 
plasts and rapidly in the disrupted spheroplasts. These 
results demonstrated that positions 24, 190, and 196 are 
located in extracytoplasmic segments and the rest of the 
connecting segments are cytoplasmic, and that all four of 
the candidates for spanning the membrane do so. 
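
The reasoning used in these labeling experiments can be sketched in a few lines of Python. This is only an illustration, not the procedure of the original authors: the function name `assign_surface`, the cutoff on the ratio of rates, the relative rates, and position 120 are all hypothetical. A residue that reacts about as fast in sealed, right-side-out cells as it does after the barrier has been disrupted is placed on the extracytoplasmic surface; a residue that reacts much more slowly in the sealed preparation is placed on the cytoplasmic surface.

```python
def assign_surface(rate_intact, rate_disrupted, ratio_cutoff=0.5):
    """Assign a residue to a surface from two rates of modification.

    rate_intact    -- rate in sealed, right-side-out cells or vesicles
    rate_disrupted -- rate after the permeability barrier is broken
    ratio_cutoff   -- hypothetical threshold, not taken from the text
    """
    if rate_disrupted <= 0:
        # No reaction even when both surfaces are exposed: no conclusion.
        return "unassigned"
    ratio = rate_intact / rate_disrupted
    return "extracytoplasmic" if ratio >= ratio_cutoff else "cytoplasmic"


# Invented relative rates, for illustration only (not measured values).
rates = {24: (1.0, 1.1), 190: (0.9, 1.0), 196: (1.0, 1.0), 120: (0.05, 1.0)}
assignments = {pos: assign_surface(*pair) for pos, pair in rates.items()}
```

In practice the cutoff would have to be justified by controls showing that the sealed preparation remains sealed throughout the reaction.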

Similar experiments have been performed on seg- 
ments of the amino acid sequence of several other inte- 
gral membrane-bound proteins but in these instances in 
order to assess variations from position to position in 
accessibility to the aqueous phase rather than topogra- 
phy. Each amino acid in a particular segment was 
mutated in turn to a cysteine, and the susceptibility of 
that cysteine to modification by a polar electrophile was 
assayed. Variations in accessibility provided information
about the structure of the polypeptide within that
segment in the native state of the protein. 

Enzymes can also be used as impermeant reagents. 
For example, the impermeant enzyme protein-glutamine
γ-glutamyltransferase (Equation 13-45) has been used to
catalyze the modification of exposed glutamines on the
surfaces of membrane-bound proteins with fluorescent
primary amines. The enzyme lactoperoxidase (Equation 10-33)
has also been used as an impermeant reagent. Although this
enzyme probably produces a small, diffusible, activated form
of iodine, perhaps IOH, that species is so reactive that it
never makes it across a bilayer of biological phospholipids.

Endopeptidases or immunoglobulins G are also
impermeant reagents. Each of the endopeptidases
pronase, chymotrypsin, and papain is able to
cleave native, human band 3 anion transport protein
(n_aa = 911), an integral membrane-spanning protein in
human erythrocytes, within the short segment of the
polypeptide, -QDHPLQKTYNYNVLMVPKPWQGPLP-,
between Glutamine 545 and Proline 568. Papain cleaves
after Glutamine 550, and chymotrypsin cleaves after
Tyrosine 553. These cleavages occur quantitatively
when any one of the endopeptidases is added to the 
extracytoplasmic solution in which intact erythrocytes 
are suspended. These results demonstrate that this short 
segment of amino acids is fully exposed on the extracy- 
toplasmic surface of the intact protein. Two different 
monoclonal immunoglobulins G raised against purified 
native acetylcholine receptor from the electric organ of 
T. californica recognize as an antigen the synthetic pep- 
tide KAEEYILKKPRSELMFEEQ, which is an amino acid 
sequence from the interior of one of the five homologous 
polypeptides composing the protein. Presumably, the 
natural epitopes on the intact protein are composed of 
sequences from this region. It could be shown that these 
monoclonal immunoglobulins G were bound only at the 
cytoplasmic surfaces of membranes containing this protein.
From this result, it was concluded that this
sequence in the native structure of acetylcholine receptor
is exposed on the cytoplasmic surface of the protein,
a topographical assignment later validated by the
crystallographic molecular model derived from electron
diffraction and image reconstruction.

Because oligosaccharides are added to all glycopro- 
teins in the extracytoplasmic lumina of the Golgi mem- 
branes, any asparagine, serine, or threonine in an 
integral membrane-bound protein that is glycosylated 
must be located on its extracytoplasmic surface. The 
asymmetry of the biosynthesis of these oligosaccharides 
can also be used to make topographical assignments for 
segments in the amino acid sequence of a protein that 
are not normally glycosylated. If a sequence encoding 
glycosylation, for example, -NST-, is introduced by
site-directed mutation into a segment of amino acid 
sequence between two candidates for spanning the 
membrane found in an integral protein located in the 
plasma membrane, the glycosylation of the new 
asparagine in that sequence demonstrates that the seg- 
ment is located on the extracytoplasmic surface of the 
protein. Unfortunately, the steric requirements for
access of the asparagine during its glycosylation are fairly
stringent, so no conclusion can be made from a negative
result. In addition, this approach has been observed
to give misleading assignments.
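
As an illustration of how candidate sites for such an experiment might be located, the following minimal sketch scans a sequence for the usual consensus for N-linked glycosylation, Asn-X-Ser/Thr, in which X is any amino acid except proline. The consensus is general knowledge rather than something stated in this passage, and the function name `find_sequons` and the example loop sequence are hypothetical.

```python
import re

# Asn-X-Ser/Thr, where X is any amino acid except proline.
# The lookahead allows overlapping consensus sequences to be reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(sequence, first_position=1):
    """Return the positions of asparagines in Asn-X-Ser/Thr sequences.

    first_position is the residue number of sequence[0] in the intact
    polypeptide, so that the positions returned use its numbering.
    """
    return [m.start() + first_position for m in SEQUON.finditer(sequence)]


# Hypothetical hydrophilic loop beginning at position 100 of a protein.
loop = "GSANSTLEQNPSAK"
sites = find_sequons(loop, first_position=100)
```

The presence of the consensus shows only that glycosylation is possible; as noted above, the absence of glycosylation at such a site permits no conclusion.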

It is also possible to insert by genetic manipulation 
entire enzymes into a segment in the polypeptide of an 
integral membrane-bound protein located between two 
hydrophobic segments and use the activity of that enzyme 
to identify the location of the modified segment. For example,
alkaline phosphatase has been inserted consecutively
into the hydrophilic segments between the 12 candidates
for spanning the membrane in lactose permease from
E. coli and the six candidates for spanning the membrane
in MalG protein from E. coli, and the intact cells
bearing each of these constructs were assayed for alkaline
phosphatase activity with the impermeant reactant
p-nitrophenyl phosphate. In both cases, alkaline phosphatase
inserted into every other hydrophilic segment was
located extracytoplasmically, results demonstrating that
every candidate in each protein actually does span the
membrane. Both β-lactamase and chloramphenicol
O-acetyltransferase have also been used in this way. The
principal difficulty with these approaches is that the inser- 
tion of an entire molecule of protein into a short segment 
between two membrane-spanning segments could dis- 
rupt the normal topography of the protein. One way to 
minimize this problem is to insert the enzyme at random 
and select for modified proteins that retain their full bio- 
logical activity in addition to having the extraneous 
enzymes inserted into the desired segments.

Another drawback of all of the approaches for 
assessing topography that involve large numbers of site- 
directed mutations, such as the insertion of cysteines at 
consecutive positions in a cysteineless version of the pro- 
tein or the fusion of an integral membrane-bound pro- 
tein with another protein, is that there must be an 
efficient expression system for that integral membrane- 
bound protein. This requirement usually confines their 
use to bacteria because there are few convenient, effi- 
cient expression systems for most integral membrane- 
bound proteins from animals that produce high yields of 
the protein. For example, methods that rely on the pro- 
duction of fusion proteins have been applied almost 
exclusively to bacteria. 

This lack of efficient expression systems is a 
common problem with evaluating the results of any site- 
directed mutation of membrane-bound proteins from 
animals. Occasionally one of these proteins can be
expressed in a functional form in E. coli, but usually
they must be expressed in cultured cell lines derived
from animal tissue. Although the product of a site-directed
mutation can often be detected immunochemically,
its expression in animal cells usually precludes
the production of sufficient quantities of a mutant for
purification and direct study. Rather, the structural
changes resulting from a mutation are inferred from
changes in the function of the protein in intact cells
or crude preparations of membranes from the cells. An
extreme example is the expression in oocytes of X. laevis
of ionic channels from nervous tissue or transport proteins
for metabolites such as glucose, which can be
detected only by the robust fluxes of the substrates that
they produce across the membrane. It is also
possible to detect expressed channels for water in these
oocytes, again because of the high water permeability
they create.


The extensive topographical experiments that 
have been performed on human band 3 anion transport 
protein from erythrocytes serve as an example of the 
cumulative application of these strategies to one integral 
membrane-bound protein. Band 3 anion transport pro- 
tein is an integral membrane-bound protein in the 
plasma membrane of the erythrocyte that is responsible 
for the transport of anions such as chloride, bicarbonate, 
or phosphate across the membrane, and it is the integral
membrane-bound protein present in the highest
concentration in this plasma membrane. The human
protein is composed of a single polypeptide 911 amino
acids long that bears covalently attached carbohydrate
and spans the bilayer of phospholipid in its native structure.
Human band 3 anion transport protein has a
detachable domain on the cytoplasmic side of the bilayer
of phospholipid that can be released by trypsin from
fragments of membrane and that constitutes the first
360 amino acids from the amino terminus of the folded
polypeptide. After cleavage from the membrane, the
detached domain is freely water-soluble. 

The domain 550 amino acids long, left in the mem- 
brane after the amino-terminal domain has been 
detached by endopeptidolytic digestion and washed 
away, is still able to transport anions as rapidly as does 
the intact protein. Its amino acid sequence contains at
least 10 individual hydrophobic segments, 20 aa or more
in length, that are candidates for spanning the membrane
(Figure 14-28). The formal amino terminus of
this embedded domain in the human protein, Glycine
361, must be on the cytoplasmic surface of the protein
because cleavage by trypsin at the α-amide of Glycine 361
to release the detachable domain occurs at that surface.
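
The kind of plot used to pick out such candidates (Figure 14-28) is a moving average of hydropathy over the sequence. A minimal sketch of that calculation is given below, using the published hydropathy values of Kyte and Doolittle and a span of 11 positions. The helper `hydrophobic_segments`, its cutoff of 1.0, and the minimum length it applies are illustrative assumptions and are not the criteria used to draw segments a through j.

```python
# Hydropathy values from the scale of Kyte and Doolittle.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(sequence, span=11):
    """Return (position, mean) pairs; position is 1-based and is the
    central residue of each window of `span` residues (span is odd)."""
    half = span // 2
    values = [KD[aa] for aa in sequence]
    profile = []
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]
        profile.append((i + 1, sum(window) / span))
    return profile


def hydrophobic_segments(profile, cutoff=1.0, min_length=20):
    """Return (first, last) positions of runs whose mean hydropathy stays
    above `cutoff` for at least `min_length` consecutive positions.
    Both thresholds are hypothetical choices for this sketch."""
    segments, start, last = [], None, None
    for position, value in profile:
        if value > cutoff:
            if start is None:
                start = position
            last = position
        elif start is not None:
            if last - start + 1 >= min_length:
                segments.append((start, last))
            start = None
    if start is not None and last - start + 1 >= min_length:
        segments.append((start, last))
    return segments


# Artificial sequence: 30 leucines flanked by aspartates.
example = "D" * 10 + "L" * 30 + "D" * 10
candidates = hydrophobic_segments(mean_hydropathy(example))
```

Because the mean is assigned to the central residue of each window, the first and last five positions of a sequence receive no value when the span is 11.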

Lysine 430 in human band 3 anion transport pro- 
tein can be modified from the extracytoplasmic surface 
by formylation followed by reduction with sodium borohydride,
which, under the appropriate conditions, is
impermeant to intact erythrocytes, so the hydrophobic
segment between Glutamine 404 and Glycine 428
(segment a in Figure 14-28) spans the membrane.
Tyrosine 486 is modified extensively by lactoperoxidase
and [125I]I− when the protein is in inside-out vesicles
made from erythrocytes but only weakly when it is in
intact cells, a result that places Tyrosine 486 on the cytoplasmic
surface of the membrane. Consequently, the
hydrophobic segment between Methionine 435 and
Phenylalanine 478 (segment b in Figure 14-28) spans the
membrane. In the region between Lysine 542 and
Proline 568 (between segments d and e in Figure 14-28)
there is a loop of polypeptide in the native structure of
the protein that is susceptible to cleavage by pronase,
chymotrypsin, pepsin A, and papain but only
from the extracytoplasmic surface of the membrane.
Consequently, the polypeptide spans the membrane 
once between Tyrosine 486 and Lysine 542. There are 
two hydrophobic segments greater than 20 aa in length 
in this region (segments c and d in Figure 14-28).
Tyrosine 596 is not iodinated by lactoperoxidase
and [125I]I− when the lactoperoxidase is present only at
the extracytoplasmic side of the membrane, and
Asparagine 593 does not bear an N-linked oligosaccharide,
even though it is in the proper sequence to be so
modified. These negative results suggest that this part of
the amino acid sequence is on the cytoplasmic surface of
the membrane and that the hydrophobic segment
between Asparagine 569 and Arginine 589 (segment e in
Figure 14-28) spans the membrane.

Figure 14-28: Plot of the distribution of hydropathy over the
amino acid sequence of murine band 3 anion carrier (n_aa = 929).
Each amino acid in the sequence of amino acids is assigned its
numerical value of hydropathy in the scale of Kyte and Doolittle.
A moving average with a span of 11 positions is calculated from this
sequence of numbers. The numerical value of mean hydropathy
assigned to each position in the amino acid sequence is the average
of the segment of the 11 positions of which it is the central
number. Positive values indicate hydrophobic locations; negative
values, hydrophilic locations. Ten long hydrophobic segments are
found within the last 530 amino acids of the sequence (a-j).
Segments b, i, and j are long enough (>40 amino acids) to contain
two membrane-spanning α helices. The amino-terminal cytoplasmic
domain in the murine protein is 19 amino acids longer than the
one in the human protein, so each potential membrane-spanning
segment is 19 amino acids farther along in the sequence of the
murine protein. The human protein and the murine protein, however,
can be aligned with 92% identity and a gap of only one amino
acid over the last 530 amino acids. The hydrophilic segments in the
amino acid sequence of the murine protein that have been
assigned to the cytoplasmic (c) surface or extracytoplasmic (e) surface
of the human protein by chemical experiments are designated.
Adapted with permission from Nature, ref 555. Copyright 1985
Macmillan Magazines Ltd.

Tyrosine 628 is iodinated in the presence of extracytoplasmic
lactoperoxidase and [125I]I−, the peptide
bond between Threonine 629 and Glutamine 630 is
cleaved by extracytoplasmic papain in intact erythrocytes,
and Asparagine 642 is glycosylated, so the
hydrophobic segment between Lysine 600 and Aspartate
621 (segment f in Figure 14-28) spans the membrane.
Pyridoxal phosphate and Na[3H]BH4 can modify Lysine
691 when band 3 anion transport protein is in inside-out
vesicles but not when it is in intact erythrocytes, so the
hydrophobic segment between Proline 660 and 
Glutamate 681 (segment g in Figure 14-28) spans the 
membrane. Trypsin can cleave the polypeptide in the
native protein at Lysine 743, but only from the cytoplasmic
surface, so the hydrophobic segment between
Lysine 698 and Proline 722 (segment h in Figure 14-28)
does not span the membrane. Aspartate 821 is modified
with 1-ethyl-3-[3-(trimethylammonio)propyl]carbodiimide
and [35S]sulfanilic acid (Figure 10-5) when the protein
is in inside-out vesicles but not when it is in intact
erythrocytes, so the long hydrophobic segment
between Arginine 760 and Arginine 808 (segment i in
Figure 14-28) spans the membrane either twice or not at
all. The carboxy terminus of anion carrier when it is in
inside-out vesicles but not when it is in right-side-out
vesicles can bind an immunoglobulin raised against its
amino acid sequence, so the long hydrophobic segment
between Lysine 829 and Arginine 870 (segment j in
Figure 14-28) spans the membrane either twice or not at
all.
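
The argument running through these assignments is one of parity: if two positions in the polypeptide lie on opposite surfaces, the polypeptide between them crosses the membrane an odd number of times, and if they lie on the same surface, it crosses an even number of times (possibly zero). The following sketch tabulates that argument for the human protein using only positions, sides, and segment boundaries stated in the text above. The function name and the choice of one representative residue for each experimentally located region are assumptions of this illustration, and the boundaries of segments c and d are omitted because the text does not give them.

```python
# Sides assigned in the text to positions in human band 3
# ("c" = cytoplasmic, "e" = extracytoplasmic).
observations = [
    (361, "c"),  # tryptic cleavage releasing the cytoplasmic domain
    (430, "e"),  # Lysine 430
    (486, "c"),  # Tyrosine 486
    (553, "e"),  # chymotryptic cleavage within the loop 542-568
    (596, "c"),  # Tyrosine 596 (tentative; based on negative results)
    (628, "e"),  # Tyrosine 628
    (691, "c"),  # Lysine 691
    (743, "c"),  # tryptic cleavage at Lysine 743
    (821, "c"),  # Aspartate 821
    (911, "c"),  # carboxy terminus
]

# Hydrophobic candidates with the boundaries given in the text.
segments = {
    "a": (404, 428), "b": (435, 478), "e": (569, 589), "f": (600, 621),
    "g": (660, 681), "h": (698, 722), "i": (760, 808), "j": (829, 870),
}


def parity_table(observations, segments):
    """For each pair of consecutive located positions, list the candidate
    segments lying between them and the parity of the number of crossings."""
    table = []
    for (p1, s1), (p2, s2) in zip(observations, observations[1:]):
        between = [name for name, (first, last) in segments.items()
                   if p1 < first and last < p2]
        table.append((p1, p2, between, "odd" if s1 != s2 else "even"))
    return table


table = parity_table(observations, segments)
```

An odd parity with a single candidate of ordinary length means that the candidate spans the membrane once; an even parity with a single long candidate, as for segments i and j, leaves open whether it spans twice or not at all.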

One measure of the success of such topographical 
assignments is to compare the conclusions reached 
experimentally with crystallographic molecular models 
of the protein that become available at a later date. For 
example, the topographies of two of the subunits of 
bovine cytochrome-c oxidase were examined before its 
crystallographic molecular model (Table 14-6) became 
available. Chymotrypsin cleaves the polypeptide of sub- 
unit II in a region between Tryptophan 34 and 
Phenylalanine 37 as well as in a region between 
Tryptophan 99 and Tryptophan 116 when it has access 
only to the cytoplasmic, extramitochondrial surface of 
cytochrome-c oxidase in reconstituted vesicles. 
Glutamate 90, within the hydrophobic segment between 
Arginine 79 and Histidine 103 in subunit III, is modified 
by dicyclohexylcarbodiimide, which is a hydrophobic
carbodiimide, and this result placed this hydrophobic 
segment in the bilayer of phospholipid. The peptide 
bond of Lysine 7 of subunit IV was susceptible to cleav- 
age when intact bovine cytochrome-c oxidase was 
digested with trypsin from its extracytoplasmic, intramitochondrial
surface in inside-out vesicles of mitochondria and,
in sealed vesicles in which the
protein is oriented with its cytoplasmic, extramitochondrial
surface outward, only when the vesicles were opened with
nonionic detergent. The polypeptide of subunit IV was also
susceptible to digestion by pronase in intact, unwrapped
mitochondria, which expose only the cytoplasmic,
extramitochondrial surface of cytochrome-c oxidase. 
Both of these results taken together demonstrated that 
subunit IV does span the membrane in cytochrome-c 
oxidase, with its amino terminus on the extracytoplasmic 
side. All of these conclusions were validated by the crys- 
tallographic molecular model. 

Each of the four homologous subunits of acetylcholine
receptor from T. californica contains a cystine
connecting the cysteines homologous to Cysteine 128
and Cysteine 142 in the α subunit. In each subunit, a
glycosylated asparagine precedes the respective cysteine
homologous to Cysteine 142 in the α subunit. Lysine
165 from the β subunit is accessible to modification by
pyridoxal phosphate and Na[3H]BH4 on the extracytoplasmic
surface of sealed right-side-out vesicles derived
from plasma membranes from the electric organ of
T. californica. All of these facts placed the hydrophilic
portions containing the first 210 aa in each of the four
subunits on the extracytoplasmic side of the membrane.
It has already been noted that the sequence from Lysine 360
to Glutamine 378 in the γ subunit is recognized by an
immunoglobulin at the cytoplasmic surface of the protein.
Lysine 380 from the α subunit is accessible to modification
by pyridoxal phosphate and Na[3H]BH4 only when sealed
right-side-out vesicles derived from plasma membranes of
electric organ from T. californica are opened with saponin, but
Lysine 486 from the γ subunit is accessible on the extracytoplasmic
surface of the same vesicles. These results
placed the former lysine on the cytoplasmic side and the
latter on the extracytoplasmic side of the membrane. All
of these results have been validated by the crystallographic
molecular model of this protein derived from
electron diffraction and image reconstruction.

There are seven integral membrane-bound pro- 
teins that catalyze the active transport of inorganic 
cations across cellular membranes at the expense of the 
hydrolysis of MgATP. These are Na+/K+-exchanging
ATPase (Na+/K+-ATPase) from animal plasma membranes,
Ca2+-transporting ATPase (ER Ca2+-ATPase) from
animal endoplasmic reticulum, calmodulin-regulated
Ca2+-transporting ATPase from animal plasma membranes,
H+/K+-exchanging ATPase from the luminal
plasma membranes of gastric mucosa, K+-transporting
ATPase from bacterial plasma membranes, H+-exporting
ATPase from fungal plasma membranes, and
H+-exporting ATPase from plant plasma membranes.
Each of these seven proteins has a long polypeptide,
designated the α polypeptide, which when in its native
state is responsible for the catalysis of the respective
active transport. All of the seven α polypeptides are
homologous in sequence, and therefore each folds
to create an α subunit that is superposable upon the
native structure of all of the others.

Tryptic cleavages of the α subunit of canine Na+/K+-ATPase
in the native membrane that occur at Arginine 262
and Lysine 30 can take place only when trypsin has access
to the cytoplasmic surface of the protein. Trypsin is able
to cleave ER Ca2+-ATPase of O. cuniculus from the cytoplasmic
surface of intact endoplasmic reticulum at
Arginine 198. Aspartate 369 is located in the active site
of ovine Na+/K+-ATPase on its cytoplasmic surface.
Lysine 766, Lysine 943, and Lysine 1012 in the α subunit
of ovine Na+/K+-ATPase can be modified with pyridoxal
phosphate and Na[3H]BH4 when they are in sealed
right-side-out vesicles of plasma membrane only if those
vesicles are opened with saponin, results that placed
these amino acids on the cytoplasmic surface of the membrane.
When Cu2+ is added to sealed right-side-out vesicles
of plasma membrane, it catalyzes the oxidative
cleavage of the α subunit of porcine Na+/K+-ATPase in the
presence of ascorbate and H2O2 within the segment
between Tyrosine 895 and Lysine 905 and within the segment
between Proline 965 and Threonine 979, results
that placed these amino acids on the extracytoplasmic
surface of the membrane. Immunoglobulins directed
against the amino terminus and immunoglobulins
directed against the carboxy terminus of H+-exporting
ATPase from Neurospora crassa were bound only to the
cytoplasmic surface of plasma membranes, and the
amino terminus and carboxy terminus of the same protein
were removed by trypsin only when it had access to
the cytoplasmic surface of plasma membranes. The
domain on human calmodulin-regulated Ca2+-transporting
ATPase to which calmodulin binds from the cytoplasmic
surface of the membrane is located on the
carboxy terminus of the protein. All of these topographical
observations have been validated by the crystallographic
molecular model of ER Ca2+-ATPase.

Once the membrane-spanning α helices in an integral
membrane-bound protein have been identified, it is
possible to determine how they are arranged within the
membrane. For example, pairs of cysteines can be
inserted systematically by site-directed mutation, one
into each of two membrane-spanning segments in a cysteineless
version of an integral membrane-bound protein,
and those cysteines, if they are adjacent to each
other in the native structure of the protein, can be cross-linked
with a hydrophobic cross-linking reagent or
turned into a cystine by oxidation. Any such covalent
cross-link between two membrane-spanning α helices
places them adjacent to each other in the bundle of
α helices within the bilayer of phospholipid.
Advantage can also be taken of the favorable free energy
of formation of a hydrogen bond within the membrane.
The replacement of a neutral amino acid within the
membrane with a polar hydrogen-bond donor or acceptor
will usually disrupt the structure of the protein, causing
it to lose its function, but if a polar hydrogen-bond
acceptor or donor, respectively, is then placed in an adjacent
membrane-spanning α helix near enough to the
polar donor or acceptor to form a hydrogen bond, the
structure of the protein, and hence its function, can be
rescued. The return of function demonstrates that the
two α helices are adjacent to each other. Unfortunately,
because of the need for an efficient system for expression 
and the need to score large numbers of mutants, these 
approaches have so far been confined to the analysis of 
integral membrane-bound proteins from bacteria. 

As can be done with any other set of proteins, the 
amino acid sequences of integral membrane-bound proteins
can be aligned, and statistically significant relationships
can be used to identify isoforms of the same protein
in the same genome or related proteins from
different species of organisms. Often, similarities in the
patterns of distribution of the hydrophobic segments
that are candidates for spanning the membrane
strengthen the conclusion that all of the aligned proteins
share a common ancestor. One difficulty that arises in
such alignments of amino acid sequences, however, is 
that, because the choice made by natural selection of the 
side chains to span the membrane is heavily biased in 
favor of the small set of the most hydrophobic, the amino 
acid sequences of unrelated membrane-spanning seg- 
ments often seem to be more closely related to each 
other than unrelated amino acid sequences in general 
do. Nevertheless, in alignments of the amino acid 
sequences of integral membrane-bound proteins that 
are distantly related, the percentage of identity is about 
the same within membrane-spanning sequences as it is 
in sequences outside of the membrane.
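
A comparison of this kind requires the percentage of identity to be computed separately for the aligned positions within membrane-spanning segments and for those outside of them. The sketch below shows one way this might be done for a pair of aligned sequences in which gaps are written as hyphens. The function name, the convention of skipping any column that contains a gap, and the example sequences are assumptions made for illustration.

```python
def percent_identity(seq_a, seq_b, positions=None):
    """Percentage of identical residues between two aligned sequences.

    Columns in which either sequence has a gap ("-") are skipped.
    positions, if given, restricts the comparison to those 0-based
    columns of the alignment (for example, membrane-spanning columns).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    columns = range(len(seq_a)) if positions is None else positions
    pairs = [(seq_a[i], seq_b[i]) for i in columns
             if seq_a[i] != "-" and seq_b[i] != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


# Hypothetical aligned sequences; columns 4-15 stand for a
# membrane-spanning segment and the remaining columns for loops.
a = "MKDELLVAIGLAVFLGRQ-E"
b = "MRDELIVAVGLSVFLGKQTE"
membrane = range(4, 16)
outside = [i for i in range(len(a)) if i not in membrane]
within_membrane = percent_identity(a, b, membrane)
outside_membrane = percent_identity(a, b, outside)
```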

Almost all integral membrane-bound proteins 
remain in the membrane for their entire lives, but there 
are a set of proteins that begin their lives as water-solu- 
ble proteins, and if they are called upon, end their lives as 
integral membrane-bound proteins. The function of 
most of these proteins is to punch a hole in a membrane, 
either to short-circuit the normal gradients of metabo- 
lites and ions between the cytoplasm and the environ- 
ment and thereby kill the cell or to permit another 
protein with which they are associated to thread its way 
through the hole and enter the foreign cell, usually as an 
act of subterfuge. 

An example of a protein that punches a hole in a 
membrane is α-hemolysin from S. aureus (Table 14-6).
The protein begins its life as a water-soluble monomer of
293 aa. To insert into the membrane of the cell it is to kill,
each monomer puts forth a hairpin of β structure about
30 aa in length, and seven of these hairpins assemble
side by side to form a cylindrical β barrel that after insertion
spans the membrane of the doomed cell and forms
the hole through which the ions and metabolites pour.
The large portion of each monomer on the exterior of the
membrane associates with its six neighbors to form a
thick torus of cyclic symmetry of point group 7 (C7) on
the outside of the cell that resembles the extracellular 
portion of acetylcholine receptor (Figure 14-22). 

Colicin E1, which is secreted by E. coli, binds to proteins
on the extracytoplasmic surface of cells other than
E. coli that are to be killed. The portion of the protein
responsible for forming the hole is a bundle of 10
α helices in the water-soluble form of the protein. Two
of these α helices are completely buried in the center of
the bundle. They are 16 and 17 amino acids in length and
are composed entirely of hydrophobic amino acids. In
addition, the amino acid sequences flanking these
helices are also composed only of hydrophobic amino
acids. When this portion of the protein has adsorbed to
the surface of the bilayer of phospholipid of the membrane
of the cell that is to be killed, the hairpin of the central
two hydrophobic α helices inserts into the hydrocarbon
and associates with identical hairpins of α helices
inserted into the membrane by other molecules of colicin
E1 to form the mortal hole.

Cholera toxin from V. cholerae, heat-labile
enterotoxin from E. coli, and verotoxin-1 from
E. coli are homologous heterooligomers that create
a pore through the membrane through which a
polypeptide is threaded. Each contains a substructure
that is a pentamer of identical subunits with cyclic symmetry
of point group 5 (C5). These pentameric rings are
responsible for forming a pore through the membrane
of the cell that is the target of the toxin. Another, unrelated
subunit of the complex that forms the actual toxin
is then threaded through the pore to intoxicate the cell.
In the center of each of these pentameric rings is a symmetric
ring of five identical α helices, one from each
subunit; the five α helices are parallel to each other and
to the 5-fold rotational axis of symmetry that superposes
them. The length of these α helices varies from 11 to
20 aa, depending on the protein, but each is amphipathic.
The ring of five subunits recognizes the
oligosaccharide on a particular glycolipid in the membrane
of the cell that it will eventually intoxicate and
binds directly to the sugars. The pentamer then
adsorbs to the surface of the bilayer of phospholipid
and, presumably, the ring of α helices inserts into the
membrane and the other subunit is threaded through
the hydrophilic pore at its center. In the crystallographic
molecular model of the intact, water-soluble complex of
the components of the toxin, the carboxy terminus of
this other subunit is already threaded into the pore,
ready to enter the cell.

In diphtheria toxin, another toxin that threads one
of its domains into a cell through a pore formed by
another of its domains, the domain that inserts a portion
of its structure into the membrane to form the pore has
a structure reminiscent of the pore-forming domain in
colicin E1. It is a bundle of nine α helices, and the central
core of this bundle is two α helices that are unusually
hydrophobic, which are thought to form the pore
itself.

Lipoproteins are complexes between specific pro- 
teins and heterogeneous mixtures of phospholipid, cho- 
lesterol, triacylglycerol, and fatty acyl esters of 
cholesterol that store the lipids they contain in reposi- 
tories such as yolk or that transport the lipids they con- 
tain through extracellular fluids such as the serum of 
blood. The lipoprotein in the yolk of eggs is a complex 
between the protein vitellogenin and the lipids. The 
lipoproteins in serum are chylomicrons, very low den- 
sity lipoprotein, low density lipoprotein, and high den- 
sity lipoprotein. 

In the mature lipoprotein in yolk, each molecule of
the complex contains the posttranslationally modified
version of one molecule of vitellogenin. In the yolks of
eggs from G. gallus, the vitellogenin (1897 aa) is
posttranslationally cleaved into four fragments: lipovitellin I
(1124 aa), phosvitin (252 aa), lipovitellin II (235 aa), and
yolk glycoprotein 42 (284 aa). Each of these fragments
has its own name because each was identified, before
intact vitellogenin was, at a time when they
seemed to be separate proteins. The most peculiar and
independent of these fragments is phosvitin, which con- 
tains 123 serines (50% of its amino acids), of which 
about 25 are phosphorylated. It is the vitellogenin 
lipoprotein in the yolks of eggs from G. gallus that is the 
source of the natural phosphatidylcholine that is puri- 
fied from them. 

The crystallographic molecular model of the
mature lipoprotein formed between the lipid and these
fragments of vitellogenin from Ichthyomyzon unicuspis
contains a large globular cavity about 2.5 nm in diameter
in which the lipid is located. This ball of lipid is
surrounded by three slabs of β sheet of 15, 9, and 6
strands in width, respectively, and one α helix.
There are significant openings in this central cavity to 
the surrounding aqueous solution. The cavity contains 
about 30-35 molecules of lipid, about 20-25 of which are 
phospholipid, so the surfaces of the ball of lipid exposed 
to the solution are presumably paved by the head 
groups of these molecules of phospholipid, as are the 
surfaces of a bilayer of phospholipid. In the maps of
electron density, several complete molecules of
phospholipid could be observed as well as many fragments of
linear alkane tucked into crevices formed by the pleats
of the β sheets, just as the linear alkane or the
phospholipids in the crystallographic molecular models of
integral membrane-bound proteins are tucked into
crevices between the α helices. These complete
molecules of phospholipid and fragments of linear alkane are
not distributed as they would be in a spherical micelle, 
so the lipid must be irregularly packed into the cavity. 
The remaining approximately 10 molecules of lipid 
other than the phospholipid are triacylglycerols, choles- 
terols, and fatty acyl esters of cholesterol. None of the 
steroid has yet appeared in the maps of electron density 
and may be disordered within the interior of the ball of 
lipid. 

The two major classes of lipoproteins in mam- 
malian serum are low density lipoprotein (LDL) and high 
density lipoprotein (HDL). Very low density lipoprotein 
is a precursor from which lipid is stripped to produce low 
density lipoprotein. Chylomicrons seem to be hybrids of 
low and high density lipoprotein containing large 
amounts (84% by weight) of triacylglycerol. 

Molecules of low density lipoprotein are spheres of
lipid and protein that are remarkably uniform in size;
their diameters are about 22 nm. Unlike a molecule of
the vitellogenin lipoprotein in yolk, which is a large
protein in which a smaller ball of lipid is confined, low
density lipoprotein is a large sphere of lipid (80% by weight)
controlled by a much smaller amount of protein (20% by
weight). The sphere of lipid is composed on the average
of about 2000 fatty acyl esters of cholesterol, 200
molecules of triacylglycerol, 1000 molecules of phospholipid,
and 1000 molecules of cholesterol. The molar ratio of
unesterified cholesterol to phospholipid is similar to that
in the bilayer of a biological membrane. There is enough
phospholipid and cholesterol to cover 70% of the sphere
with a monolayer of the same dimensions as the
monolayer of a biological membrane. The remainder of the
surface is covered by the protein.
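
The estimate of 70% coverage can be checked with a short calculation. The sketch below uses the diameter and the numbers of molecules quoted above; the cross-sectional areas per molecule of phospholipid and of cholesterol are assumed, illustrative values that are not taken from the text, so the result should be read only as a rough consistency check.

```python
import math

# Values quoted in the text for low density lipoprotein
DIAMETER_NM = 22.0
N_PHOSPHOLIPID = 1000
N_CHOLESTEROL = 1000

# ASSUMED cross-sectional areas in a monolayer (nm^2 per molecule);
# illustrative values, not taken from the text
AREA_PHOSPHOLIPID_NM2 = 0.70
AREA_CHOLESTEROL_NM2 = 0.38


def sphere_area(diameter_nm):
    """Surface area of a sphere of the given diameter, in nm^2."""
    return math.pi * diameter_nm ** 2


def monolayer_coverage(diameter_nm, n_pl, n_chol, area_pl, area_chol):
    """Fraction of the surface of the sphere that the lipid could cover."""
    lipid_area = n_pl * area_pl + n_chol * area_chol
    return lipid_area / sphere_area(diameter_nm)


coverage = monolayer_coverage(
    DIAMETER_NM,
    N_PHOSPHOLIPID,
    N_CHOLESTEROL,
    AREA_PHOSPHOLIPID_NM2,
    AREA_CHOLESTEROL_NM2,
)
print(f"surface area of sphere: {sphere_area(DIAMETER_NM):.0f} nm^2")
print(f"fraction covered by lipid monolayer: {coverage:.2f}")
```

With these assumed areas the fraction comes out near 0.7; different choices for the areas per molecule shift the answer proportionally.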

Each molecule of low density lipoprotein contains
one molecule of apolipoprotein B100. Human
apolipoprotein B100 is 4536 aa long. In addition to
providing the remainder of the surface of a low density
lipoprotein, apolipoprotein B100 sets the size of the
sphere. That it does so was demonstrated by producing
in cells in culture, by limited treatment with puromycin,
a set of 13 different variants of low density lipoprotein,
each formed from a fragment of apolipoprotein B100 of a
different length from 1100 to 3600 aa. The lengths of the
equators of these spherical variants of low density
lipoprotein were directly proportional to the length of
the fragment of apolipoprotein B100 they contained. It
was concluded that apolipoprotein B100 is a belt around
the waist of a native low density lipoprotein; the length
of the belt determines the size of the waist of the sphere
of lipid. Within the core of a molecule of low density
lipoprotein, the lipid constituted by the fatty acyl ester of
cholesterol and triacylglycerol may not be an isotropic
fluid. Lamellar structures have been observed in this
region in image reconstructions of molecules of low
density lipoprotein embedded in amorphous ice. These
lamellae may or may not be present at physiological
temperatures.

High density lipoprotein has similar molar ratios
of fatty acyl esters of cholesterol, triacylglycerol,
cholesterol, and phospholipid to those of low density
lipoprotein and is also about 20% protein. Unlike the
vitellogenin lipoprotein in yolk and low density
lipoprotein, however, high density lipoproteins, even within the
same individual, are a mixture of molecules of different
size ranging in diameter from 4 to 10 nm. This is
probably due to the fact that there are three different
apolipoproteins, A1, A4, and A5, that are incorporated
into the various high density lipoproteins. Each of these
proteins is formed from internal multiples of a
repeating segment 22 aa long. Each of these segments from
any one of the three proteins can be aligned with any
segment from any one of the three proteins with an
average of 22% identity and no gaps. Human
apolipoprotein A1 has 9 of these repeats; human
apolipoprotein A4 has 13; and human apolipoprotein A5
has 13. A version of apolipoprotein A1 missing the first
43 aa of the 243 aa protein was expressed in E. coli and
crystallized. The crystallographic molecular model of
this variant of apolipoprotein A1 is a large ring formed
from a consecutive series of α-helical segments, most of
which are 22 aa long. The native molecules of high
density lipoprotein may be enclosed by such rings of
α-helical segments. The circular dichroic spectra of high
density lipoprotein do suggest that the protein they
contain is mostly α-helical. Probes of lipid structure
suggest that the phospholipid and cholesterol of high
density lipoprotein are in a monolayer, but the
smallest high density lipoproteins are only about as wide as
two molecules of phospholipid in a tail to tail
orientation. This fact makes it hard to imagine how the lipid
could be organized.
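
The plausibility of a ring of α-helical segments enclosing a high density lipoprotein can be examined with simple geometry. The sketch below assumes the standard rise of 0.15 nm per residue along the axis of an α helix and, as a deliberate oversimplification that is not a claim about the actual crystallographic model, treats the 200 aa of the truncated apolipoprotein A1 as one continuous helix bent into a closed circle.

```python
import math

RISE_PER_RESIDUE_NM = 0.15   # standard axial rise per residue of an alpha helix
REPEAT_LENGTH_AA = 22        # length of the repeating segment quoted in the text
FULL_LENGTH_AA = 243         # apolipoprotein A1, from the text
TRUNCATED_AA = 43            # residues missing in the crystallized variant


def helix_length_nm(n_residues):
    """Axial length of an alpha helix containing n_residues."""
    return n_residues * RISE_PER_RESIDUE_NM


def ring_diameter_nm(n_residues):
    """Diameter of a circle whose circumference is one continuous helix.

    ASSUMPTION: every residue is helical and the chain closes on itself.
    """
    return helix_length_nm(n_residues) / math.pi


repeat_nm = helix_length_nm(REPEAT_LENGTH_AA)
ring_nm = ring_diameter_nm(FULL_LENGTH_AA - TRUNCATED_AA)
print(f"one 22 aa repeat as an alpha helix: {repeat_nm:.1f} nm")
print(f"diameter of a ring of 200 helical residues: {ring_nm:.1f} nm")
```

Under these assumptions the ring has a diameter of roughly 9.5 nm, which falls within the range of 4 to 10 nm quoted for high density lipoproteins; the agreement is suggestive only.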
Problem 14-1: Pick out potential candidates for
membrane-spanning α helices from the following amino acid
sequence of an integral membrane-bound protein:


MNWTGLYTLLSGVNRHSTAIGRVWLSVIFIFRIMVLVVAAES
VWGDEKSSFICNTLQPGSNSVCYDQFFPISHVRLWSKQLILV
STPALLVAMHVAHQQHTEKKMLRLEGHGDPLHLEEVKRHKVH
ISGTLWWTYVISVVFRLLFEAVFMYVFYLLYPGYAMVRLVKC
DVYPCPNTVDCFVSRPTEKTVFTVFMLAASGICIILNVAEVV
YLIIRACARRAQRRSNPPSRKGSGFGHRLSPEYKQNEINKLL
SEQDGSLKDILRRSPGTGAGLAEKSDRCSAC
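
One common way to begin a problem of this kind is to scan the sequence with a sliding window and average a hydropathy value over each window. The sketch below uses the Kyte-Doolittle hydropathy scale with a window of 19 aa and a threshold of 1.6; the window, the threshold, and the function names are illustrative assumptions, and the output is a list of candidates to be examined by eye, not a definitive assignment. Only the first line of the sequence is scanned in the example.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Return (start_index, mean_hydropathy) for every full window.

    Raises KeyError if the sequence contains a non-standard letter.
    """
    values = [KYTE_DOOLITTLE[aa] for aa in sequence]
    profile = []
    for start in range(len(values) - window + 1):
        mean = sum(values[start:start + window]) / window
        profile.append((start, mean))
    return profile


def candidate_segments(sequence, window=19, threshold=1.6):
    """Merge overlapping windows whose mean hydropathy meets the threshold.

    Returns 1-based, inclusive (first, last) positions.
    """
    segments = []
    for start, mean in hydropathy_profile(sequence, window):
        if mean < threshold:
            continue
        end = start + window  # exclusive, 0-based
        if segments and start <= segments[-1][1]:
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
    return [(first + 1, last) for first, last in segments]


# First line of the sequence in the problem, used only as a short example
fragment = "MNWTGLYTLLSGVNRHSTAIGRVWLSVIFIFRIMVLVVAAES"
for first, last in candidate_segments(fragment):
    print(first, last, fragment[first - 1:last])
```

Because each window is 19 aa wide, the merged segments overestimate the length of a hydrophobic stretch by several residues at each end, so the boundaries should be refined by inspecting the sequence itself.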


Problem 14-2: Membranes containing only cytochrome-c 
oxidase can be isolated from mitochondria. By succes- 
sive extractions with acetone, it is possible to deplete 
these membranes of their phospholipids and obtain 
preparations with different ratios of protein to phospho- 
lipid. The protein in these preparations retains its native 
conformation at all times. The following spin label was 
incorporated into the membranes containing different 
amounts of phospholipid: 




2-ethyl-2-(14-carboxytetradecyl)- 
4,4-dimethyloxazolidene N-oxyl radical 


The ratio of protein to spin label was held constant. The
ESR spectrum of each preparation was taken with the
following result:


[Figure: ESR spectra at each lipid content, rows labeled by
mg phospholipid. Left column, spectra normalized to height of
central peak; right column, spectra normalized to concentration.]

Electron spin resonance spectra of the spin label in 
buffered aqueous dispersions of membrane-bound 
cytochrome-c oxidase with various lipid contents. Ratio 
of spin label to protein remained constant. The lipid to 
protein ratio expressed as milligrams of lipid (milligram
of protein)⁻¹ is indicated at the far left. Left, spectra
normalized to the center-line height; right, the same spectra
normalized to give equivalent values after two integra- 
tions. Therefore, the right column represents constant 
concentration of spin label. Reprinted with permission 
from ref 396. Copyright 1973 National Academy of 
Sciences. 

The left-hand column gives an idea of spectral 
shape; the right-hand column, amplitude of the signal. 
The top spectrum is that of an immobilized probe; the 
bottom spectrum, of a mobile one. Explain these obser- 
vations. 


Problem 14-3: Label each of the following reagents as a
hydrophobic reagent for modifying membrane-spanning
segments of a protein or as an impermeant
reagent. Indicate the reactive position in each reagent
with an arrow, and circle the portion of the molecule
that renders it hydrophobic or impermeant,
respectively.

Problem 14-4: For what purpose were the following
reagents synthesized and used to study
membrane-bound proteins?

Phenyl [³⁵S]isothiocyanate modified the lysine in the
segment -PNTALLSLVLMAGTFFFAMMLRKF- in an intact,
native membrane-bound protein. Where is this lysine
probably located relative to the bilayer of phospholipid?


Problem 14-5: 


(A) The density of protein is 1.35 g cm⁻³. If one
polypeptide of anion carrier were coiled so as to 
form a hard sphere, what would be its diameter? 
Compare this diameter to the width of a bilayer of 
phospholipid. 


(B) N-Formyl-[³⁵S]sulfinylmethionyl methylphosphate
(14-23) reacts indiscriminately with lysines
on the surfaces of protein molecules exposed to
the solution in preparations of sealed membranes
to form a derivative of the ε-amino group
that is radioactive. This reagent cannot pass 
through a membrane because of its polar charac- 
ter. Write the mechanism of this modification of 
lysine. 


Intact erythrocytes were mixed with this reagent, and 
the reaction was allowed to proceed for 10 min. The 
cells were then washed three times with buffer. Band 3 
anion transport protein was purified from these cells 
and was found to be radioactive. The polypeptide from 
this radioactive protein was cleaved with the endopepti- 
dase thermolysin, and the digest was spread on a two- 
dimensional chromatogram. The chromatogram was 
placed over a sheet of photographic film and set aside 
for several days. The film was developed and radioac- 
tive peptides were located visually. The following is a 
diagrammatic representation of the spots observed on
this film.

[Diagram: two-dimensional peptide map; labeled axis: electrophoresis.]

(C) Why was thermolysin used rather than trypsin?


This experiment was repeated with erythrocytes that had 
been broken open instead of intact erythrocytes. A 
representation of the spots observed on this film is 
shown in the following diagram. 


[Diagram: two-dimensional peptide map; labeled axis: electrophoresis.]

You should convince yourself that each spot on the pep- 
tide maps corresponds to a unique lysine on a surface of 
the anion carrier. You should also understand that each 


erythrocyte contains 3 x 10° copies of band 3 anion trans- 
port protein in its membrane. 



(D) What two fundamental chemical properties of 
integral membrane-bound proteins are demon- 
strated by this experiment? How? 


(E) How does the surface area of the anion carrier
exposed to the exterior of the cell compare to the
surface area exposed to the cytoplasm?

Problem 14-6: Rhodopsin is a protein that is firmly
embedded in the membranes of a vertebrate rod. If the
sacs of membrane known as disks are purified from these
rods, the only protein they contain in significant quantity
is rhodopsin, an integral membrane-bound protein.
Purified disks were dissolved in a solution of nonionic
detergent and mixed with excess phospholipid in the
same detergent. When the detergent was slowly
removed, small unilamellar vesicles, 40-70 nm in
diameter, formed spontaneously. The rhodopsin molecules
ended up embedded in the membranes of the vesicles.
Spectral measurements demonstrated that the tertiary
structure of the rhodopsin in the vesicles was the same as
it was in the disks.


(A) Papain, an endopeptidase, cleaves native 
rhodopsin at only one position in its entire 
sequence to yield two fragments from the original 
molecule of protein. In disk membranes, papain 
cleaves every rhodopsin molecule. In the recon- 
stituted vesicles, it can cleave only 65% of the 
rhodopsin molecules. Draw a diagram of a recon- 
stituted vesicle with bilayers of phospholipid and 
rhodopsin molecules and explain why 35% of the 
rhodopsin is resistant to cleavage. 


(B) Reconstituted vesicles were labeled with 
[¹²⁵I]iodide ion and lactoperoxidase, an enzyme
that cannot pass through a bilayer of phospho- 
lipid. Those rhodopsin molecules that cannot be 
cleaved by papain are nevertheless labeled by lac- 
toperoxidase. What does this experiment demon- 
strate about the rhodopsin molecule? How? 


The Fluid Mosaic


Every membrane in a living cell, regardless of its total 
surface area and shape, is an individual, intact, isolated 
solution. In each case, the solvent is a fluid bilayer of 
phospholipids and other amphipathic lipids of the 
appropriate surface area and shape, and the solutes are 
anchored membrane-bound proteins and integral mem- 
brane-bound proteins. The bilayer is a film of liquid 
paraffin and fused rings of hydrocarbon 3.6 nm in width 
sandwiched between thin hydrophilic lamellae 
0.5-0.7 nm wide. The bilayer of phospholipid is isotropic 
in the two dimensions of the surface defined by its width. 
Because every membrane in a cell is a closed sac, this sur- 
face is finite, continuous, and unbounded. Each protein 
floats upon the sheet of the bilayer at an unvarying 
draught; its membrane-spanning α helices cannot move
up or down in the bilayer owing to the hydrophobic 
effect. These proteins, however, unless pinned to struc- 
tures outside the membrane, are free to diffuse in the two 
unbounded dimensions of the bilayer. 

That the solvent from which biological membranes
are composed is a bilayer of amphipathic lipids has
been demonstrated in many ways. The diffraction of
X-radiation by biological membranes is mainly from the
bilayer they contain. A myelinated nerve is a bundle of 
parallel axons each coated with a tubular spiral of 
myelin, which is the plasma membrane of a Schwann cell 
wrapped around and around the axon. Therefore, the 
plasma membranes of all the Schwann cells are cylindri- 
cally oriented. A myelinated nerve diffracts X-radiation. 
From the diffraction pattern, a radial distribution of
electron density normal to the axis of an axon can be
computed, and it is indistinguishable from that of
multibilayers formed from an equimolar mixture of 
phosphatidylcholine and cholesterol (Figure 14-4D). 
Vesicles of biological membranes such as plasma mem- 
brane from erythrocytes, plasma membrane from the 
bacterium M. laidlawii, or endoplasmic reticulum from 
skeletal muscle diffract X-radiation in a circular pattern 
that can be thought of as the diffraction pattern of ori- 
ented bilayers of phospholipid (Figure 14-4A,C) spun 
around its center so that its equatorial and meridional 
reflections form circles. These circular diffraction pat- 
terns from biological membranes have the same perio- 
dicity as the diffraction patterns from vesicles containing 
only the lipids from the respective membranes, and
the amplitudes and periodicities of these diffraction pat- 
terns can be explained if they are assumed to arise from 
shells with distributions of electron density identical to 
those of bilayers of cholesterol and phosphatidylcholine 
(Figure 14-4C).

When spin-labeled probes are incorporated into 
biological membranes, they behave almost as if they 
were incorporated into bilayers of pure phospholipid. In 
oriented biological membranes such as flattened 
endoplasmic reticulum, nerves, or oriented erythrocytes,
fatty acids containing dimethyl nitroxyl radical
14-10 at various locations along the hydrocarbon incor- 
porate with their long axes perpendicular to the surface 
of the membrane as they do in bilayers of phospholipid, 
and they display anisotropic motion resembling that dis- 
played in oriented bilayers of pure amphipathic lipids. 
When 2,2,6,6-tetramethylpiperidine N-oxide
is incorporated into vesicles of endoplasmic reticulum
from skeletal muscle, its spectrum is the same as its spec- 
trum in vesicles of pure amphipathic lipid, and the 
absolute amplitude of its absorbance can be used to 
show that at least 85% of the amphipathic lipid in the 
endoplasmic reticulum is present as an unperturbed 
bilayer of phospholipid. 

Various microorganisms, such as fatty acid
auxotrophs of E. coli, can be forced to incorporate high
percentages of specific fatty acids into their plasma
membranes. In preparations of these native membranes with
a more homogeneous lipid composition, phase transi- 
tions between solid bilayers of phospholipid, with their 
hydrocarbons packed in hexagonal array, and liquid 
bilayers of phospholipid, with their hydrocarbons in the 
disordered fluid state, can be detected by X-ray diffrac- 
tion. The phase transitions observed with such biolog- 
ical membranes are very similar to those observed when 
pure bilayers of the lipids extracted from these mem- 
branes undergo the same solid to liquid transformation. 

These transitions can also be monitored by fluores- 
cent probes. Fluorescent molecules such as N-phenyl- 
1-naphthylamine are hydrophobic enough to partition
preferentially into the hydrocarbon of the bilayer of 
phospholipid and register the transition between solid 
and liquid by changes in the intensity of their emission of 
fluorescence. In vesicles of the amphipathic lipids
purified from E. coli, the membranes of which
were enriched in various fatty acids, in the intact native
membranes purified from these cells, and in the whole
cells themselves, the fluorescent probes detected the
same phase transitions. Quantitative analysis of these
results showed that at least 80% of the amphipathic lipid
in the native membranes was in the form of a bilayer
indistinguishable in its phase transitions from a bilayer
of the purified lipids.

It has already been noted that in a biological mem- 
brane each membrane-bound protein has its particular 
vectorial orientation. This orientation is maintained 
through the lifetime of a molecule of that protein by its 
inability to rotate even once 180° around any axis paral- 
lel to the surface of the membrane. For this to occur, the 
hydrophilic surfaces of the protein on the two sides of the 
membrane would have to pass through the 3.6 nm of 
hydrocarbon within the bilayer of phospholipid. This 
would require that the hydrogen bonds ensnaring these 
surfaces in the lattices of the liquid water (Figure 6-38) 
would all have to break simultaneously to permit the pro- 
tein to capsize. Apparently, this cannot be accomplished. 

It is also the case that the phospholipids in a
cellular membrane are asymmetrically distributed. These
asymmetric distributions of the phospholipids have
been demonstrated by submitting sealed, oriented
biological membranes to digestion with phospholipases
under nonlytic conditions or to modification with
impermeant reagents, by extracting phospholipids
from only one monolayer of a membrane with proteins
that bind them preferentially, and by exchanging the
phospholipids accessible on the outer monolayer of
sealed membranes with radioactive phospholipids in
other vesicles in a reaction catalyzed by phospholipid
transfer proteins. A phospholipid transfer protein
carries a specific phospholipid tightly bound to itself,
which it is able to exchange for another phospholipid of
the same type at the external monolayer of a sealed
membrane. Spontaneous exchange of phospholipid
between the outer monolayer of a membrane and
vesicles of phospholipid in the absence of phospholipid
transfer protein is also rapid enough to monitor
asymmetry when the phospholipid exchanging is the
dimyristoyl version. It is also possible to incorporate
phospholipids labeled with 7-nitro-2,1,3-benzoxadiazol-4-yl
groups on their fatty acyl substituents, allow the cells
to equilibrate these labeled phospholipids across their
plasma membranes, and then reduce the nitro groups to
amino groups only on the outer monolayer of the
resulting plasma membranes with impermeant dithionite.
The reduction quenches the fluorescence only
of those labeled phospholipids in the outer monolayer,
and the resulting decrease in the fluorescence of the
7-nitro-2,1,3-benzoxadiazol-4-yl groups quantifies the
asymmetry.

By one or the other of these procedures, the distri- 
butions of the various types of phospholipid across the 
bilayers of various biological membranes have been 
determined. In each case, the total moles of phospho- 
lipid in one monolayer always equals, within experimen- 
tal error, the total moles in the other monolayer of the 
membrane, but the distribution of each type between the 
two monolayers is biased (Table 14-8). Phosphatidyl- 
ethanolamine and phosphatidylserine are concen- 
trated in the cytoplasmic monolayers of plasma 
membranes. Sphingomyelin is enriched in the extracyto- 
plasmic monolayer of plasma membranes. Phosphati- 
dylcholine, in animals, or phosphatidylglycerol, in 
bacteria, seems simply to make up the differences 
between the two monolayers.* 

It is unclear whether or not cholesterol is
asymmetrically distributed across biological membranes. Two
independent measurements of cholesterol distribution
in human erythrocytes found an equal ratio or a ratio
of about 2-fold in favor of the extracytoplasmic
monolayer. The membrane of influenza virus, derived
directly from the plasma membrane of its host, has
cholesterol evenly distributed between its two monolayers.
Cholesterol, an aliphatic alcohol, should be able to pass
readily through the bilayer.

The asymmetries in the distribution of the
phospholipids are maintained by the enzyme
phospholipid-translocating ATPase. This enzyme catalyzes the
transport of phosphatidylethanolamine and
phosphatidylserine from the extracytoplasmic monolayer of a
membrane to the cytoplasmic monolayer and
couples the transport to the hydrolysis of ATP. It is a
member of the same family of cation-transporting
ATPases as Ca²⁺-transporting ATPase (Figure 14-15).
The active transport of the amino phospholipids to the
cytoplasmic monolayer catalyzed by the enzyme drives
phosphatidylcholine and sphingomyelin to the outer
monolayer passively.


* There is an interesting inversion in the case of the membranes of
unwrapped mitochondria in which phosphatidylethanolamine is
concentrated in the extracytoplasmic, intramitochondrial
monolayer and phosphatidylcholine is concentrated in the cytoplasmic,
extramitochondrial monolayer, perhaps as a vestige of the ancestry
of the mitochondrion, which is thought to have arisen from a
prokaryotic symbiote.

In sonicated vesicles of purified phospholipids, the
rate at which a phospholipid, labeled in its head group
with a tetramethyl cyclic nitroxyl radical, can pass from
the external monolayer to the internal monolayer is slow.
The time required for the distribution to come halfway to
equilibrium at 30 °C was measured to be 6 h, but there
was evidence that oxidation of the phospholipids was
adventitiously accelerating the rate. When an
exchange protein specific for phosphatidylcholine was
used to exchange the phosphatidylcholine in small
unilamellar vesicles of pure phosphatidylcholine, a very
slowly exchanging component (t₁/₂ ≥ 10 days at 37 °C) was
observed that accounted for about 40% of the total
phosphatidylcholine. This slow component was assigned to
phosphatidylcholine on the inner monolayer that had
to transfer to the outer monolayer before it could
exchange. In large unilamellar vesicles, however, the
rate at which phosphatidylcholine can transfer between
monolayers is more rapid. Equilibration occurs in less
than 3 h at 37 °C in large unilamellar vesicles of pure
dimyristoylphosphatidylcholine.

In biological membranes the transfer of
phospholipids between the two monolayers of the bilayer seems
to occur at about the same rate as that in large
unilamellar vesicles of pure phospholipids or somewhat faster.
[³²P]Phosphatidylcholine was observed to transfer from
the extracytoplasmic monolayer to the cytoplasmic
monolayer in an erythrocyte with a half-time for
equilibration of 1-2 h at 37 °C. The same phospholipid
labeled in its hydrophilic functional group with the same
tetramethyl cyclic nitroxyl radical that equilibrated
slowly in small vesicles of pure phospholipid could
equilibrate in vesicles of membrane from the electric organ of
Electrophorus electricus with a half-time of less than
10 min at 15 °C. In Bacillus megaterium, newly
synthesized phosphatidylethanolamine in the cytoplasmic
monolayer reaches equilibrium between the two
monolayers within 30 min at 24 °C. It is these passive
equilibrations that must be constantly compensated for by the
active transport of the aminophospholipids into the
cytoplasmic monolayer of the plasma membrane
catalyzed by phospholipid-translocating ATPase.
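
The half-times quoted in the last two paragraphs can be put on a common footing by converting each to a rate constant. The sketch below assumes that each equilibration is a simple first-order approach to equilibrium, so that k = ln 2 / t₁/₂; this kinetic model and the dictionary labels are assumptions made for illustration, and several of the quoted half-times are only limits.

```python
import math

MINUTE = 60.0
HOUR = 60.0 * MINUTE
DAY = 24.0 * HOUR


def rate_constant(half_time_s):
    """First-order rate constant (s^-1) for a given half-time (s)."""
    return math.log(2) / half_time_s


def fraction_from_equilibrium(k, time_s):
    """Fraction of the initial displacement from equilibrium that remains."""
    return math.exp(-k * time_s)


# Half-times as quoted in the text; ASSUMED to describe first-order decay
half_times_s = {
    "sonicated vesicles, spin-labeled phospholipid, 30 C": 6 * HOUR,
    "small vesicles, slow component, 37 C": 10 * DAY,
    "erythrocyte, phosphatidylcholine, 37 C (upper value)": 2 * HOUR,
    "electric organ vesicles, 15 C (upper limit)": 10 * MINUTE,
}

for label, t_half in half_times_s.items():
    k = rate_constant(t_half)
    print(f"{label}: k = {k:.2e} s^-1")

# Ratio of the fastest to the slowest of the quoted equilibrations
ratio = (10 * DAY) / (10 * MINUTE)
print(f"fastest / slowest rate constant: {ratio:.0f}")
```

The ratio of the extreme half-times, 10 days to 10 min, is 1440, which gives a sense of how much more rapidly phospholipid can cross some biological membranes than it crosses the bilayer of a small vesicle of pure phospholipid.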

There is another enzyme, phospholipid
scramblase, that catalyzes the passive transport of
phospholipids between the two monolayers of a plasma
membrane. The activity of this enzyme is controlled by
levels of Ca²⁺, so that the rapid equilibration of the
phospholipids in the two monolayers catalyzed by this
enzyme occurs only in particular circumstances. This
control avoids the futile waste of MgATP that would
occur if the enzyme were active continuously.


Table 14-8: Asymmetric Distribution of Phospholipid and Sphingomyelin between the Two Sides of a Biological Membrane 


phospholipid’ (% of total) 


extracytoplasmic monolayer cytoplasmic monolayer 
PC’ PE PS SM PG DPG PI PC PE PS SM PG DPG PI 

plasma membranes 

human erythrocyte®?*° 20 5 <1 20 0.4 10 25 10 5 1.6 

erythrocyte of R. norvegicus**® 30 5 1° 10 15 20 15° <1 

human erythrocyte®! ND? 5 <l ND ND 25 15 ND 

Bacillus megaterium™! 25 25° 50 <5° 
disks from rod outer ND 10 5 ND ND 30 5 ND 

Segment" 
unwrapped bovine 10 20 15 30 10 5 

mitochondria 


ᵃ Numbers are mole percentages in each monolayer based on the total amount of phospholipid in the membrane. ᵇ PC, phosphatidylcholine; PE, phosphatidylethanolamine; PS, phosphatidylserine; SM, sphingomyelin; PG, phosphatidylglycerol; DPG, diphosphatidylglycerol; PI, phosphatidylinositol. ᶜ Phosphatidylserine plus phosphatidylinositol. ᵈ ND, not determined. ᵉ By difference.




As is the case with mixtures of pure phospholipid 
and cholesterol, the lipids in the bilayers of biological 
membranes can separate laterally into distinct phases. In 
particular, a separate phase that is enriched in choles- 
terol and sphingomyelin exists in plasma membranes of 
animal cells. This phase separates as patches of less fluid 
lipid that are surrounded by the rest of the lipid, which is 
more fluid. These patches are rafts. Rafts have a higher
frequency of saturated fatty acyl groups on their lipids, a 
property characteristic of sphingomyelin that may 
explain its preferential inclusion. It is the higher concen- 
tration of saturated fatty acyl groups that causes the 
bilayer of a raft to be less fluid. Separated phases that 
resemble the rafts in natural membranes can be pro- 
duced experimentally by adding cholesterol and phos- 
pholipids or sphingomyelins containing only saturated 
fatty acyl groups to multibilayers or monolayers
composed of purified phospholipids with the normal 
composition of unsaturated fatty acids. 

Rafts can be purified from the remainder of a
biological membrane because they are much less soluble in
the detergent Triton X-100 (14-12). When membranes
from animal cells are dissolved with Triton X-100 at 4 °C,
the rafts can be isolated as large aggregates. Rafts isolated
in this way are enriched in lipid and depleted in protein
relative to the rest of the membrane in which they are
found. In addition to the cholesterol and
sphingomyelin, rafts contain a high concentration of
glycosphingolipids such as glucosylceramide,
galactosylceramide, lactosylceramide, and globoside
(Table 14-1). Glycosylphosphatidylinositol-anchored
proteins, triply palmitoylated caveolin, and
doubly palmitoylated protein-tyrosine kinases have
been found to be preferentially associated with rafts.
These proteins, like the glycolipids, insert only their fatty
acyl groups into the membrane so their protein sits upon
the raft. As a result, the raft itself is mostly assembled
from lipid. The sizes of the rafts can be ascertained by
measuring the sizes of the patches of these proteins
sitting on them. These patches are about 300-500 nm in
diameter.

Although rotational diffusion of a molecule of an 
integral membrane-bound protein about any axis paral- 
lel to the surface of the membrane does not occur and 
rotational diffusion of a molecule of phospholipid about 
any axis parallel to the surface of the membrane is slow, 
integral membrane-bound proteins, anchored mem- 
brane-bound proteins, and phospholipids all display one 
degree of rapid rotational diffusion about axes normal to 
the surface of the membrane and two degrees of transla- 
tional diffusion along axes parallel to the surface of the 
membrane. These diffusional degrees of freedom are 
prescribed by the fact that a bilayer of amphipathic lipids 
is a two-dimensional solvent, and solutes dissolved in 
this solvent find themselves in a two-dimensional solu- 
tion. 

The translational diffusion of proteins over the two 
dimensions of a plasma membrane was strikingly
demonstrated by an experiment involving the fusion of
two cells. Immunoglobulins G specific for antigens on
the surface of murine (c11D) and human (VA-2) cells, 
respectively, were produced. These immunoglobulins 
were covalently modified with different fluorescent 
reagents, one that fluoresced green and one that fluo- 
resced orange, respectively. The addition of the former 
immunoglobulins to mouse cells turned proteins in their 
plasma membranes green, and the addition of the latter 
immunoglobulins to human cells turned proteins in their 
plasma membranes orange. When a mouse cell was 
fused with a human cell, the hybrid was initially stained 
green at one side and orange at the other, but the two sets 
of antigenic proteins then diffused over the plasma 
membrane of the hybrid until, within 40 min, they were 
each uniformly distributed. This intermixing resulted 
from two-dimensional translational diffusion. 

In a three-dimensional, isotropic solvent, such as 
an aqueous solution, the observed translational diffusion 
coefficient, $D_T$, is the proportionality constant (Equation
1-63) between the net flux, J, of a certain solute across a 
unit area, in a plane normal to its direction of net flux, 
and the gradient of its concentration, c, along the direc- 
tion of net flux, x: 


$$J = -D_T \frac{\partial c}{\partial x}$$    (14-2)


The units on $J$ are moles centimeter⁻² second⁻¹, and on
the gradient of concentration $c$ they are (moles centimeter⁻³)
centimeter⁻¹. Therefore, the units of $D_T$ are
centimeters² second⁻¹. The mean square displacement, $\overline{d^2}$,
that the molecules of the solute will experience over a
given time interval, $t$, is related to the diffusion coefficient
by the equation


$$\overline{d^2} = 4 D_T t$$    (14-3)


In a three-dimensional solution, molecules of the solute 
will also experience rotational motion as well as transla- 
tional motion, and the observed rotational diffusion 
coefficient, $D_R$, for this rotational motion can be defined,
in analogy with Equation 14-3, as 


$$D_R = \frac{\overline{\theta^2}}{4t}$$    (14-4)


where $\overline{\theta^2}$ is the mean square angular displacement experienced
in time $t$. If $\theta$ is expressed in radians, the units on
$D_R$ are radians² second⁻¹.

The theoretical relationship that is used to calculate 
a frictional coefficient for translational motion in three 
dimensions, $f_{T3}$, from the observed translational diffusion
coefficient is



$$f_{T3} = \frac{k_B T}{D_T}$$    (14-5)


where $k_B$ is Boltzmann’s constant and $T$ is the temperature.
An analogous theoretical relationship that is used to
calculate a frictional coefficient for rotational motion in
three dimensions, $f_{R3}$, from the observed rotational diffusion
coefficient can be written


$$f_{R3} = \frac{k_B T}{D_R}$$    (14-6)


If the molecules of the solute were hard spheres of radius
$r$ in a solvent of viscosity $\eta$ (Equation 1-66), then


$$f_{T3} = 6\pi\eta r$$    (14-7)


and 


$$f_{R3} = 8\pi\eta r^3$$    (14-8)


Equations 14-7 and 14-8 are theoretical equations relat- 
ing the frictional coefficient to the dimensions of the 
sphere. 
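
As a rough numerical illustration of Equations 14-5 to 14-8, the following sketch computes the frictional and diffusion coefficients of a hard sphere. The radius (2.0 nm), the viscosity (0.9 mPa s, water), and the temperature (298 K) are assumed inputs chosen only for illustration, and the function names are hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann's constant, J K^-1


def stokes_translational_friction(eta, r):
    """Equation 14-7: f_T3 = 6*pi*eta*r, in SI units (kg s^-1)."""
    return 6.0 * math.pi * eta * r


def stokes_rotational_friction(eta, r):
    """Equation 14-8: f_R3 = 8*pi*eta*r**3, in SI units (J s)."""
    return 8.0 * math.pi * eta * r ** 3


def diffusion_coefficient(friction, temperature):
    """Equations 14-5 and 14-6 rearranged: D = k_B*T / f."""
    return K_B * temperature / friction


# Assumed, illustrative inputs (not taken from the text).
eta_water = 0.9e-3   # Pa s
radius = 2.0e-9      # m
temperature = 298.0  # K

d_t = diffusion_coefficient(
    stokes_translational_friction(eta_water, radius), temperature)  # m^2 s^-1
d_r = diffusion_coefficient(
    stokes_rotational_friction(eta_water, radius), temperature)     # s^-1

print(f"D_T = {d_t * 1e4:.2e} cm^2 s^-1")  # 1 m^2 = 1e4 cm^2
print(f"D_R = {d_r:.2e} s^-1")
```

For a sphere the ratio of the two diffusion coefficients depends only on the radius, $D_T/D_R = 4r^2/3$, which is why either measurement can in principle provide the radius in three dimensions.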

Diffusion of a solute in a two-dimensional solvent, 
isotropic in those two dimensions, can be treated in par- 
allel. The solvent is two-dimensional because the solute 
is confined to rotate only about an axis normal to the 
plane defined by the two dimensions and to translate in 
only the two dimensions. The observed one-dimensional
rotational diffusion coefficient, $D_{R1}$, is defined by
Equation 14-4 where the angular displacement is only
around the axis normal to the plane. The observed two-dimensional
translational diffusion coefficient, $D_{T2}$, is
the proportionality constant (Equation 14-2) between
the net flux of a substance across a unit width on a line
normal to its direction of net flux, in moles centimeter⁻¹
second⁻¹, and the gradient of its concentration along the
direction of net flux, in (moles centimeter⁻²) centimeter⁻¹.
The units of $D_{T2}$ are also centimeter² second⁻¹. The
mean square displacement of one molecule is still gov- 
erned by Equation 14-3. This relationship has been veri- 
fied experimentally by following the lateral movements 
as a function of time of single, fluorescently labeled 
molecules of phospholipid in a bilayer of dioleoyl phosphatidylcholine.
The frictional coefficient for one-dimensional
rotational diffusion, $f_{R1}$, can be calculated
from the observed rotational diffusion coefficient by a
theoretical equation analogous to Equation 14-6, and the
frictional coefficient for translation in a two-dimensional
solvent, $f_{T2}$, can be calculated from the observed translational
diffusion coefficient by a theoretical equation
analogous to Equation 14-5, with the substitution of $f_{R1}$
and $f_{T2}$ for $f_{R3}$ and $f_{T3}$, respectively.


When diffusion of a solute such as an integral mem- 
brane-bound protein in a two-dimensional solvent such 
as a bilayer of phospholipids is evaluated, it is usually 
assumed that the molecule can be treated as an equiva- 
lent right circular cylinder of radius r. The theoretical 
equations relating the frictional coefficient for its rotational
diffusion, $f_{R1}$, to the dimensions of a right cylinder
of any radius r in a two-dimensional solvent of width h 
about an axis normal to the surface of the solvent have an 
exact solution, which is 


$$f_{R1} = 4\pi\eta_h r^2 h$$    (14-9)


For right cylinders of radius r that are the size of integral 
membrane-bound proteins (Figures 14-14 to 14-18) 
undergoing two-dimensional translational diffusion in a 
sheet of liquid paraffin with a width, h, of about 3-4 nm 
and a viscosity, $\eta_h$, significantly greater than that of the
water that sandwiches the liquid paraffin on both sides
with a viscosity, $\eta_w$, of 0.9 mPa s at 298 K,


$$f_{T2} = 4\pi\eta_h h\left[\ln\left(\frac{\eta_h h}{\eta_w r}\right) - \gamma_E\right]^{-1}$$    (14-10)


where $\gamma_E$ is Euler’s constant (0.5772). This theoretical
relationship relating the frictional coefficient, $f_{T2}$, for its
translational diffusion to the dimensions of a right cylin- 
der is not exact because the equations for slow viscous 
flow used to derive Equation 14-7 in three dimensions 
have no exact solution in two dimensions. 

Unlike Equations 14-7 and 14-8, Equations 14-10 
and 14-9 cannot be used quantitatively to determine the 
equivalent of a Stokes’ radius a (Equation 1-67) or the 
shape of a molecule of protein (Figure 12-1) or lipid in a 
bilayer of phospholipid. It is not possible to determine 
independently the value of $\eta_h$ experienced by the diffusing
solute because a bilayer of natural amphipathic lipids
is not an isotropic sheet of liquid hydrocarbon (Figure 
14-2). It is also not known what value should be used for 
h, the width of the bilayer. These equations, however, 
can be used qualitatively to show that the diffusion coef- 
ficients measured are in reasonable agreement with the 
sizes of the diffusing solutes, the width of a bilayer, and 
the expected viscosity of the liquid hydrocarbon. 

The rotational diffusion constant for a naturally flu- 
orescent molecule of protein such as bacteriorhodopsin 
or rhodopsin with their tightly bound retinals or a mole- 
cule of protein modified with a fixed fluorescent reagent 
is measured by monitoring the decay in the anisotropy 
of the fluorescence or phosphorescence of the chromophore
following excitation with a flash of polarized
light. The rotational diffusion coefficient of bacteriorhodopsin
in bilayers of dimyristoylphosphatidylcholine
is affected little by the concentration of the protein in
the bilayer, and at 30 °C it is equal to 7 × 10⁴ s⁻¹. The
rotational diffusion coefficient of rhodopsin in densely
packed disks of photoreceptors is 5 × 10⁴ s⁻¹ at 20 °C. If
the radius of an imaginary cylindrical bacteriorhodopsin,
$r$, is taken as 2.0 nm and the width of the bilayer, $h$, as
4.5 nm, then the viscosity of the bilayer, $\eta_h$, sensed by the
rotating protein (Equation 14-9), is 400 mPa s.

Large multilamellar vesicles of dimyristoylphosphatidylcholine,
25-50 μm across, can be prepared so
that they contain various concentrations of bacteriorhodopsin.
Because bacteriorhodopsin is fluorescent,
the distribution of the protein over the membranes of a 
vesicle can be monitored in a microscope by the distribution
of fluorescence. When a circular area 5 μm in
diameter in the middle of a vesicle is submitted to 
intense irradiation by a laser, the retinal within the circle 
is photolytically bleached. After bleaching, the molecules 
of bacteriorhodopsin in the circular area are no longer 
fluorescent, but those surrounding the circle still are. As 
translational diffusion takes place, the circle slowly fills 
with fluorescent molecules entering from the perimeter, 
and the bleached circle gradually disappears. From such 
recovery of fluorescence following photobleaching,
the translational diffusion coefficient, $D_{T2}$, of the
unbleached bacteriorhodopsin moving into the circle 
can be calculated. 

Measurements were made of the two-dimensional 
translational diffusion coefficient for bacteriorhodopsin 
at several different temperatures and concentrations of 
the protein in the bilayers. The values varied between
0.1 × 10⁻⁸ and 4 × 10⁻⁸ cm² s⁻¹. As the mole fraction of bacteriorhodopsin
in the bilayer was decreased from 1 mol
(30 mol of phospholipid)⁻¹ to 1 mol (210 mol of phospholipid)⁻¹,
the translational diffusion coefficient
increased from 0.1 × 10⁻⁸ to 1.6 × 10⁻⁸ cm² s⁻¹ at 25 °C.
Because the diffusion coefficient was still increasing significantly
at the lowest concentration at which measurements
could be made, the translational diffusion
coefficient at zero density, $D_T^0$, could not be determined
accurately by extrapolation. 

At the lowest concentrations of protein examined at 
30 °C, the translational diffusion coefficient of bacteriorhodopsin,
3.4 × 10⁻⁸ cm² s⁻¹, is that of a cylinder of protein
with a radius $r$ equal to 2.0 nm, in a bilayer of
phospholipid with a width, $h$, of 4.5 nm, the viscosity of
whose hydrocarbon, $\eta_h$, is 110 mPa s. The broadest
dimension of the bundle of α helices in a single molecule
of bacteriorhodopsin is about 4 nm (Figure 14-14), the 
width of a bilayer of natural phospholipid and choles- 
terol, including the hydrophilic functional groups, is 
4-5 nm (Figure 14-4D), the viscosity of motor oil at 30 °C 
is between 100 and 200 mPa s, and the viscosity of veg- 
etable oil at 30°C is about 50 mPa s. Because realistic 
numerical values for these three parameters can be used 
to calculate, by Equations 14-9 and 14-10, a diffusion 
coefficient equal to the one that is measured, it appears 
that bacteriorhodopsin, at least when it is in bilayers of 
dimyristoylphosphatidylcholine, is diffusing freely and 
predictably within the two-dimensional solvent formed
by those bilayers. Its diffusion is a random walk driven by 
thermal energy through an isotropic, viscous medium 
just as is the diffusion of a soluble protein through an 
isotropic aqueous solution. 
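
The numerical consistency claimed in this paragraph can be checked with a short calculation. The sketch below evaluates Equation 14-10 for the values quoted above (radius 2.0 nm, width 4.5 nm, hydrocarbon viscosity 110 mPa s) and converts the frictional coefficient to a diffusion coefficient with the two-dimensional analogue of Equation 14-5. Using the viscosity of water quoted for 298 K (0.9 mPa s) at 30 °C is an assumption made here for simplicity, and the function names are hypothetical.

```python
import math

K_B = 1.380649e-23    # Boltzmann's constant, J K^-1
EULER_GAMMA = 0.5772  # Euler's constant, as quoted with Equation 14-10


def cylinder_translational_friction(eta_h, eta_w, r, h):
    """Equation 14-10: f_T2 for a right cylinder of radius r in a sheet of
    width h and viscosity eta_h bounded by water of viscosity eta_w (SI units)."""
    return 4.0 * math.pi * eta_h * h / (
        math.log(eta_h * h / (eta_w * r)) - EULER_GAMMA)


def translational_diffusion(eta_h, eta_w, r, h, temperature):
    """Two-dimensional analogue of Equation 14-5: D_T2 = k_B*T / f_T2 (m^2 s^-1)."""
    return K_B * temperature / cylinder_translational_friction(eta_h, eta_w, r, h)


d_t2 = translational_diffusion(
    eta_h=0.110,        # Pa s, viscosity of the hydrocarbon (from the text)
    eta_w=0.9e-3,       # Pa s, water (value quoted for 298 K; assumed here)
    r=2.0e-9,           # m
    h=4.5e-9,           # m
    temperature=303.15  # K, 30 degrees C
)
d_t2_cm2 = d_t2 * 1e4   # convert m^2 s^-1 to cm^2 s^-1
print(f"D_T2 = {d_t2_cm2:.2e} cm^2 s^-1")
```

Under these assumptions the result is close to the measured 3.4 × 10⁻⁸ cm² s⁻¹, which is the sense in which realistic values of the three parameters reproduce the observation.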

Translational diffusion coefficients have been 
measured for other integral membrane-bound proteins. 
Bacteriorhodopsin and rhodopsin are already fluores- 
cent, but a purified membrane-bound protein can be 
covalently modified with a fluorescent electrophilic 
reagent and then incorporated into bilayers of phospho- 
lipid, and its two-dimensional translational diffusion 
coefficients can be measured by monitoring recovery
of fluorescence following photobleaching. For anion carrier
at a surface concentration of 1 mol (200 mol of phospholipid)⁻¹,
the translational diffusion coefficient is
1.6 × 10⁻⁸ cm² s⁻¹ at 30 °C. The translational diffusion
coefficients of bovine rhodopsin, Ca²⁺-transporting
ATPase from endoplasmic reticulum of skeletal muscle,
and acetylcholine receptor have all been determined
by fluorescence photobleaching recovery at several
temperatures in reconstituted membranes at high dilution
[<1 mol of protein (3000 mol of phospholipid)⁻¹].
The values for the translational diffusion coefficients
at 25 °C are between 1.4 × 10⁻⁸ and 2 × 10⁻⁸ cm² s⁻¹ for
all three proteins, and they are indistinguishable 
within the ranges of their standard deviations. From 
Equation 14-10, the viscosity calculated for the bilayer 
from these latter measurements is between 100 and 
200 mPa s. 

At high dilution at 25 °C, the translational diffusion 
coefficients of all integral membrane-bound proteins are 
between 1 × 10⁻⁸ and 2 × 10⁻⁸ cm² s⁻¹. This means that
after 1 s the average value of the square of the distance
that a protein will be situated from the position it occupied
initially will be 10 μm². In a densely packed plasma
membrane, however, the translational diffusion coefficients
are significantly less. For example, for rhodopsin
at 20 °C, densely packed in the disks in photoreceptors,
the translational diffusion coefficient is 0.3 × 10⁻⁸ cm²
s⁻¹. A value of 0.02 × 10⁻⁸ cm² s⁻¹ at 37 °C has been determined
for randomly labeled proteins in the plasma
membranes of L-6 cells. In the latter situation, the
mean square displacement for one of these proteins
after 1 s will be only 0.1 μm². The diameter of a normal
eukaryotic cell is about 20 μm. Therefore, it should take
about 100 min for a protein with this low value for its
translational diffusion coefficient to spread over the
plasma membrane of a cell of this size if the protein were
added at only one point on its surface. 
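
The estimates in this paragraph follow directly from Equation 14-3. The sketch below repeats the arithmetic; taking the distance to be covered as the 20 μm diameter of the cell is an assumption made here, so the result should be read only as an order-of-magnitude check on the figure of about 100 min. The function names are hypothetical.

```python
CM2_TO_UM2 = 1.0e8  # 1 cm^2 = 1e8 um^2


def mean_square_displacement_um2(d_cm2_per_s, t_s):
    """Equation 14-3, d^2 = 4*D*t, with the result converted to um^2."""
    return 4.0 * d_cm2_per_s * t_s * CM2_TO_UM2


def time_to_cover_s(distance_um, d_cm2_per_s):
    """Time at which the root-mean-square displacement equals distance_um."""
    return distance_um ** 2 / (4.0 * d_cm2_per_s * CM2_TO_UM2)


# Protein at high dilution in a bilayer, upper end of the quoted range.
msd_dilute = mean_square_displacement_um2(2.0e-8, 1.0)

# Randomly labeled proteins in a crowded plasma membrane.
msd_crowded = mean_square_displacement_um2(0.02e-8, 1.0)

# Assumed distance: one cell diameter (20 um).
spread_min = time_to_cover_s(20.0, 0.02e-8) / 60.0

print(f"dilute bilayer:  {msd_dilute:.1f} um^2 after 1 s")
print(f"plasma membrane: {msd_crowded:.2f} um^2 after 1 s")
print(f"time to cover 20 um: {spread_min:.0f} min")
```

The hundredfold decrease in the diffusion coefficient lengthens the time needed to cover a given distance by the same factor, because the time scales as the square of the distance divided by the diffusion coefficient.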

It is the dense packing of the proteins in a normal 
biological membrane that causes the diffusion coeffi- 
cients of most proteins to be much less than they are 
when they are moving unhindered over a bilayer of 
phospholipids. The value of 0.02 × 10⁻⁸ cm² s⁻¹ at 37 °C
measured for the diffusion coefficient of proteins in a
plasma membrane is in the same range as diffusion coefficients
of 0.06 × 10⁻⁸ cm² s⁻¹ at 25 °C for ubiquinol-cytochrome-c
reductase in mitochondrial membranes
and 0.006 × 10⁻⁸ cm² s⁻¹ for phycobilisomes in membranes
of thylakoids. When the concentration of protein
in the mitochondrial membranes was decreased
systematically by incorporating endogenous phospho- 
lipid, the diffusion coefficient of ubiquinol-cyto- 
chrome-c reductase increased monotonically. Upon a 
7-fold dilution, its diffusion coefficient had increased 
almost 20-fold.

The translational diffusion coefficients of integral 
membrane-bound proteins vary dramatically with con- 
centration but insignificantly with variations in the 
apparent radius, r, of the equivalent cylinder because of 
the logarithmic dependence of the frictional coefficient 
on this latter parameter (Equation 14-10). These are two 
additional reasons why translational diffusion coeffi- 
cients cannot be used to provide any insight into the 
shapes of these proteins in the bilayer of phospholipid. 
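
The weakness of the dependence on radius can be made concrete with Equation 14-10. In the sketch below, the width (4.5 nm) and viscosities (110 and 0.9 mPa s) are the values used earlier in this section, while the two radii (1 nm and 4 nm) are assumed purely for illustration; the function name is hypothetical.

```python
import math

EULER_GAMMA = 0.5772  # Euler's constant, as quoted with Equation 14-10


def relative_mobility_2d(eta_h, eta_w, r, h):
    """Quantity proportional to D_T2 from Equation 14-10:
    D_T2 is proportional to [ln(eta_h*h / (eta_w*r)) - gamma] / (eta_h*h)."""
    return (math.log(eta_h * h / (eta_w * r)) - EULER_GAMMA) / (eta_h * h)


ETA_H = 0.110   # Pa s, hydrocarbon of the bilayer
ETA_W = 0.9e-3  # Pa s, water
H = 4.5e-9      # m, width of the bilayer

small = relative_mobility_2d(ETA_H, ETA_W, 1.0e-9, H)  # assumed radius, 1 nm
large = relative_mobility_2d(ETA_H, ETA_W, 4.0e-9, H)  # assumed radius, 4 nm

ratio_2d = small / large  # effect of a 4-fold increase in radius in the bilayer
ratio_3d = 4.0e-9 / 1.0e-9  # Equation 14-7: D is proportional to 1/r for a sphere

print(f"4-fold larger radius lowers D_T2 by a factor of {ratio_2d:.2f}")
print(f"the same change lowers D_T3 of a sphere by a factor of {ratio_3d:.0f}")
```

A 4-fold change in radius therefore alters the two-dimensional diffusion coefficient by only about 30 percent under these assumptions, which is less than the change produced by modest crowding.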

The phospholipids in a membrane also display 
translational diffusion. This can be followed by using 
phospholipids, such as phosphatidylethanolamine, that 
have been modified by fluorescent reagents, in the case 
of phosphatidylethanolamine at its primary amine. The 
translational diffusion coefficients for such a fluores- 
cent lipid in bilayers of various phospholipids are 
between 4 × 10⁻⁸ and 9 × 10⁻⁸ cm² s⁻¹ at 25 °C.
These values compare favorably with those calculated
from spin-exchange among molecules of a nitroxylphosphatidylcholine
in vesicles of various natural phospholipids,
which are about 10 × 10⁻⁸ cm² s⁻¹ at 45 °C. In
bilayers of dimyristoylphosphatidylcholine and cholesterol
at various ratios, the diffusion coefficients for a
fluorescent phospholipid are between 1 × 10⁻⁸ and
3 × 10⁻⁸ cm² s⁻¹ at 25 °C. The translational diffusion
coefficients for lipids are not much greater than those for 
proteins. 

Even though the diffusion coefficients for phospho- 
lipids are not much larger than those for proteins, 
Equation 14-10 does not describe their behavior. It fails 
to do so because a phospholipid does not span the membrane
completely, so the viscosities of the fluids at its two
ends are dramatically different, and because, unlike a
membrane-spanning protein, its cross-sectional area is
the same as those of the phospholipids forming the
bilayer.

Associated with the bilayer of a biological mem- 
brane is a microviscosity. The microviscosity is the vis- 
cosity experienced by a small hydrophobic solute 
dissolved in the liquid hydrocarbon while it is rotating 
isotropically. Because it is measured by following rota- 
tion, the microviscosity is the viscosity of the solvent in 
the immediate vicinity of the small solute. This microvis- 
cosity is generally estimated from the polarization 
retained in the fluorescence of a hydrophobic, fluorescent
solute such as 2-methylanthracene after its excitation
with a flash of polarized light. A hydrophobic


solute is used so that most of the molecules of that solute 
in the sample have been incorporated into the hydro- 
carbon of the bilayer of phospholipid. The more rapidly 
the solute is reorienting within the bilayer of phospho- 
lipid during the lifetime of the excited state, the greater 
will be the loss of its polarization. This loss of polariza- 
tion can be calibrated by the behavior of the fluorescent 
solute in hydrocarbon solvents of known macroscopic 
viscosity. 

In bilayers made in the laboratory from purified 
natural phospholipid, microviscosities between 100 and 
200 mPa s have been observed at 25 °C. The microviscosity
determined in vesicles formed from only the
lipids in a biological membrane is almost the same as
that determined for the complete biological membranes
from which the lipids were extracted, and addition of
an integral membrane-bound protein to vesicles of pure
phospholipid affects the microviscosity only slightly.
These results suggest that the regions within the bilayer 
of phospholipid occupied by the fluorescent solutes used 
to monitor this property are mainly the bulk lipid 
between the molecules of protein. Because it is the rota- 
tional diffusion of the probe that senses the microviscos- 
ity rather than its translational diffusion, the presence of 
protein, which provides obstacles mainly to translation, 
has only a small effect. Nevertheless, the microviscosities 
determined are in the same range as the viscosities that 
seem to be controlling the translational and rotational 
diffusion of molecules of protein when they are dissolved 
at low concentrations in bilayers of phospholipids. The 
composition of the fatty acyl groups in the phospholipids 
forming biological membranes (Table 14-3) has been 
adjusted through evolution by natural selection to com- 
pensate for the different mean body temperatures of 
cold-blooded animals so that a similar microviscosity is 
maintained regardless of the mean temperature of the 
environment.

The addition of cholesterol increases the observed 
microviscosity of bilayers of phospholipid by a factor of 
5-10, an effect that seems to be considerably larger
than that of cholesterol on the translational diffusion 
coefficient of phospholipids. The addition of cholesterol
to bilayers of phospholipid, however, increases the
rotational diffusion coefficients of 1,6-diphenyl- 
1,3,5-hexatriene dissolved in them only by factors of SCH 
These results suggest that the viscosity of the bilayer 
becomes more anisotropic upon addition of cholesterol. 

Epidermal growth factor receptor is an integral 
membrane-bound protein that depends on its ability to 
diffuse translationally over a bilayer of phospholipid to 
accomplish its function. Epidermal growth factor is a 
polypeptide hormone that stimulates the growth of cells 
from a variety of tissues. Epidermal growth factor recep- 
tor is the protein in the plasma membrane of a cell to 
which epidermal growth factor binds to exert its effect on 
the cell. Human epidermal growth factor is a small 
(53 aa) soluble protein; human epidermal growth factor 


receptor is a large (1186 aa) monomeric glycoprotein 
that spans the membrane. Epidermal growth factor
receptor is a member of a group of structurally and func- 
tionally related receptors for growth factors on the cell 
surface characterized by an intrinsic activity for protein-tyrosine
kinase. The initial response in the cascade
leading to the mitosis caused by the binding of epidermal 
growth factor to epidermal growth factor receptor is the 
activation of this protein-tyrosine kinase. 

The amino acid sequence of human epidermal 
growth factor receptor can be divided into two
domains of about equal size that are located on opposite 
sides of the plasma membrane. The extracytoplasmic 
domain of the protein (620 aa) contains the binding site 
for epidermal growth factor. The cytoplasmic domain
of the protein (540 aa) contains the active site for protein-tyrosine
kinase. Both of these domains have been
produced independently, in their entirety, and they are
well-behaved soluble proteins with the respective functions.
Both have been crystallized, and crystallographic
molecular models are available for each of
them. In the intact native protein the short segment
between these two domains is composed of 23 
hydrophobic amino acids, 15 of which are leucine, 
isoleucine, valine, methionine, or phenylalanine. There 
is no doubt that this segment spans the membrane, presumably
in one α helix, connecting by this tether the
extracytoplasmic domain and the cytoplasmic domain in 
the complete native protein. It has been shown that 
mutations, insertions, or deletions in this membrane- 
spanning segment have little effect on activation of the 
protein-tyrosine kinase; hence, its role in the activation
of the protein-tyrosine kinase activity must not
involve any severe structural requirements, short of 
spanning the bilayer of phospholipid and conjoining the 
two domains. 

There have been a number of reports implicating 
the dimerization of epidermal growth factor receptor in 
the activation of its protein-tyrosine kinase. Moderate 
yields of covalent chemical cross-linking between 
monomers are observed but only after epidermal growth 
factor has been bound; bivalent immunoglobulins 
against epidermal growth factor receptor, but not 
Fab fragments from these immunoglobulins, can acti- 
vate its tyrosine kinase in the absence of epidermal 
growth factor; and mutant forms of epidermal
growth factor receptor can suppress the activation of 
wild-type epidermal growth factor receptor. 

It has been possible to follow the dimerization of 
monomeric epidermal growth factor receptor as a function
of time by quantitative cross-linking, just as the
tetramerization of phosphoglycerate mutase was fol- 
lowed by quantitative cross-linking (Figure 13-17). After 
epidermal growth factor was added to epidermal growth 
factor receptor dissolved in a solution of the detergent 
Triton X-100 (14-12), the initially monomeric protein 
dimerized in a reaction that could be shown to be kinetically
second-order in the concentration of protein.
When the activation of protein-tyrosine kinase activity 
was followed in the same preparations, it was found that 
it was also a process second-order in the concentration 
of epidermal growth factor receptor and that the second- 
order rate constants for dimerization and the activation 
of protein-tyrosine kinase were identical. It follows that 
the rate-limiting step in the activation of the enzyme is 
dimerization of the protein. In the plasma membrane, 
this dimerization must result from collisions between 
monomers of epidermal growth factor receptor as they 
diffuse in two dimensions across the surface of the 
bilayer of phospholipid. 
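
The signature of a process that is second-order in the concentration of protein is that its half-time is inversely proportional to the initial concentration. The following sketch illustrates this with the integrated rate law for d[M]/dt = −k[M]²; the rate constant and the concentrations are hypothetical values chosen for illustration, not values from the experiments described.

```python
def monomer_remaining(m0, k, t):
    """Integrated second-order rate law for d[M]/dt = -k*[M]**2:
    1/[M] = 1/[M]0 + k*t."""
    return m0 / (1.0 + k * m0 * t)


def half_time(m0, k):
    """Time at which half of the monomer has been consumed: 1 / (k*[M]0)."""
    return 1.0 / (k * m0)


K_DIMERIZATION = 1.0e5  # M^-1 s^-1, hypothetical second-order rate constant

half_times = {}
for m0 in (1.0e-8, 2.0e-8, 4.0e-8):  # M, hypothetical initial concentrations
    half_times[m0] = half_time(m0, K_DIMERIZATION)
    print(f"[M]0 = {m0:.0e} M  half-time = {half_times[m0]:.0f} s")
```

Doubling the initial concentration halves the half-time, whereas for a first-order process the half-time would be independent of concentration. This is the kind of dependence that distinguishes a rate-limiting dimerization from a rate-limiting change in conformation within a monomer.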

Because the protein spans the membrane in a 
single α helix, it seems unlikely that the order to dimerize
is transmitted through this α helix across the membrane,
and consequently it should be the case that the binding 
of epidermal growth factor to the extracytoplasmic 
domain causes the extracytoplasmic domain to dimer- 
ize. In fact, the crystallographic molecular models of the 
extracytoplasmic domain of human epidermal growth 
factor receptor are symmetrical dimers of the complex 
between the protein and either epidermal growth 
factor or a related hormone. Even though the cytoplasmic
domains of two intact molecules of epidermal
growth factor receptor sterically inhibit the dimerization
of the extracytoplasmic domains, during dimerization
of the intact protein, dimerization of the cytoplasmic
domains through an interface between them is also
required for activation of the protein-tyrosine kinase. 
This conclusion follows from the facts that bivalent 
immunoglobulins G directed against the carboxy termi- 
nus of the protein activate its protein-tyrosine kinase to 
a level comparable to the activation by epidermal growth 
factor and that when epidermal growth factor is
removed from the binding sites on dimerized protein, 
the intact protein dissociates into monomers greater 
than 40-fold more slowly than does a deletion mutant 
lacking the complete cytoplasmic domain.

Therefore, dimerization of the extracytoplasmic 
domains produced by the binding of epidermal growth 
factor drags together the cytoplasmic domains, which 
are tethered through the membrane-spanning segment. 
The resulting juxtaposition of the cytoplasmic domains, 
which initially involves a steric problem because they are 
not positioned properly, nevertheless promotes, in turn, 
their dimerization through an interface that forms 
between them after they become properly aligned. The 
dimerization of the cytoplasmic domains leads to the 
activation of the protein-tyrosine kinase. In the crystallo- 
graphic molecular model of the cytoplasmic domain, the 
asymmetric unit is a monomer, but because the space 
group of the crystal is 23, two monomers are related by 
an exact 2-fold rotational axis of symmetry, and the inter- 
face between them may be the same as the one in the 
dimeric cytoplasmic domains in the activated native pro- 
tein. 

Both observations of the kinetics of activation of
epidermal growth factor receptor and of the binding of
epidermal growth factor and the crystallographic
molecular models of the liganded, dimeric extracytoplasmic
domains of epidermal growth factor receptor
and the related fibroblast growth factor receptor
demonstrate that each monomer of the receptor binds a 
molecule of the respective hormone. The resulting dimer 
is rotationally symmetric, each subunit carrying its own 
molecule of hormone. In fact, the two symmetrically 
arrayed molecules of hormone are on completely oppo- 
site sides of the dimers in the crystallographic molecular 
models. Growth hormone receptor, however, which is 
unrelated to epidermal growth factor receptor but which 
also dimerizes upon binding of its hormone, forms a different
kind of complex in which one molecule of hormone
is bound by two molecules of the receptor. It
is the formation of this asymmetric complex that dimerizes
growth hormone receptor and leads to its activation.
One peculiarity of this process of activation, which is a 
consequence of both the fact that one molecule of hor- 
mone gathers together two molecules of receptor and the 
fact that molecules of receptor are not dimers before the 
hormone is added, is that high concentrations of growth 
hormone inhibit both the dimerization and the activation
of growth hormone receptor. These facts require
that monomers of growth hormone receptor be diffusing 
independently of each other through the plasma mem- 
brane before they are dimerized by binding to the two 
opposite sides of growth hormone. 
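
The inhibition by high concentrations of hormone can be reproduced with a minimal equilibrium model in which a receptor R first binds hormone H and the complex RH then captures a second, unliganded receptor to form R₂H. This model, the two association constants, and the assumption that the concentration of free hormone is held fixed are all simplifications introduced here for illustration; none of the numerical values comes from the experiments described.

```python
import math


def fraction_dimerized(hormone, k1, k2, r_total):
    """Fraction of receptor present in R2H for the scheme
    R + H <=> RH (association constant k1), RH + R <=> R2H (k2).
    Conservation: r_total = R + k1*H*R + 2*k1*k2*H*R**2, solved for free R."""
    if hormone <= 0.0:
        return 0.0
    a = 2.0 * k1 * k2 * hormone
    b = 1.0 + k1 * hormone
    free_r = (-b + math.sqrt(b * b + 4.0 * a * r_total)) / (2.0 * a)
    return a * free_r ** 2 / r_total


# Hypothetical parameters in arbitrary, mutually consistent units.
K1 = 1.0         # per unit concentration of hormone
K2 = 100.0       # per unit surface concentration of receptor
R_TOTAL = 1.0    # total surface concentration of receptor

dimer = {h: fraction_dimerized(h, K1, K2, R_TOTAL)
         for h in (0.01, 1.0, 100.0, 10000.0)}
for h, f in dimer.items():
    print(f"[H] = {h:>8.2f}  fraction of receptor in dimers = {f:.3f}")
```

In this sketch the fraction of receptor in dimers passes through a maximum and then falls as the hormone is increased, because at high concentrations of hormone almost every receptor is occupied as RH and no unliganded receptor remains to complete the complex. The argument depends on the monomers being free to encounter each other by diffusion within the membrane.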

There is another set of membrane-bound proteins 
that also relies on translational diffusion over the surface 
of the plasma membrane to fulfill its biological function. 
This is the adenylate cyclase system. The role of this set 
of proteins is also to respond to the presence of an ago- 
nist in the medium surrounding the cell. Binding of the 
agonist to the extracytoplasmic surface of a particular 
protein in the plasma membrane either activates or 
inhibits adenylate cyclase, which is the enzymatic activ- 
ity of an active site at the cytoplasmic surface of a differ- 
ent protein in the same plasma membrane. This active 
site is responsible for the reaction 

$$\mathrm{MgATP} \rightleftharpoons \mathrm{MgP_2O_7^{2-}} + \text{cyclic AMP}$$    (14-11)

If the agonist increases the rate of production of cyclic 
AMP catalyzed by this enzyme, it is stimulatory; if it 
decreases the production, it is inhibitory. 

The agonist initiates the process by binding to one 
of a set of receptors, each of which is a membrane-bound 
protein. A typical example of one of these receptors is 
β-adrenergic receptor. This protein has the site to which
a β-adrenergic agonist such as epinephrine or norepinephrine
binds as the first step in the stimulation of
adenylate cyclase. β-Adrenergic receptor has been purified
to homogeneity from microsomes of plasma membrane
from hamster lung that have been dissolved in the


nonionic detergent digitonin. The ability of the receptor
to bind agonist was used as an assay, and the purification
involved affinity adsorption to a solid phase to
which a β-adrenergic antagonist had been attached
(Table 1-3). The purified receptor is a membrane-spanning
glycoprotein. The polypeptide composing the protein
from the hamster is 418 aa in length and contains
seven hydrophobic segments, each greater than 20 aa in
length, that are candidates to be α helices spanning the
membrane.

β-Adrenergic receptor is a member of a large family
of integral membrane-bound proteins responsible for 
responding to signals such as hormones, odorants, neu- 
rotransmitters, and light. In the human genome there are 
at least 950 genes encoding members of this family.
Rhodopsin (Table 14-6) belongs to this family, and bac- 
teriorhodopsin (Figure 14-14) is a homologous bacterial 
protein. Beginning 6 aa before its first membrane-spanning
α helix and finishing 16 aa beyond the last, the
amino acid sequence of human rhodopsin can be aligned
with high statistical significance with that of human
β₁-adrenergic receptor (21% identity; 2.6 gap percentage),
so the structure of β₁-adrenergic receptor must be superposable
upon that of rhodopsin and hence upon that
of bacteriorhodopsin (Figure 14-14). In particular, the
seven hydrophobic segments in β₁-adrenergic receptor
must be α helices that span the membrane. The segment
(59 aa) of amino acid sequence on the extracytoplasmic
side of the membrane amino-terminal to the first membrane-spanning
α helix and the two segments (80 and
97 aa) on the cytoplasmic side of the membrane between
the fifth and the sixth membrane-spanning α helices and
carboxy-terminal to the seventh, respectively, are much
longer in human β-adrenergic receptor than they are in
bacteriorhodopsin and constitute extracytoplasmic and 
cytoplasmic domains involved in the binding of the hor- 
mone and the transmission of the information. 

The adenylate cyclase itself is a much larger pro- 
tein. The isoform of the human enzyme that responds to 
binding of an agonist to a β-adrenergic receptor is
1353 aa in length. As with all of the isoforms of the 
enzyme, the amino acid sequence of the adenylate 
cyclase contains 12 hydrophobic segments thought to 
span the plasma membrane as α helices. The pattern in
which these α helices occur as well as alignments of segments
of amino acid sequence suggest that the protein is
a product of an internal duplication. In particular, two
segments of amino acid sequence that contain no membrane-spanning
α helices, each about 220 aa long, occur
respectively in the middle of the protein, which is the car- 
boxy terminus of the first half of the polypeptide, and at 
the carboxy terminus of the complete protein, which is 
the carboxy terminus of the second half of the polypep- 
tide. The sequences of these two segments can be 
aligned,’ and their crystallographic molecular models 
are superposable.’” Each is preceded by six hydrophobic 
segments in close succession. Together, the two homol- 


ogous segments that have no membrane-spanning 
a helices form a large cytoplasmic domain that is respon- 
sible for the catalysis of adenylate cyclase (Equation 
14-11)" and for which crystallographic molecular 
models are available.” In the various crystallographic 
molecular models, the two homologous cytoplasmic 
domains have superposable structures and are related to 
each other by a 2-fold rotational axis of pseudosymme- 
try. 

The final component in the overall adenylate 
cyclase system is a guanosine nucleotide-binding pro- 
tein or G-protein. There are several types of G-proteins 
present in the same membrane, one type mediating the 
stimulation of adenylate cyclase, another type mediating 
its inhibition, and other types with other roles in other 
systems. 

The stimulation or inhibition of adenylate cyclase
by an agonist requires the constant presence of GTP
under physiological conditions owing to the requirement
that the stimulatory G-protein have GTP bound to
it before the active site on adenylate cyclase can be
stimulated to produce cyclic AMP. The stimulatory G-protein
involved in this process binds GTP tightly and under
the appropriate circumstances catalyzes a slow hydrolysis
of the GTP to GDP and inorganic phosphate. Both
of these properties are reminiscent of the binding and
hydrolysis of GTP performed by tubulin. The rate of
hydrolysis of GTP at the active site of a stimulatory
G-protein is enhanced by the binding of agonist to
β-adrenergic receptor, and this is expressed as a
guanosine triphosphatase that is activated by the agonist.

Just as the slow hydrolysis of GTP in a microtubule
is a timing device to give the growing end
enough time to find its goal before it is eliminated for
its failure to do so, the slow hydrolysis of GTP bound
to the stimulatory G-protein is a timing device to
terminate the activity of adenylate cyclase when the agonist
is no longer present at the extracytoplasmic surface of
the cell. When an agonist is abruptly removed
from the binding site on β-adrenergic receptor by
adding an antagonist to the solution in which the
membranes are suspended, the adenylate cyclase activity
decays slowly. The rate at which the adenylate
cyclase activity decays is equal to the rate at which GTP
is hydrolyzed within the active site of the G-protein that
is coupled to β-adrenergic receptor and adenylate
cyclase. If this hydrolysis of GTP is blocked by
cholera toxin, a specific inhibitor of this process, the
adenylate cyclase activity no longer decays with time. The
decrease in the rate of the GTPase activity as a function
of the concentration of cholera toxin parallels the
decrease in the fraction of the adenylate cyclase that is
turned off. The hydrolysis of the guanine nucleotide
can also be prevented by the use of a chemical
analogue of GTP such as guanosine 5′-O-(3-thiotriphosphate)
that cannot be hydrolyzed by the G-protein. One
of these analogues, as does cholera toxin, produces
adenylate cyclase activity that does not turn off when
agonist is driven from the receptor.
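
The timing device just described can be illustrated with a minimal
numerical sketch. It assumes, as a simplification not stated in the
text, that the decay of activity after removal of agonist is a single
first-order process whose rate constant is that for hydrolysis of
bound GTP. The function name and the numerical values are
hypothetical and chosen only for illustration.

```python
import math

def activity_after_agonist_removal(initial_activity, k_hydrolysis, t):
    """Adenylate cyclase activity remaining at time t after agonist is removed.

    Assumption: decay is first-order, with a rate constant equal to that
    for hydrolysis of GTP on the stimulatory G-protein.
    Setting k_hydrolysis = 0 mimics cholera toxin or an analogue of GTP
    that cannot be hydrolyzed, so the activity never turns off.
    """
    return initial_activity * math.exp(-k_hydrolysis * t)

# Hypothetical values: 100 units of activity, k = 0.5 per minute.
half_life = math.log(2) / 0.5
remaining = activity_after_agonist_removal(100.0, 0.5, half_life)
blocked = activity_after_agonist_removal(100.0, 0.0, 30.0)
```

In this sketch the activity falls to half its initial value after one
half-life of the bound GTP, whereas with hydrolysis blocked it is
unchanged at any time.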

After the GTP is hydrolyzed, GDP remains tightly
bound to the G-protein, preventing the binding of GTP if
the β-adrenergic receptor is unoccupied. The binding of
agonist to the receptor, however, accelerates the
dissociation of this bound GDP.

Therefore, the overall sequence of steps is the
following. Guanosine triphosphate binds to the stimulatory
G-protein activated by the binding of agonist to
β-adrenergic receptor, and this complex between GTP and
G-protein stimulates the production of cyclic AMP at the
active site on adenylate cyclase as long as the GTP
remains unhydrolyzed. Following hydrolysis, the tightly
bound GDP prevents the G-protein from activating
adenylate cyclase. When that GDP is released in a
dissociation stimulated by β-adrenergic receptor to which
agonist is bound, the empty site can again bind GTP.
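
The sequence of steps above can be sketched as a small state machine.
This is only an illustration under simplifying assumptions: the state
and event names are hypothetical labels, and the release of GDP in
the absence of occupied receptor, which the text describes as slow,
is treated here as not occurring at all.

```python
from enum import Enum

class GState(Enum):
    GDP_BOUND = "GDP bound; cannot activate adenylate cyclase"
    EMPTY = "nucleotide site empty"
    GTP_BOUND = "GTP bound; stimulates adenylate cyclase"

# Allowed transitions of the cycle (event names are hypothetical).
TRANSITIONS = {
    (GState.GDP_BOUND, "release_gdp"): GState.EMPTY,
    (GState.EMPTY, "bind_gtp"): GState.GTP_BOUND,
    (GState.GTP_BOUND, "hydrolyze_gtp"): GState.GDP_BOUND,
}

def step(state, event, agonist_on_receptor):
    """Return the next state of the stimulatory G-protein."""
    # Simplification: GDP leaves only when receptor is occupied by agonist.
    if event == "release_gdp" and not agonist_on_receptor:
        return state
    # Any event that is not allowed in the current state is ignored.
    return TRANSITIONS.get((state, event), state)

def cyclase_stimulated(state):
    """Adenylate cyclase is stimulated only while GTP is unhydrolyzed."""
    return state is GState.GTP_BOUND
```

Walking a G-protein through `release_gdp`, `bind_gtp`, and
`hydrolyze_gtp` with agonist present returns it to the state with
bound GDP; without agonist it never leaves that state.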

Purified G-proteins are each constructed from
three different subunits, α, β, and γ. The α polypeptides
from all of the various G-proteins are highly homologous
in sequence (averaging about 60% identity in pairwise
comparisons), but they differ significantly in length
($n_{\mathrm{aa}}$ = 310-400) because of three regions in which long
insertions can occur. The α polypeptide of the human
stimulatory G-protein that is involved in the
β-adrenergic system is 394 aa long; the β polypeptide, 340 aa;
and the γ polypeptide, 70 aa. There is a crystallographic
molecular model of an intact αβγ heterotrimeric
G-protein. The α subunit has the tertiary structure of
the proteins in an even larger family that bind guanosine
nucleotides and control various cellular functions. The
β subunit is a β propeller of seven blades (Figure 6-13).

The G-proteins involved in coupling receptors such
as β-adrenergic receptor to their ultimate biological
response are bound to membranes because they are
posttranslationally modified with lipid. The α subunits
are palmitoylated at a cysteine near their amino
terminus (Cysteine 3 in the G-protein responsible for
coupling β-adrenergic receptor to adenylate cyclase),
and some are also myristoylated at their amino termini.
The γ subunits are S-geranylgeranylated (Figure 3-16) at
a cysteine four amino acids from their carboxy termini.
None of the subunits, however, has a membrane-spanning
segment.

Molecules of adenylate cyclase and molecules of
β-adrenergic receptor diffuse about independently over
the bilayer of phospholipids forming a membrane.
There is no evidence that they associate with each other
to form a specific complex. When increasing fractions of
the β-adrenergic receptors in a membrane are destroyed
by covalent modification, the final activities of adenylate
cyclase achieved following addition of an agonist remain
the same (Figure 14-29A). This observation is
inconsistent with a strong, specific association between
β-adrenergic receptors and adenylate cyclase because a
permanently inactivated β-adrenergic receptor does not
produce a permanently unresponsive adenylate cyclase.
Rather, it is the stimulatory G-protein that couples the
binding of agonist on the β-adrenergic receptor to the
activation of adenylate cyclase.

Figure 14-29: Rate of activation of adenylate cyclase as a function of the concentration of functional, occupied β-adrenergic receptor.
(A) Samples of plasma membranes from erythrocytes of Meleagris gallopavo were exposed to various concentrations (none, 17 µM,
43 µM, 100 µM, and 230 µM) of N-[2-hydroxy-3-(1-naphthyloxy)propyl]-N′-(bromoacetyl)ethylenediamine, a specific, covalent,
irreversible inhibitor of β-adrenergic receptor, to destroy the functional ability of various fractions of the receptor. Equivalent amounts of
each sample (1.4 mg mL⁻¹) were mixed at 25 °C with saturating concentrations of the agonist epinephrine and guanosine
β,γ-imidotriphosphate, an analogue of GTP that cannot be hydrolyzed by G-proteins. At various times, samples were removed, and the specific activity
(picomoles of cyclic-3′,5′-AMP minute⁻¹ milligram⁻¹) of adenylate cyclase was determined. As time progressed, the specific activity of the
enzyme in the membranes increased monotonically until a maximum activity, $E_{\mathrm{act,max}}$, was reached. The specific activity of adenylate
cyclase is plotted as a function of time (minutes). Inset: Data in panel A are replotted on a semilogarithmic field to show that the approach
to maximum activity is a first-order process. The amount of the competent adenylate cyclase that has not yet been activated at a given
time t, $E_{\mathrm{inact},t}$, is equal to the amount of enzymatic activity at full activation, $E_{\mathrm{act,max}}$, minus the adenylate cyclase activity observed at time t,
$E_{\mathrm{act},t}$. The amount of adenylate cyclase activity that has not yet been activated is normalized by dividing by $E_{\mathrm{act,max}}$. This normalized value is
plotted as a function of time (minutes). (B) Equivalent samples of plasma membranes from erythrocytes of M. gallopavo (1.4 mg mL⁻¹)
were mixed in several tubes with MgATP at 25 °C, and agonist-activated adenylate cyclase was initiated by adding guanosine
β,γ-imidotriphosphate and various concentrations of the agonist epinephrine (0.5, 1.0, 3.0, 6, and 15 µM). At the indicated
times, samples were removed from each tube and the total accumulations of cyclic-3′,5′-AMP were assessed. The total accumulation
of cyclic-3′,5′-AMP [picomoles of cyclic-3′,5′-AMP (milligram of protein)⁻¹] is plotted as a function of time (minutes). Panel B represents the
integrated accumulation of cyclic-3′,5′-AMP, and panel A shows the rate at which it is accumulating at any time. Reprinted with
permission from ref 728. Copyright 1978 American Chemical Society.

When the complete system for adenylate cyclase
is depleted of G-protein by affinity adsorption, no
coupling between binding of an agonist and adenylate
cyclase activity can occur until the G-protein is added back. The
membranes of cyc⁻ S49 lymphoma cells contain both a
β-adrenergic receptor and adenylate cyclase but lack a
stimulatory G-protein and cannot display
agonist-activated adenylate cyclase activity. When
homogeneous stimulatory G-protein is added to these
membranes, agonist-activated adenylate cyclase activity
appears. Therefore, the G-protein must perform the
coupling.

The complex between the β subunit and the γ subunit
of the stimulatory G-protein binds tightly and
specifically to β-adrenergic receptor whether or not an
agonist is bound. The α subunit of the stimulatory
G-protein also binds specifically and just as tightly to the
β-adrenergic receptor whether or not it has guanosine
5′-O-(3-thiotriphosphate) bound to it. It forms an even
tighter complex, however, with the β subunit and γ subunit
of the stimulatory G-protein when it is not liganded
with GTP or an analogue of GTP than when it is. A
complex between one β-adrenergic receptor and one
G-protein has been identified in solutions derived from
plasma membranes dissolved in nonionic detergent, but
only when they are pretreated with β-adrenergic
agonists. In addition, the incorporation of purified
homogeneous G-protein and purified homogeneous
β-adrenergic receptor together into phospholipid vesicles
causes the receptor to have a higher affinity for
agonist, by a factor of more than 100, than it does in the
absence of G-protein. This effect must result from the
formation of a complex between β-adrenergic receptor
and the respective G-protein in these membranes, but it
is eliminated by the addition of GTP, which binds to the
G-protein.

The α subunit dissociates completely from the
β subunit and γ subunit of stimulatory G-protein when
GTP or a GTP analogue that cannot be hydrolyzed is
bound to it. Consequently, the affinity of the
α subunit of the stimulatory G-protein for the complex
between β-adrenergic receptor and the β subunit and
γ subunit of the stimulatory G-protein probably
decreases after the β-adrenergic receptor has bound
agonist and promoted the binding of GTP to the α subunit of
the stimulatory G-protein.

From all of these observations, the sequence of
events at the β-adrenergic receptor is thought to be the
following. Before agonist binds there is a specific
complex among β-adrenergic receptor, the β subunit
and γ subunit of stimulatory G-protein, and the
α subunit of stimulatory G-protein to which GDP is
bound. Upon the binding of an agonist to β-adrenergic
receptor, the dissociation of the GDP is stimulated as
well as the association of GTP with the resulting empty
α subunit of the stimulatory G-protein. The binding
of GTP weakens the affinity of the α subunit of
stimulatory G-protein for the remainder of this complex.
This weakening increases the rate at which the α subunit
exchanges between the complex and the bilayer of
phospholipids in which it is in free solution. When it is
dissociated from the complex in free solution within the
bilayer of phospholipids, the complex between GTP and
the α subunit is able to collide with a molecule of
adenylate cyclase.

The α subunit of the stimulatory G-protein when it
has guanosine 5′-O-(3-thiotriphosphate) bound to it
forms a strong complex with the cytoplasmic domain of
adenylate cyclase, for which a crystallographic
molecular model is available. The formation of this complex
between the α subunit of the stimulatory G-protein and
adenylate cyclase stimulates the enzymatic activity of
the cytoplasmic domain of adenylate cyclase by a factor
of greater than 1000. A tight complex between one
intact molecule of adenylate cyclase and one molecule of
G-protein has also been identified in solutions derived
from plasma membranes dissolved with nonionic
detergents.

The system for adenylate cyclase in erythrocytes
from M. gallopavo that is stimulated by β-adrenergic
agonists takes several minutes to reach full enzymatic
activity after a β-adrenergic agonist is added to a
suspension of plasma membranes (Figure 14-29B). As the
fraction of the β-adrenergic receptors occupied by
agonist is increased, the duration of the lag preceding the
expression of full enzymatic activity decreases (Figure
14-29B). The relationship between the rate at which full
enzymatic activity is established, the final level of activity
of the enzyme, and the fraction of the β-adrenergic
receptors occupied by agonist, which was determined by
a direct measurement of bound agonist in separate
experiments, is consistent with the kinetic mechanism

$$\mathrm{A} + \mathrm{R} \;\overset{K_{\mathrm{d,A}}}{\rightleftharpoons}\; \mathrm{A{\cdot}R} \qquad (14\text{-}12)$$

$$\mathrm{A{\cdot}R} + E_{\mathrm{inact}} \;\xrightarrow{\;k_{\mathrm{p}}\;}\; \mathrm{A{\cdot}R} + E_{\mathrm{act}} \qquad (14\text{-}13)$$

where $K_{\mathrm{d,A}}$ is the dissociation constant for agonist,
R is β-adrenergic receptor, A is agonist, $E_{\mathrm{inact}}$ is
inactive adenylate cyclase, and $E_{\mathrm{act}}$ is active adenylate
cyclase.

The meaning of the second step in the mechanism
is that the rate at which adenylate cyclase is activated is
directly proportional to the concentration of liganded
β-adrenergic receptor, A·R, and the concentration of
inactive competent adenylate cyclase, $E_{\mathrm{inact}}$, and
consequently is first-order in each of these concentrations.
When β-adrenergic receptor is saturated with agonist,
all of the receptor is in the liganded form A·R, and the
reaction is governed solely by this second step
(Equation 14-13). When the concentration of the
complex between agonist and receptor at saturation, [A·R]sat,
is decreased systematically by decreasing the
concentration of the competent β-adrenergic receptor by a
specific covalent modification, the rate of production of
active adenylate cyclase, $E_{\mathrm{act}}$, containing the activated
active site, decreases in direct proportion to the
decrease in the concentration of occupied receptors at
saturation (Figure 14-29A). Consequently, the rate of
the reaction defined by the second step in the
mechanism is first-order in the concentration of occupied
β-adrenergic receptor. When β-adrenergic receptor is
saturated with agonist, the rate of formation of active
adenylate cyclase, $E_{\mathrm{act}}$, is first-order in the
concentration of the unactivated adenylate cyclase, $E_{\mathrm{inact}}$ (inset to
Figure 14-29A).
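
The consequences of the mechanism in Equations 14-12 and 14-13 can
be sketched numerically. The sketch assumes that agonist is in large
excess over receptor, that the binding of agonist equilibrates rapidly,
and that the concentration of A·R stays constant during the
activation, so that the second step is pseudo-first-order. The
function names and the numerical values used in the usage note are
hypothetical.

```python
import math

def occupied_receptor(total_receptor, agonist, k_d):
    """[A.R] at equilibrium (Equation 14-12), agonist in large excess."""
    return total_receptor * agonist / (k_d + agonist)

def active_cyclase(t, e_max, k_p, ar):
    """E_act at time t from Equation 14-13 with [A.R] held constant.

    The observed first-order rate constant is k_p * [A.R].
    """
    return e_max * (1.0 - math.exp(-k_p * ar * t))

def accumulated_camp(t, e_max, k_p, ar):
    """Integral of active_cyclase from 0 to t (requires k_p * ar > 0).

    This corresponds to the accumulation plotted in Figure 14-29B,
    while active_cyclase corresponds to the rate in Figure 14-29A.
    """
    k_obs = k_p * ar
    return e_max * (t - (1.0 - math.exp(-k_obs * t)) / k_obs)
```

With, for example, `e_max = 100`, `k_p = 1`, and `ar = 0.1`, a plot of
ln(1 - E_act/E_max) against time is linear with slope -0.1, and halving
`ar` halves that slope without changing the final activity. At long
times the accumulated cyclic AMP approaches a straight line displaced
along the time axis by a lag of 1/(k_p [A·R]), which shortens as more
receptor is occupied.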

The reaction governed by $k_{\mathrm{p}}$ (Equation 14-13)
results from the collision of two proteins while they
diffuse through the bilayer of phospholipids. When the
microviscosity of the membrane, as judged by the
residual polarization of a hydrophobic fluorescent molecule,
is decreased by adding cis-vaccenic acid, the rate of
production of active adenylate cyclase at saturating
concentrations of agonist increases monotonically. When the
viscosity of the plasma membrane is increased by
removing some of its bulk phospholipid, the ability of the
binding of agonist to β-adrenergic receptor to activate
adenylate cyclase is significantly inhibited. It has also
been observed that when the macroscopic viscosity of
the membranes in the disks of retinal rods is decreased
by halving the concentration of rhodopsin in them, the
rate of phototransduction is accelerated 1.7-fold.
Phototransduction results from the coupling of
rhodopsin to cyclic GMP phosphodiesterase in a manner
homologous to the coupling of β-adrenergic receptor to
adenylate cyclase.

If, as is generally assumed, the two proteins that are
colliding within the membrane to activate the enzymatic
activity are adenylate cyclase and the complex between
GTP and the α subunit of the stimulatory G-protein, it
necessarily follows that the concentration of the complex
between GTP and the α subunit of stimulatory G-protein
in the bilayer of phospholipids must at all times be
directly proportional to the concentration of liganded
β-adrenergic receptor. To be so, the complex between
GTP and the α subunit of stimulatory G-protein must be
in constant communication with β-adrenergic receptor.
Consequently, the α subunit and the complex between
β-adrenergic receptor and the β subunit and γ subunit of
stimulatory G-protein must be rapidly associating with
and dissociating from each other in an equilibrium that
maintains the required proportionality. The fact
that one complex between agonist and β-adrenergic
receptor is able to catalyze the exchange of GDP for GTP
on more than 10 α subunits of stimulatory G-protein in
the space of a few seconds suggests that the
equilibration between these two proteins is much more rapid
than the activation of adenylate cyclase (Figure 14-29).
The fact that the α subunit of stimulatory G-protein is
attached to the membrane only through its
posttranslational lipid allows it to diffuse across the membrane
more rapidly than if it were an integral membrane-bound
protein, and this property increases the rate of its
equilibration with β-adrenergic receptor.

A membrane within a living cell often differs in 
shape and extension from the same membrane purified 
from a homogenate of that cell. Small organelles such as 
mitochondria, chloroplasts, and lysosomes remain intact 
and are not visibly or functionally altered during gentle 
disruption of the cell and purification by centrifugation. 
The endoplasmic reticulum is constantly changing in its 
shape and contiguity even within the living cytoplasm, a 
property reflected in the fact that, upon homogenization, 
it readily disintegrates into small microsomes. The 
plasma membrane, however, in its natural state in an 
intact cell, is required to remain at all times a continuous 
enclosure surrounding the cytoplasm even while it must 
maintain a total surface area much greater than the 
membranes of any of the stable organelles it contains. 
When the cell is homogenized, the plasma membrane, as 
does the endoplasmic reticulum, also disintegrates into 
small microsomes, but much more reluctantly. 

This reluctance seems to result from the fact that 
plasma membranes are skins stretched and pinned upon 
a frame. In bacteria and fungi, the frames are the outer 
membranes and cell walls on the extracytoplasmic sur- 
face, for when these integuments are digested away, a 
fragile, naked spheroplast remains that is easily
disintegrated. In animal cells, however, there is often a frame on
the cytoplasmic side of the plasma membrane upon
which the membrane is stretched and to which it is 
pinned. A limited number of molecules of integral mem- 
brane-bound proteins and lipids function as the pins. 
These pins connect the continuous bilayer of the plasma 
membrane to the frame at random points scattered over 
its surface but do not noticeably affect the physical prop- 
erties of the bilayer of phospholipid. The stability pro- 
vided by any one of these supports allows a plasma 
membrane to remain unbroken over its entire surface 
area even though it is a thin, fragile fluid film that disin- 
tegrates when it is removed from the frame. 

The cytoskeleton is the frame upon which the 
plasma membrane of an animal cell is stretched and 
pinned. Although most of the proteins are free to diffuse, 
the proteins pinning the plasma membrane to the 
cytoskeleton do not. In the short run, the membrane is 
fixed at these points of attachment but fluid everywhere 
else. When the cytoskeleton collapses and, as a result, the 
pins cluster rather than remaining spread out, the 
unsupported plasma membrane in the abandoned 
regions slowly fragments into microsomes.

The cytoskeleton of an erythrocyte is the best char- 
acterized. The proteins constituting the cytoskeleton of 
an erythrocyte are, however, found in most of the cells 
from other tissues and are thought to perform the same 
role in these other types of cells. When erythrocytes are 
added to a solution of nonionic detergent, the plasma 
membrane dissolves and leaves behind its cytoskeleton 
that has the shape of an erythrocyte but is a basket rather 
than a bag. The cytoskeleton exposed by this treatment
contains mainly spectrin, actin, and protein 4.1.

Spectrin was originally identified by Rosenthal,
Kregenow, and Moses as a protein composing a fuzzy
network on the cytoplasmic side of the plasma membrane
of an erythrocyte. It was isolated by extraction of plasma
membranes from erythrocytes at low ionic strength in the
presence of a chelating agent for multivalent cations.
The protein from human erythrocytes is composed of an
α polypeptide (2418 aa) and a β polypeptide (2136 aa)
and is an αβ heterodimer or (αβ)₂ heterotetramer
depending upon the conditions. The purified
heterodimer has a high intrinsic viscosity ([η] = 140 cm³
g⁻¹), indicating that it is elongated. The two
polypeptides of the αβ heterodimer are homologous in sequence,
and each contains internally repeating domains (Figure
7-16). In electron micrographs, the αβ heterodimer
appears as a flexible two-stranded segment of rope about
100 nm long (Figure 14-30B). The two strands are not
held together along their entire length and tend to splay.
An (αβ)₂ heterotetramer is formed from two dimers
associating end to end (Figure 14-30A).

The actin in the cytoskeleton is in the form of short
thin filaments (Figure 9-1B) with a uniform length of
12-14 monomers. The protein dematin is associated
with these actin filaments and controls the
associations between them.




Figure 14-30: Electron micrographs of individual molecules of human spectrin as the (αβ)₂ heterotetramer (A) or the αβ heterodimer (B).
Human erythrocytes were washed and lysed, and the resulting plasma membranes were washed to remove the hemoglobin. The purified
plasma membranes were extracted with 0.1 mM EDTA, pH 8, at 0 °C for 40 h, and the membranes were then removed by centrifugation. The
released spectrin was purified from the extract by molecular exclusion chromatography. Solutions containing the spectrin were brought to
70% glycerol and sprayed onto freshly cleaved mica. The surface of the mica was then sprayed at an angle of 9° to the surface with a mixture
of platinum and carbon vaporized by electric discharge. The spray was applied while the sample was rapidly rotating about an axis normal
to the surface. This causes the molecules to be surrounded by drifts of platinum, which is electron-dense. These drifts produce an outline of
the molecules of protein. The film of platinum and carbon was then transferred from the mica to a grid for electron microscopy. The
molecules are represented by the tortuous, elongated outlines. Magnification 170 000×. Reprinted with permission from ref 756. Copyright 1979
Academic Press.


Human protein 4.1 is a monomer with a polypeptide
864 aa in length. It is a globular protein that can bind
to an αβ heterodimer of spectrin near the other end of
the rope from that which combines with another
αβ heterodimer to form the (αβ)₂ heterotetramer.
When purified actin, protein 4.1, and spectrin are mixed
together with ATP, they spontaneously form a
macroscopic gel made up of heterotetramers of spectrin
cross-linked by the short filaments of actin. This gel
presumably is analogous to the meshwork seen in the
cytoskeleton. Protein 4.1 promotes this association of
spectrin and actin by binding simultaneously to a
molecule of spectrin and a short filament of actin and
linking them together.

One set of the pins attaching the plasma membrane
of an erythrocyte to the cytoskeleton is formed from two
proteins, ankyrin and band 3 anion transport protein.
Human erythrocytic ankyrin is a monomer constructed
from one single polypeptide about 1880 aa in
length. Ankyrin has a fairly high frictional ratio ($f/f_0$
= 1.46), and in electron micrographs it appears as a
cluster of three to five globular domains. In keeping
with this structure, it has a detachable domain ($n_{\mathrm{aa}}$ = 650)
that contains the binding site specific for spectrin.
Ankyrin binds tightly to spectrin near the end of the rope
that associates to form the (αβ)₂ heterotetramer, the
opposite end from that to which protein 4.1 binds.
Intact ankyrin can also bind to the amino-terminal,
detachable domain of band 3 anion transport protein in
a simple bimolecular reaction ($K_{\mathrm{d}}$ = 10° M) as well as to
intact band 3 anion transport protein in solutions of
nonionic detergents. A freely soluble heterodimer
containing one polypeptide of ankyrin and one polypeptide
from band 3 anion transport protein can be purified in
solutions of nonionic detergent.

Because intact ankyrin binds tightly to both band 3
anion transport protein, which is an integral
membrane-bound protein, and spectrin, which is
incorporated into the cytoskeleton, it can link the cytoskeleton to
the plasma membrane. As there are fewer molecules of
ankyrin in an erythrocyte than molecules of band 3 anion
transport protein, only a minority of the molecules of
band 3 anion transport protein, presumably chosen at
random, are linked to the cytoskeleton. Unlike the
unattached molecules of band 3 anion transport protein,
those that are attached to ankyrin and pin the
cytoskeleton to the membrane are unable to diffuse
translationally or rotationally because they are attached rigidly
to the cytoskeleton. Membrane protein band 4.2
stabilizes this interaction between ankyrin and band 3 anion
transport protein.

At the other end of an αβ heterodimer of spectrin,
the protein 4.1 that is creating the interaction of spectrin
and actin is also linking the cytoskeleton to the
membrane. Protein 4.1 binds strongly to phosphatidylserine
on the inner surface of the bilayer of phospholipids
and also to the cytoplasmic portion of glycophorin C
and glycophorin A.

In the short run, the molecules of band 3 anion
transport protein, phosphatidylserine, glycophorin A, 
and glycophorin C are the stationary points around 
which flow the traffic of the proteins and lipids of the 
plasma membrane. In the long run, as the cell changes 
shape and size, these points of attachment also rearrange 
fluidly to accommodate the changes. 

The pinning of the plasma membrane onto the 
cytoskeleton and the sculpting of the various mem- 
branes into the cellular organelles, like the assembly of 
microtubules, filaments of actin, and thick filaments or 
the mixing of a vast array of soluble proteins at high
concentration to produce cytoplasm, seamlessly transform
protein chemistry into cell biology.

Problem 14-7: Cytochrome b₅ is a protein that is firmly
attached to the membranes of the endoplasmic
reticulum. The amino acid sequence of the protein from Rattus
norvegicus is


AEQSDKDVKYYTLEEIQKHKDSKSTWVILHHKVYDLTKFL
EEHPGGEEVLREQAGGDATENFEDVGHSTDARELSKTYII 
GELHPDDRSKIAKPSETLITTVESNSSWWTNWVIPAISAL 
VVALMYRLYMAED 


When endoplasmic reticulum is treated with trypsin or
pancreatic triacylglycerol lipase contaminated with
trypsin, only the peptide bond following Lysine 90 is
cleaved, and a protein containing the first 90 amino acids
of cytochrome b₅ falls off the membrane. This soluble
protein can be purified and crystallized, and its structure
has been determined by X-ray crystallography. This
protein will be referred to as cytochrome b₅(trypsin) or
cytochrome b₅(lipase), respectively.


(A) Explain these observations in terms of the distri- 
bution of specific amino acids in the sequence of 
the protein. 


(B) Draw a representation of the complete
cytochrome b₅ molecule attached to the
membrane.


It is possible to release, by the use of a detergent, intact
cytochrome b₅ from microsomes of endoplasmic
reticulum and to purify this protein. This protein will be
referred to as cytochrome b₅(detergent).


(C) When various amounts of cytochrome b₅(detergent)
or cytochrome b₅(lipase) are mixed with
endoplasmic reticulum membranes and
incubated for 18 h at 2 °C, and the membranes are
then washed extensively, the detergent form
attaches to the membranes while the lipase form
does not (see the following figure). Explain this
observation.


Binding of cytochrome b₅(detergent) or cytochrome b₅(lipase)
to microsomes of endoplasmic reticulum from liver of
O. cuniculus. The results are expressed as the moles of exogenous
cytochrome b₅ bound to the microsomes for every mole of
endogenous cytochrome b₅ in the microsomes as a function
of the moles of exogenous cytochrome b₅ added for every mole of
endogenous cytochrome b₅ present. The points marked with ×
were samples that were washed further in 0.5 M NaCl, pH 8.0, to
remove any loosely bound cytochrome b₅. Reprinted with
permission from ref 275. Copyright 1972 Journal of Biological
Chemistry.


Cytochrome-b₅ reductase is also attached to the
endoplasmic reticulum. It catalyzes the following reaction:


$$\mathrm{NADH} + 2\ \text{ferricytochrome } b_5 \;\rightleftharpoons\; \mathrm{NAD^+} + \mathrm{H^+} + 2\ \text{ferrocytochrome } b_5$$


The native endoplasmic reticulum membrane contains
one molecule of cytochrome-b₅ reductase and 10
molecules of cytochrome b₅ for every 2.5 × 10³ nm² (see the
following figure).


Schematic representation of the spatial relationships between 
phospholipid, cytochrome b;, and cytochrome-b, reductase on 
the surface of a microsomal vesicle. The surface areas used for 
these components were phospholipid, 0.63 nm’; cytochrome bs, 
Anm’; and cytochrome-b, reductase, 10 nm’. The entire area 
(diameter = 56 nm) would include 4000 molecules of phospho- 
lipid. The endogenous concentrations of the two proteins would 
place approximately 10 molecules of cytochrome b5 (○) and
approximately 1 molecule of reductase (●) in this area, assuming
a random distribution on the outer surface of the membrane. The
left sector indicates the molecular density with only endogenous
cytochrome b5, and the right sector, the density with a 10-fold
molar excess of cytochrome b5(detergent) (©). Reprinted with
permission from ref 275. Copyright 1972 Journal of Biological 
Chemistry. 
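
The numbers quoted in this legend can be checked against one another with a few lines of arithmetic. The sketch below uses only the stated diameter (56 nm), the stated number of phospholipids (4000), and the stated area for each phospholipid (0.63 nm²); the variable names are illustrative, and the patch of membrane is assumed to be a flat circular disk.

```python
import math

# Values taken from the figure legend.
diameter_nm = 56.0          # diameter of the circular area
n_phospholipid = 4000       # molecules of phospholipid in that area
area_per_lipid_nm2 = 0.63   # surface area of one phospholipid

# Assumption: the area is treated as a flat circular disk.
disk_area_nm2 = math.pi * (diameter_nm / 2.0) ** 2

# Area covered if the phospholipids alone tiled the surface.
lipid_area_nm2 = n_phospholipid * area_per_lipid_nm2

print(f"area of the disk:           {disk_area_nm2:.0f} nm^2")
print(f"area of 4000 phospholipids: {lipid_area_nm2:.0f} nm^2")
```

Both estimates come to roughly 2.5 × 10³ nm², so the diameter and the count of phospholipids in the legend are mutually consistent.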


When all of these cytochrome b5 molecules are in the oxidized
form and the reaction is initiated by addition of
reducing equivalents, the time required for the reduction
of half of all the cytochrome b5 molecules in the two
types of membranes by the reductase is 0.47 s for the
unenriched membranes and 0.13 s for the enriched
membranes.


(D) What ability must cytochrome b5 possess in order
to participate as a reactant in this reaction?


(E) Why is the t1/2 shorter in the case of the enriched
membranes?


(F) The concentration of cytochrome b5 in the solution
during the reduction experiments with the
enriched membranes just described was about 2 x 
10° M, and the rate of the reaction catalyzed by 
the reductase was independent of the concentra- 
tion of membranes suspended in the solution. In 
order to observe the same turnover rate, however, 
when cytochrome b5(trypsin) was the substrate,
its concentration had to be 5 x 10° M and the rate 
of the reaction catalyzed by the enzyme depended
on the concentration of cytochrome b5(trypsin) in
the solution. Explain these observations. 


Problem 14-8: Calculate the translational diffusion coef- 
ficients at 20 °C for integral membrane-bound proteins 
the apparent radii, o, of whose bundles of α helices are
1.0, 2.0, 4.0, and 5.0 nm. Assume that the viscosity of the 
bilayer of phospholipid is 100 mPa s and the width of the 
bilayer of phospholipid is 5 nm. 
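
One way to set up this calculation is with the Saffman-Delbrück relation for a cylinder of radius a spanning a thin viscous sheet, D = (kT / 4πμh)[ln(μh / μ′a) − γ], where μ is the viscosity of the bilayer, h is its width, μ′ is the viscosity of the surrounding aqueous phase, and γ is Euler's constant. The sketch below assumes that this is the relation intended. It also assumes μ′ = 1.0 mPa s for water at 20 °C, a value the problem does not state. The function and variable names are illustrative.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J K^-1
EULER_GAMMA = 0.5772   # Euler's constant


def saffman_delbruck(radius_m, temp_k=293.15, mu_bilayer=0.100,
                     width_m=5.0e-9, mu_water=1.0e-3):
    """Translational diffusion coefficient (m^2 s^-1) of a cylinder
    of the given radius in a bilayer, from the Saffman-Delbruck relation.

    mu_bilayer and width_m are the values given in the problem;
    mu_water (Pa s) is an assumed value for the aqueous phase.
    """
    prefactor = K_B * temp_k / (4.0 * math.pi * mu_bilayer * width_m)
    log_term = math.log(mu_bilayer * width_m / (mu_water * radius_m))
    return prefactor * (log_term - EULER_GAMMA)


radii_nm = [1.0, 2.0, 4.0, 5.0]
coefficients = [saffman_delbruck(r * 1e-9) for r in radii_nm]

for r, d in zip(radii_nm, coefficients):
    # 1 m^2 = 1e4 cm^2
    print(f"radius {r:.1f} nm: D = {d * 1e4:.2e} cm^2 s^-1")
```

Because the radius enters only through the logarithm, a fivefold increase in radius lowers D by less than a factor of 2 under these assumptions.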


Problem 14-9: The vertebrate rod is a cell in the retina 
responsible for registering light rays in the visual process. 
The end of the cell that performs this task is called the 
outer segment. It is a cylinder filled with disks, which are 
flattened circular, closed sacs pinched off from the 
plasma membrane. 


[Figure labels: disks floating freely; pinching off; infolding of outer cell membrane.]


Diagram showing frog rod outer segment. The lamellar 
membranous structure of the rod outer segments consists of a 
stack of sacs, except near the base, where it consists of infoldings of 
the cell plasma membrane that are being pinched off. Reprinted 
with permission from ref 777. Copyright 1970 Scientific American 
Inc. 


The disks are stacked in the rod as poker chips are 
stacked in a rack. In this way, greater than 90% of the 
membrane from which the disk is made lies normal to 
the cylindrical axis of the rod outer segment. 

Rhodopsin (Table 14-6) is the only protein dis- 
solved in the disk membrane. 
(A) In an outline of the cross section of a disk draw a
schematic diagram of the molecular structure of 
the disk membrane. Include rhodopsin mole- 
cules, labeled with an arrow indicating direction 
of insertion, and phospholipids. 


Rhodopsin is a protein to which is attached, by an 
imine linkage, one molecule of 11-cis-retinal. When 
11-cis-retinal is exposed to intense light it isomerizes to 
all-trans-retinal. This process is known as bleaching and 
results in a color change from orange to clear. 


(B) What does the following experiment demonstrate 
about the properties of a protein such as 
rhodopsin in a biological membrane? Reprinted 
with permission from Nature, ref 675. Copyright 
1974 Macmillan Magazines Limited. 


Rod outer segments were obtained by 
gently shaking retinas dissected under dim red 
light from the eyes of frog (Rana catesbeiana) 
and mudpuppy (Necturus maculosus) which 
had been dark-adapted for more than 10 h. The 
rods were shaken into a microchamber 
containing a standard Ringer solution and 
examined in a Shimadzu 50L microspectropho- 
tometer (MSP) fitted with a high quantum effi- 
ciency photomultiplier (Hamamatsu type 
R375). Single rods which appeared intact and 
which lay flat on the bottom of the chamber 
were selected for observation, and all observa- 
tions were completed within 30 min after the 
rods were isolated. The rhodopsin in isolated 
rods, once bleached, does not regenerate, 
hence dim red light was used for selection, 
focusing, and alignment. 

The measuring beam of the MSP was lim- 
ited by an aperture and a condensing lens to 
form a rectangle about 2 × 20 μm in cross section.
The long axis of the rectangle was aligned
with the long axis of the rod and a simple 
motor-driven “alternator” optically shifted the 
rectangular measuring beam back and forth 
between the two sides of the rod. Thus the 
absorbance of the rhodopsin on each side of the 
rod could be compared directly. The wave- 
length of the measuring beam was set at the 
absorption peak of the visual pigment: 500 nm 
for frog, 530 nm for mudpuppy. 

With suitable alignment and focusing the 
absorbance was essentially equal on both sides 
of the unbleached rods, as shown by the first 
pair of measurements on an unbleached rod at 
the beginning of each of the two recordings in 
the figure below. The alternator was then 
stopped momentarily and the intensity of the 
measuring beam was increased about 1000-fold 
to bleach some pigment on one side of the rod. 


The exponential decrease in absorbance during 
the bleach was recorded, and then the intensity 
was dropped to the original level and the alter- 
nator turned on again. The figure below shows 
that immediately after the bleach the 
absorbance on the unbleached side was little 
changed, but there was a marked drop in 
absorbance on the bleached side. Within the 
next few seconds, however, the absorbance of 
the unbleached side decreased while that on 
the bleached side increased, and within less 
than 1 min the absorbance of the two sides 
became equal, reaching a final level midway 
between that of each side immediately after the 
bleach. 


[Figure: diagrams of a rod above absorbance recordings plotted against time (s); annotation: partially bleach left side.]


The diagrams of a rod depict the pigment dis- 
tribution corresponding in time with the 
absorbance measurements shown below. The 
arrows indicate the location on the rod at which 
each absorbance measurement was made. 
Recordings made from two different rods are 
shown to give an indication of the repeatability 
of the measurements. In each experiment the 
chart recorder was run continuously, as shown 
by the time base. The alternator also ran con- 
tinuously except during the bleach. The records 
thus consist of a repeated pattern in which 
absorbance measurements were made first on 
the left side, then the right side of the rod. 
Between each pair of measurements baseline 
measurements were also made to ensure that 
no drifts occurred (for clarity, these were omit- 
ted from the figure). The spikes on the traces 
were caused by switching transients in the 
alternator. The diameter of a disk is essentially 
equal to the width of the rod, and the width is 
measured in the MSP after completing each 
experiment. 
metalloproteins, 326, 332
molar mass, 418, 419 
quaternary structure, 451 
sedimentation velocity, 576 
X-ray scattering, 584 
aspartate kinase 
domains, 378 
aspartate kinase I-homoserine 
dehydrogenase I 
domains, 381 
kinetics of folding, 709 
molar mass, 418, 420 
sieving, 427 
aspartate transaminase 
assembly of oligomers, 712 
domains, 389 
molecular taxonomy, 396 
aspartate-semialdehyde 
dehydrogenase 
aligning crystallographic molecular 
models, 366 
molecular taxonomy, 396 
aspartate-tRNA ligase 
molecular taxonomy, 393 
aspartic acid 
electronic structure, 79 
water in crystallographic molecular 
models, 296 
aspartyl endopeptidase 
crystallography, 182 
domains, 384 
aspartyl imide, 115 
assay of proteins, 13-20 
accuracy, 19 
5-aminopentanamidase, 20 
biological, 19 
chromatographic separation, 14 
cis-aconitase, 19 
coenzymes in, 13 
coenzyme A, 18 
colorimetric, 18 
continuous, 15 
coupled, 16 
cyclosporin synthase, 14 
fumarate hydratase, 13, 19 
galactonate dehydratase, 19 
geranyltranstransferase, 14 
glutamine-pyruvate transaminase, 
19 
glyceraldehyde-3-phosphate 
dehydrogenase 
(phosphorylating), 17 
Hurler corrective factor, 19 
2-hydroxy-6-ketonona-2,4-diene- 
1,9-dioic acid 5,6-hydrolase, 18 
3-hydroxyacyl-CoA 
dehydrogenase, 17 
hydroxymethylglutaryl-CoA lyase,
17 
2-hydroxyphytanoyl-CoA lyase, 13 
imidazoleglycerol-phosphate 
dehydratase, 17 
interference, 16 
maturation-promoting factor, 19 
medium-chain acyl-CoA 
dehydrogenase, 16 
membrane-bound proteins, 771 
methylamine-glutamate 
N-methyltransferase, 14 
2-methyleneglutarate mutase, 16 
monophenol monooxygenase, 18 
myosin subfragment 1, 17 
(S)-pantolactone dehydrogenase, 
18 
pH, 13 
phosphofructokinase, 19 
phosphomevalonate kinase, 17 
protocatechuate 3,4-dioxygenase, 
16 
pyruvate carboxylase, 17 
radioactive reactant, 14 
receptors, 15 
reconstitution, 773 
ribose-phosphate 
diphosphokinase, 18 
selectivity, 19 
selenocysteine lyase, 19 
sensitivity, 19 
succinyldiaminopimelate 
transaminase, 20 
triacylglycerol lipase, 16 
tryptophan-tRNA ligase, 14 
assembly 
definition, 659 
assembly map 
ribosome, 716 
assembly of actin 
F-actin capping protein, 730 
fragmin, 730 
gelsolin, 730 
hydrolysis of ATP, 729 
nebulin, 730 
nucleation, 730 
thin filament, 729 
villin, 730 
vinculin, 730 
assembly of fibrin 
cross-linking, 721 
fibrinogen, 717-20 
irreversible, 721 
kinetics, 720 
protein-glutamine 
γ-glutamyltransferase, 721
protofibril, 720 


assembly of helical polymers, 717-33 
assembly of microtubules 
apparent critical concentration, 
727 
bulk concentration, 724 
catastrophic depolymerization, 728 
centrosome, 723 
colchicine, 729 
critical concentration, 724-25 
elongation, 723 
hydrolysis of GTP, 727 
kinetics, 724-26 
microtubule-associated proteins, 
730 
minus end, 725 
nucleation, 723 
number concentration, 724 
plus end, 725 
protofilament, 721 
seeds, 723 
steady state, 727 
tubulin, 722 
assembly of oligomers, 710-17 
alkanal monooxygenase (FMN- 
linked), 714 
arc repressor, 712 
aspartate carbamoyltransferase, 
713-14 
aspartate transaminase, 712 
cross-linking for assay, 445 
dihydrolipoyl dehydrogenase, 715 
dihydrolipoyllysine-residue 
acetyltransferase, 715 
dimer, 712 
enzymatic activity, 713 
fructose-bisphosphate aldolase, 713 
fumarate hydratase, 713 
heterooligomers, 713-17 
HIV-1 retropepsin, 713 
inorganic diphosphatase, 710 
kinetics of, 711 
loosely folded monomer, 713 
malate dehydrogenase, 712 
molten globule, 712 
phosphoglycerate mutase, 710-11, 
713 
porphobilinogen synthase, 713 
pyruvate dehydrogenase (acetyl- 
transferring), 715 
pyruvate dehydrogenase complex, 
715 
quantitative cross-linking, 710-11, 
713 
ribosome, 715-17 
steric crowding, 715 
steroid Δ-isomerase, 712
tetramer, 710-11 


trimer, 713 
tryptophan synthase, 714 
assignments 
nuclear magnetic resonance, 621-28 
association constant 
intramolecular, 222 
association equilibrium constant 
hydrogen bond, 210 
asymmetry of phospholipids, 808-10 
phospholipid-translocating 
ATPase, 809 
phospholipid scramblase, 809 
diphosphatidylglycerol, 810 
phosphatidylcholine, 810 
phosphatidylethanolamine, 810 
phosphatidylglycerol, 810 
phosphatidylinositol, 810 
phosphatidylserine, 810 
ATP diphosphatase 
purification, 773 
ATP-dependent DNA helicase Rep 
fluorescence resonance energy 
transfer, 610 
axes 
crystallography, 151 
axes of symmetry, 451-56 
actin, 452-53 
fold of the symmetry, 452 
helical polymer, 452 
hexokinase, 454 
malate dehydrogenase, 452-53 
protocatechuate 3,4-dioxygenase, 
454 
rotational, 452 
rotational axis of pseudosymmetry, 
452 
screw, 452 
symmetry operations, 451 
axial ratio 
collagen, 574-75 
ellipsoid of revolution, 574 
hydrodynamic particle, 574-75 
1-azidopyrene 
hydrophobic reagent for covalent 
modification, 797 
aziridine 
reagent for covalent modification, 
537 
azurin 
aligning amino acid sequences, 360 
metalloproteins, 331 


B 
β structure
nuclear magnetic resonance, 631
bacteriophage
cloning of DNA, 99


T4 bacteriophage 
helical surface lattice, 500 
bacteriorhodopsin 
amplitude of electron diffraction, 
792 
β barrel, 782
bound phospholipid, 784 
covalent modification from within 
the bilayer, 797 
crystallization, 772, 775 
crystallographic molecular model, 
777 
electron diffraction, 791 
electron spin resonance, 794 
mean molar mass of an amino 
acid, 418 
membrane-spanning helices, 772 
phase of Fourier transform, 792 
rotational axes of symmetry, 787 
rotational diffusion coefficient, 812 
translational diffusion coefficient, 
813 
bad solvent 
definition, 659 
BamHI site-specific 
deoxyribonuclease 
nucleic acid, association of 
proteins with, 321 
band 3 anion transport protein 
cytoskeleton, 821 
domains, 377 
mean molar mass of an amino 
acid, 418 
membrane-bound proteins, 767 
topography of membrane- 
spanning proteins, 800-2, 806 
vectorial insertion, 767 
bare zone 
thick filament of myosin, 731-32 
barstar 
kinetics of folding, 690, 695 
base 
definition, 62 
base stacking 
nucleic acid structure, 323 
basic fibroblast growth factor 
aligning crystallographic molecular 
models, 366 
basic trypsin inhibitor 
proton exchange, 642-44 
β barrel
β structure, 260, 263
α-hemolysin, 776
indole-3-glycerol-phosphate
synthase, 262
integral membrane-bound
proteins, 776
membrane-spanning, 776
packing of β structure, 285
porin OmpF, 782
red fluorescent protein, 287
retinol-binding protein, 287
β bulge
β structure, 260-62
β-elimination
sequencing oligosaccharides,
133
Bence-Jones protein 
hydrogen bonds in 
crystallographic molecular 
models, 308
water in crystallographic molecular
models, 294
benzoate 4-monooxygenase 
domains, 382 
benzoylformate decarboxylase 
molecular taxonomy, 396 
polyproline helix, 259 
benzyl bromide 
reagent for covalent modification, 
536 
β helix
β structure, 261
2,3,4,5-tetrahydropyridine-2,6- 
dicarboxylate N-succinyl- 
transferase, 263 
bicelles 
membrane-bound proteins, 773 
bicontinuous cubic phase 
crystallization of integral 
membrane-bound proteins, 
775 
bilayer of lipids, 745-63 
cholesterol, 745, 759-60 
cross-sectional area, 760 
cross-sectional area of cholesterol, 
759 
distinct phases, 760 
distribution of electron density, 
760-61 
mole fraction of cholesterol, 759 
phospholipids, 745 
width, 759 
bilayers of phospholipid 
alkane proximal to the glyceryl 
groups, 758 
conformation at the glyceryl 
backbone, 752 
coordinated tilting of the 
hydrocarbon, 758 
core of the hydrocarbon, 757 
cross-sectional area, 754
crystallographic molecular models,
751-53
deuterium nuclear magnetic
resonance, 757 
electron density, 750 
electron spin resonance, 755-56 
electrostatic repulsion, 754 
heat of fusion, 755 
hydration, 751 
immiscible lipids, 755 
molecular motion, 756 
neutron diffraction, 750, 754 
neutron scattering density, 755 
nitroxyl fatty acid, 755-56 
order parameter, 756-57 
sliding fluctuations, 752 
solid and liquid, 754 
stereochemical paradox, 757-59 
steric effects of the hydration, 
758 
steric repulsion, 751 
surface potential, 754 
width, 750 
X-ray diffraction, 750-51 
binding assays, 15 
biological assays, 19 
biological membranes 
diffraction of X-radiation, 808 
diffusion in, 811-13 
phase transitions, 808 
rafts in, 811 
spin-labeled probes, 808 
two-dimensional solution, 811 
biotin carboxylase 
domains, 388 
molecular taxonomy, 397 
biotin-dependent carboxylase 
domains, 382 
N,N-bis(2-hydroxyethyl)-2-aminoethanesulfonic acid
buffer, 68
N,N-bis(2-hydroxyethyl)glycine
buffer, 68
1,4-bis(2-sulfoethyl)piperazine
buffer, 68 
bisimidates, 441 
bithorax complex 
amino acid sequence, 106 
BLAST, 354 
blunt ends 
sequencing of DNA, 96 
β-N-acetylglucosamidase
sequencing oligosaccharides, 134 
bond angles 
hydrogen bond, 206 
bond length 
hydrogen bond, 205 
bonding 
molecular orbital, 57 
bonding electrons
electronic structure, 56 
bonding molecular orbital 
electronic structure, 56 
bound ions 
molecular charge, 33 
boundary layer of phospholipid 
acetylcholine receptor, 786 
Ca²⁺-transporting ATPase, 785
cholesterol, 786 
cytochrome-c oxidase, 786, 805 
deuterium nuclear magnetic 
resonance spectroscopy, 
786 
electron spin resonance, 785 
integral membrane-bound 
proteins, 784-86 
nitroxylphosphatidylcholine, 785 
rhodopsin, 786 
bovine pancreatic trypsin inhibitor 
folding, 685 
bovine serum albumin 
osmotic pressure, 419 
β-pleated sheet
secondary structure, 261-62
β propeller
β structure, 260, 264
methanol dehydrogenase, 264 
Bragg spacing 
crystallography, 158 
branching 
oligosaccharides of glycoproteins, 
128 
brominating agents 
reagents for covalent modification, 
539 
bromoperoxidase 
tetrahedral symmetry, 487 
bromopyruvate 
reagent for covalent modification, 
549 
β sheets
packing of β structure, 285
β structure, 285-87
alcohol dehydrogenase, 261
β barrel, 260, 263
β bulge, 260-62
β helix, 261
β-pleated sheet, 262
β propeller, 260, 264
carbonate dehydratase, 261 
chymotrypsin, 261 
circular dichroism, 598 
concanavalin A, 261 
crystallography, 165-67 
fatty-acid-binding protein, 260 
gap, 260
intramolecular hydrogen bonds, 228
left-handed twist, 260
micrococcal nuclease, 261 
secondary structure, 260-61 
btk kinase 
domains, 386 
β turns
circular dichroism, 599
crambin, 265
crystallography, 165-67
definition, 261
intramolecular hydrogen bonds, 227
secondary structure, 261-64
type I, 263, 265 
type II, 263 
types of, 262 
buffers, 66 
N,N-bis(2-hydroxyethyl)-2-aminoethanesulfonic acid, 68
N,N-bis(2-hydroxyethyl)glycine, 68
1,4-bis(2-sulfoethyl)piperazine, 68
N-[2-hydroxy-1,1-bis(hydroxymethyl)ethyl]glycine, 68
1-(2-hydroxyethyl)-4-(3-sulfopropyl)piperazine, 68
N-(2-sulfoethyl)cyclohexylamine, 
68 
N-(2-sulfoethyl)morpholine, 68 
N-(3-sulfopropyl)-2-amino-1,3- 
dihydroxy-2-hydroxymethyl- 
propane, 68 
N-(3-sulfopropyl)morpholine, 68 
bulk concentration 
assembly of microtubules, 724 
of polymer, 724 
bulk relative permittivity, 201 
κ-bungarotoxin
interfaces, 480 
point group, 466-67 
buoyant densities 
cell fractionation, 743 
buoyant force 
sedimentation velocity, 576 
buoyant mass 
sedimentation velocity, 576 
buried hydrogen bonds 
hydrogen bonds in 
crystallographic molecular 
models, 306 
proton exchange, 641 
buried ion pair 
ionic interactions in 
crystallographic molecular 
models, 303 
buried side chain, 273 
B value
crystallography, 175 


C 
detergent, 770 
Cy 9E¢ 
detergent, 770 
Gab: 
detergent, 770 
C12E6 
detergent, 770 
CE 
detergent, 770 
Cy4Eg 
detergent, 770 
ebe 
detergent, 770 
C2 domain, 386 
CA protein 
helical surface lattice, 500 
CAD multienzyme complex 
domains, 379, 390 
E-cadherin 
heterologous associations, 516 
calcium 
metalloproteins, 328-29 
calculated amplitudes 
crystallography, 172 
calculated phases 
crystallography, 173 
caldesmon 
frictional ratio, 577 
calmodulin 
evolution of proteins, 351 
fluorescence resonance energy 
transfer, 608 
nuclear magnetic resonance, 624 
proton exchange, 645 
calorimeter 
thermodynamics of folding, 671 
camel 
immunoglobulin G, 559 
carbamoyl-phosphate synthase 
domains, 383 
carbamoyl-phosphate synthase 
(ammonia) 
domains, 379 
carbamoyl-phosphate synthase 
(glutamine hydrolysing) 
domains, 379 
carbenes 
insertion into nucleophiles, 543 
intramolecular rearrangements, 
543 
reagents for covalent modification, 
543 
carbodiimides 
reagents for covalent modification, 
539-41 


carbonate dehydratase 
aligning amino acid sequences, 
360 
β structure, 261
covalent modification, 547 
dodecyl sulfate gel electrophoresis, 
422 
electrophoresis, 40 
folding, 670 
glycosylphosphatidylinositol- 
linked proteins, 765 
molecular taxonomy, 393, 395 
molten globule, 683 
nuclear magnetic resonance, 635 
carbon-oxygen double bond 
second hydrogen bond, 256 
carbonyl oxygen 
electronic structure, 60 
carboxy terminus, 74 
immunostaining, 566 
posttranslational modification, 117 
carboxylesterase ESTA 
interfaces, 478-80 
carboxylic acid 
hydropathy, 242 
carboxymethyl cellulose 
chromatography, 9 
5-carboxymethyl- 
2-hydroxymuconate Δ-isomerase
molecular rotational axes of 
pseudosymmetry, 477 
carboxymethylenebutenolidase 
molecular taxonomy, 396 
carboxypeptidase 
molecular taxonomy, 395 
carboxypeptidase A 
sequencing of polypeptides, 91 
packing of α helices, 282
packing of side chains, 279 
carboxypeptidase B 
sequencing of polypeptides, 91 
carboxypeptidase C 
coiled coil of α helices, 283
folding, 679 
stereochemistry of side chains, 271 
carboxypeptidase D 
molecular taxonomy, 396 
carboxypeptidase E 
anchored membrane-bound 
proteins, 764 
carrier frequency 
nuclear magnetic resonance, 615 
cartoon 
crystallographic molecular model,
167 
cassettes 
site-directed mutation, 111 


catabolite gene activator protein 
association of proteins with 
nucleic acid, 320 
catalase 
diffusion coefficient, 578 
dodecyl sulfate gel electrophoresis, 
422 
domains, 388 
electron nuclear double resonance, 
650 
frictional coefficient, 578 
frictional ratio, 578 
interfaces, 481 
molar mass, 418 
sedimentation coefficient, 578 
sieving, 424 
X-ray scattering, 583 
cathepsin D 
purification, 29 
cathepsin K 
molecular taxonomy, 393 
caveolin 
in rafts, 811 
CD40 tumor necrosis factor receptor 
heterologous associations, 514 
CDP-6-deoxy-L-threo-d-glycero-4- 
hexulose-3-dehydrase 
electron paramagnetic resonance, 
646 
cell fractionation, 743 
buoyant densities, 743 
chloroplasts, 744 
differential centrifugation, 743 
electron microscopy, 744 
for purification of membrane- 
bound protein, 768 
free-flow electrophoresis, 744 
Golgi membranes, 744 
isopycnic centrifugation, 744 
lysosomes, 744 
marker enzymes, 744 
mitochondria, 744 
peroxisomes, 744 
plasma membrane, 744 
precipitation, 744 
rate sedimentation, 743 
rough endoplasmic reticulum, 
744 
sedimentation coefficients, 743 
cell surface glycoprotein a-2 
glycosylphosphatidylinositol- 
linked proteins, 765 
cell surface receptor CD2 
kinetics of folding, 690 
cell wall, 743 
central atom 
acids and bases, 62 
centrifugal potential
sedimentation equilibrium, 411 
centrosome 
assembly of microtubules, 723 
ceramide 
glycosphingolipid, 748 
cerebroside 
glycosphingolipid, 748
Cγ-endo conformation
proline, 270
Cγ-exo conformation
proline, 270
chaperone 
folding, 705-8 
chaperone protein PapD 
domains, 390 
chaperonin 60 
cavity, 707 
control of folding, 705-8 
cross-linking, 444-45 
hydrolysis of MgATP, 707 
quaternary structure, 475 
structure, 705 
charge-coupled device 
crystallography, 155 
charged side chains 
heterologous interfaces, 513 
chelation 
ion, 203 
chemical method of sequencing 
DNA, 103, 105 
chemical potential 
osmotic pressure, 408 
sedimentation equilibrium, 411 
chemical shift 
nuclear magnetic resonance, 
614 
Chinese hamster ovary cells
expression of DNA, 110 
chitinase B 
water in crystallographic molecular 
models, 294 
chloramphenicol O-acetyl 
transferase 
ionic interactions in 
crystallographic molecular 
models, 302 
chloramphenicol O-acetyltransferase 
fluorescence resonance energy 
transfer, 608 
interfaces, 480 
multiple isomorphous 
replacement, 160 
point group, 468-69 
space groups, 463 
topography of membrane- 
spanning proteins, 800 
water in crystallographic molecular
models, 294
chloroacetamide, 546 
N-chlorobenzenesulfonamide
reagent for covalent modification,
538
4-chlorobenzoyl-CoA dehalogenase
interfaces, 481
p-chloromercuribenzoate 
reagent for covalent modification, 
537 
chloroplasts, 743 
cell fractionation, 744 
3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonate
detergent, 770 
cholera toxin 
punching a hole in a membrane, 
804 
cholesterol 
bilayer of lipids, 745, 759-60 
effect on microviscosity, 814 
in boundary layer of phospholipid,
786
cholesterol oxidase
domains, 382
spin-labeled phospholipids, 795
water in crystallographic molecular
models, 295
choline O-acetyltransferase 
purification, 29 
choline-phosphate 
cytidylyltransferase 
peripheral membrane-bound 
proteins, 764 
chorismate mutase 
convergent evolution, 372 
chromatogram 
definition, 4 
chromatography, 2-12 
adsorption, 11 
agarose, 8 
amino acid analysis, 11 
carboxymethyl cellulose, 9 
cellulose, 7 
chromatography by adsorption, 8 
column chromatography, 4 
countercurrent distribution, 5 
DEAE cellulose, 9 
definition, 3 
dextran, 8 
diameter of the particles, 6 
distribution of solute, 2 
Donnan formalism, 10 
elution volume, 4 
flow rate, 6 
gas-liquid chromatography, 4 


glycopeptides, 133 
gradient chromatography, 7 
high-pressure liquid 
chromatography, 6 
hydroxylapatite, 8 
included volume, 12 
interfacial denaturation, 3 
ion exchange, 8 
ionic double layer, 8 
irreversible adsorption, 3 
isocratic zonal chromatography, 7 
of membrane-bound proteins, 771 
mobile phase, 2 
molecular exclusion, 11 
paper chromatography, 4 
partition coefficient, 4 
peptide map, 433 
peptide separation, 90 
polymethacrylate, 8 
polystyrene, 7 
QAE cellulose, 9 
relative mobility, 4 
resolution, 4 
reverse-phase, 8 
saturation, 2-3 
selective adsorption, 3 
silica gel, 7 
stationary phase, 2 
theoretical plate, 4-5 
thin-layer, 4 
titration of charge, 10 
void volume, 4 
zonal chromatography, 4 


chylomicrons
lipoproteins, 804
chymotrypsin
aligning crystallographic molecular
models, 375
β structure, 261
cleavage of polypeptide, 88
collisional quenching, 603
covalent modification, 543
free energy of folding, 674, 677
hydration, 298
hydrogen bonds in crystallographic
molecular model, 306, 308
chymotrypsin inhibitor
free energy of folding, 677
kinetics of folding, 702
chymotrypsinogen
aligning amino acid sequences, 360
covalent modification, 547
dodecyl sulfate gel electrophoresis,
422
endopeptidolytic cleavage, 547
frictional ratio, 426
hydration, 298
mean molar mass of an amino
acid, 418
molecular rotational axes of 
pseudosymmetry, 476 
sieving, 424 
thermodynamics of folding, 673 
X-ray scattering, 583 
circular dichroism, 597-601 
α helix, 598
β structure, 598
β turn, 599
circular dichroic spectrum, 598 
circular polarizations, 597 
conformational change, 600 
cytochrome-c oxidase, 598 
cytochrome c1, 598-99
glyceraldehyde-3-phosphate 
dehydrogenase 
(phosphorylating), 600 
glycine hydroxymethyltransferase, 
598 
kinetics of folding, 689 
molar ellipticity, 598 
molten globule, 684 
Na⁺/K⁺-transporting ATPase, 600
optical rotation, 598 
peptides, 601 
phenylalanines, 598 
plane-polarized light, 597 
protein coat of Hepatitis B virus, 612 
random coil, 660 
random meander, 599 
subtilisin, 600 
tryptophans, 598 
tyrosines, 598 
circular permutation, 679-80 
aspartate carbamoyltransferase, 680 
dihydrofolate reductase, 680 
thiol:disulfide interchange protein 
dsbA, 680 
circular polarizations 
circular dichroism, 597 
cis peptide bonds 
lectin IV, 252 
proline isomerization, 698 
secondary structure, 251-52 
cis-aconitase 
assay, 19 
citraconic anhydride 
reagent for covalent modification, 
536 
citrate (si) synthase 
crystallography, 171 
clathrates 
hydrophobic effect, 233 
clathrin, 498 
light scattering, 590 


CLC-0 chloride channel 
domains, 389 
cleavage of polypeptide 
acid hydrolysis, 87
arginyl endopeptidase, 88 
chymotrypsin, 88 
cyanogen bromide, 87, 89 
glutamyl endopeptidase, 88 
hydroxylamine, 90 
lysyl endopeptidase, 88 
2-nitro-5-thiocyanatobenzoate, 87, 
89 
papain, 88 
peptidyl-Asp metalloendo- 
peptidase, 88 
thermolysin, 88 
trypsin, 88 
cloning of DNA 
bacteriophage, 99 
colony, 99 
complementary DNA, 98 
DNA ligase (ATP), 96 
DNA ligase (NAD), 96 
DNA-directed DNA polymerase, 97 
DNA-directed RNA polymerase, 97 
extensin, 100 
hybridization of DNA, 100 
library, 99 
plaques, 99 
plasmid, 99 
polymerase chain reaction, 100 
preparative electrophoresis, 101 
probe, 100 
RNA-directed DNA polymerase, 
97 
sequencing of DNA, 99 
closed structure 
definition, 454 
integral membrane-bound 
proteins, 787 
coagulation factor V 
anchored membrane-bound 
protein, 765 
coagulation factor Va 
electron microscopy, 585 
coagulation factor VIII 
anchored membrane-bound 
protein, 765 
cobalt 
metalloproteins, 330 
coenzymatic domain 
domains, 382 
coenzyme 
assay, 13 
refinement, 180 
coenzyme A 
assay, 18 


coenzyme-B
sulfoethylthiotransferase
posttranslational modification, 113
cohesin domain, 386
hydropathy of side chains, 275
molecular taxonomy, 397
coiled coil of α helices
carboxypeptidase C, 283
core of, 284
extracellular matrix protein COMP,
283
fibrinogen, 282, 717
general control protein GCN4,
283-84
heptad repeat, 282 
hydrophobic amino acids in, 283 
intermediate filaments, 506 
keratin, 282 
methyl-accepting chemotaxis 
protein, 284 
myosin, 282, 730 
packing of side chains, 279-85 
synthetic peptides that form, 284 
coincident structure 
molecular taxonomy, 396 
co-ion 
definition, 8 
colchicine 
assembly of microtubules, 729 
cold shock protein CspB 
free energy of folding, 674 
cold shock-like protein 
equilibrium constant for folding, 
664 
kinetics of folding, 664, 667, 702 
colicin E1
fluorescence, 603 
punching a hole in a membrane, 
803 
colicin E7 immunity protein 
kinetics of folding, 694 
collagen 
axial ratio, 574-75 
frictional ratio, 575, 577 
helical polymer, 503-6 
4-hydroxyproline, 504 
interstrand hydrogen bond, 504 
intrinsic viscosity, 579 
posttranslational modification, 
122-23 
proline isomerization, 701 
triple helix, 504 
viscosity, 579 
collagen type VI 
domains, 386 
collagen type XII 
electron microscopy, 585, 587 
collagen type XIV
peptide map, 434 
collagenase 
crystallography, 181 
collisional quenching 
chymotrypsin, 603 
fluorescence, 602 
fructose-bisphosphate aldolase, 
603 
immunoglobulin G, 603 
colonic mucin 
oligosaccharides of glycoproteins, 
128, 130 
colony 
cloning of DNA, 99 
colorimetric assay, 18 
column chromatography, 4 
common ancestor 
evolution of proteins, 346 
common fold 
definition, 393 
complement fixation, 564 
complementarity-determining 
regions 
immunoglobulins, 558 
in immune complex, 559 
complementary DNA 
cloning of DNA, 98 
complementary faces 
quaternary structure, 455 
complex N-linked oligosaccharides 
oligosaccharides on glycoproteins, 
131 
complexes between dodecyl sulfate 
and polypeptides 
sieving, 429 
compressibility, isothermal 
of a molecule of protein, 278 
packing of side chains, 278 
thermodynamics of folding, 673 
water, 192 
compression 
electrophoresis of DNA, 106 
computational alignment 
aligning amino acid sequences, 351 
evaluation of, 368 
computed Fourier transform 
image reconstruction, 501 
concanavalin A 
B structure, 261 
interfaces, 480 
packing of side chains, 281 
posttranslational modification, 116 
concentration of protein 
measurement, 21 
osmotic pressure, 409 
sedimentation equilibrium, 412 
concentration, units of, 196-99
corrected volume fraction, 196 
equilibrium constant, 197 
partition coefficient, 198 
standard free energy of transfer, 198 
volume fraction, 196 

configurational entropy 
molten globule, 683 
thermodynamics of folding, 681-82 

configurational heat capacity 
water, 192 

conformational changes 
aspartate carbamoyltransferase, 577 
circular dichroism, 600 
during association of proteins with
nucleic acid, 321
fluorescence resonance energy 
transfer, 609 
immune complex, 561 
connections among nuclei 
nuclear magnetic resonance, 622 
conservation of charge 
acids and bases, 66 

conservation of mass 
acids and bases, 66

conservative replacement 
aligning amino acid sequences, 349 
evolution of proteins, 349 

constraints 
refinement, 174-75 

continuous assay, 15 

continuous flow 
kinetics of folding, 694 

continuous wave nuclear magnetic
spectrometers
nuclear magnetic resonance, 614 
convergent evolution 
alanine dehydrogenase, 373 
aligning crystallographic molecular 
models, 372 
D-amino-acid oxidase, 373
chorismate mutase, 372 
cytochrome P-450, 373 
3-dehydroquinate dehydratase, 
373 
ferredoxin, 373 
L-lactate dehydrogenase, 373 
L-lactate dehydrogenase 
(cytochrome), 373 
nitric-oxide synthase, 373 
superoxide dismutase, 373 
coordinate system 
crystallography, 157 

copper 
metalloproteins, 331 

coproporphyrinogen oxidase 
purification, 26 


core electrons, 56 
core of the hydrocarbon 
bilayer of phospholipid, 757 
corrected volume fraction 
concentration, units of, 196 
definition, 196 
correlated spectrum 
nuclear magnetic resonance, 619 
coulomb effect, 57 
coulomb’s law, 39 
countercurrent distribution, 5 
coupled assay, 16 
coupling constant 
nuclear magnetic resonance, 615 
covalency 
hydrogen bond, 215 
covalent bonds 
metal ion, 327 
covalent modification, 529-52 
acetic anhydride, 535 
acetylcholine receptor, 797 
actin, 547 
acylating agents, 535 
aldehydes, 534 
arginine, 539 
aryl azides, 541 
aryl nitrenes, 542 
aspartate, 539-41 
1-azidopyrene, 797 
aziridine, 537 
bacteriorhodopsin, 797 
benzyl bromide, 536 
brominating agents, 539 
bromopyruvate, 549 
carbenes, 543 
carbodiimides, 539-41 
carbonate dehydratase, 547 
N-chlorobenzenesulfonamide, 538 
chymotrypsin, 543 
chymotrypsinogen, 547 
citraconic anhydride, 536 
1,2-cyclohexanedione, 539 
cysteine, 530, 532, 536-37 
cytochrome De, 546, 550 
cytochrome-c oxidase, 797 
decomposition of reagent, 534 
2-dehydro-3-deoxy- 
6-phosphogluconate aldolase, 
549 
diazonium salts, 538 
5-diazonium-1-hydrotetrazole, 538 
diazotized p-[35S]sulfanilic acid,
799 
dicyclohexyl carbodiimide, 539 
diethyl pyrocarbonate, 536 
5-(dimethylamino)naphthalene-1-
sulfonyl fluoride, 536


diphenylethanedione, 539 

5,5’-dithiobis(2-nitrobenzoate), 
537 

DNA topoisomerase, 546 

DNA-directed RNA polymerase, 
548 

electrophilic reagents, 529 

N-(ethoxycarbonyl)-2-ethoxy-1,2-
dihydroquinoline, 541

N-ethylmaleimide, 536-37 

N-ethyl-5-phenylisoxazolium-3’- 
sulfonate, 541 

N-ethyl-N’-[3-(dimethylamino)
propyl]carbodiimide, 539

ferredoxin-NADP+ reductase, 550

fluorescent electrophiles, 606 

fluorosulfonic acids, 536 

N-formyl-[35S]sulfinylmethionyl
methylphosphate, 799

fructose-bisphosphate aldolase, 
547 

fumarase, 536 

γ-glutamyltransferase, 546, 550

glucose-6-phosphate isomerase, 
545 

glutamate, 539-41 

glyceraldehyde-3-phosphate 
dehydrogenase 
(phosphorylating), 542 

glycophorin, 797 

histidine, 530, 531, 536 

N-hydroxysuccinimide esters, 535 

ICl, 538

iminothiolane, 549 

impermeant reagents for, 798 

iodoacetamide, 530 

5-[125I]iodonaphthyl azide, 797

isethionyl[14C]acetimidate, 799

isocitrate lyase, 544 

isocyanates, 535 

isothiocyanates, 534 

kinetics, 531 

α-lactalbumin, 545

lactoperoxidase, 538 

lysine, 530-36 

membrane-spanning α helices, 796

methionine, 530, 532, 536 

methyl acetimidate, 532-34 

myosin, 544 

Na+/K+-exchanging ATPase, 546, 797

nitrene, 542 

2-[(2-nitrophenyl)sulfenyl]-
3-methyl-3’-bromoindolenine,
539

oxidative cleavage, 544 

4-(oxoacetyl)phenoxyacetic acid,
539


papain, 546, 551 
p-chloromercuribenzoate, 537 
pH effects, 531 
phosphoenolpyruvate 
carboxykinase (GTP), 543 
photolytic reactions, 541-44 
p-nitrophenylethanedione, 539 
purpose of, 529-30, 546-47 
λ repressor, 547
rhodopsin, 605 
ribonuclease, 550 
ribulose-bisphosphate 
carboxylase, 536, 546 
rose bengal, 536 
seminal ribonuclease, 545 
sigma factor rpoD, 548 
specificity of, 530-32 
succinic anhydride, 535 
sulfenyl halides, 538 
tetracycline repressor, 544 
3,4,5,6-tetrahydrophthalic 
anhydride, 536 
tetranitromethane, 538 
2-S-[14C]thiuroniumethane-
sulfonate, 799
trifluoroacetic anhydride, 535 
3-(trifluoromethyl)-3-(m-
[125I]iodophenyl)diazirine, 797
2,4,6-trinitrobenzenesulfonate, 
536 
1-tritiospiro[adamantane-4,3’- 
diazirine], 797 
tryptophan, 538-39 
tyrosine, 536, 537-38 
UDP-N-acetylglucosamine 
1-carboxyvinyltransferase, 546 
vanadate, 544 
2-vinylpyridine, 537 
yield, 530 
crambin 
β turn, 265
creatinase 
aligning crystallographic molecular 
models, 362-63 
creatine kinase 
fluorescence resonance energy 
transfer, 608 
CRINEPT 
nuclear magnetic resonance, 621 
critical concentration 
assembly of microtubules, 724-25 
critical micelle concentration 
detergent, 769 
Cro protein 
interface, 478 
cross-link 
posttranslational modification, 119 


cross-linking, 439-46, 548 
actin, 549 
arrangement of the subunits, 445 
assembly of fibrin, 721 
assembly of oligomers, 445 
chaperonin GroEL, 444-45 
count of the number of subunits, 
441 
cysteine, 549 
detection of heterologous 
associations, 519 
dimethyl suberimidate, 440 
epidermal growth factor receptor, 
443 
glutaraldehyde, 443 
glycerol kinase, 442 
immunostaining, 566 
L-lactate dehydrogenase, 443-45 
ladders, 442 
lysines, 440 
membrane-spanning α helices, 803
m-xylylene diisocyanate, 440 
myosin, 549 
Na+/K+-exchanging ATPase, 444-45
quantitative cross-linking, 443-45 
ribonuclease, 548 
ribosome, 549 
stoichiometric ratio of subunits, 
445 
succinate-CoA ligase (ADP- 
forming), 445 
cross-linking reagent 
2-(p-nitrophenyl)-3-(3-carboxy- 
4-nitrophenyl)thio-1-propene, 
548 
cross-linking reagents, 441 
cross-sectional area 
bilayer of phospholipid, 754 
crystal packing 
crystallography, 170 
molecular rotational axes of 
symmetry, 465 
crystallin 
aligning amino acid sequences, 361 
evolution of proteins, 350 
folding, 668-69 
hydrogen bonds in crystallographic 
molecular models, 308 
crystalline array 
cytochrome-c oxidase, 791 
crystallization of proteins, 49-50 
acetylcholine receptor, 772 
acylphosphatase, 49 
aquaporin, 772 
bacteriorhodopsin, 772, 775 
cytochrome o ubiquinol oxidase,
772 
crystallography, 50
cytochrome-c oxidase, 772, 775 
endoplasmic reticulum Ca2+-
transporting ATPase, 772
ferrichrome-iron receptor, 772 
α-galactosidase, 49
halorhodopsin, 772 
hanging drop, 50 
α-hemolysin, 772
integral membrane-bound protein, 
775 
large-conductance mechano- 
sensitive channel, 772 
lipid A export ATP-binding protein, 
772 
maltoporin, 772 
nicotinate-nucleotide 
diphosphorylase, 49 
outer membrane protein A, 772 
outer membrane protein F, 772, 
775 
outer membrane protein TolC, 775 
outer membrane protein TolC, 772
phosphoenolpyruvate 
carboxykinase, 49 
photosynthetic reaction center, 
772 
porin, 772 
potassium channel KcsA, 772
protein MsbA, 772 
rhodopsin, 772 
succinate dehydrogenase, 772 
sucrose porin, 772 
ubiquinol-cytochrome-c 
reductase, 772, 775 
crystallization of integral membrane- 
bound proteins 
bicontinuous cubic phase, 775 
crystallographic asymmetric unit 
definition, 457 
space groups, 457 
crystallographic axis of symmetry 
definition, 461 
space groups, 461 
crystallographic molecular model, 
167-70 
α-carbon diagram, 167
accessible surface area, 273 
aquaporin, 788 
bacteriorhodopsin, 777 
bilayer of phospholipid, 751-53 
cartoon, 167 
dilauroyl-N,N-dimethyl-
phosphatidylethanolamine,
753
dilauroylphosphatidylethanol- 
amine, 753 
dilauroylphosphatidic acid, 753

dimyristoylphosphatidylcholine, 
753 

dimyristoylphosphatidylglycerol, 
753 

endoplasmic reticulum Ca2+-
transporting ATPase, 778

ferrichrome-iron receptor, 783 

lysozyme, 170 

myoglobin, 170 

penicillopepsin, 167-70 

photosynthetic reaction center, 
780 

porin OmpF, 782 

potassium channel KcsA, 779 

random meander, 170 

skeletal representation, 167 

space-filling representation, 170 

ubiquinol-cytochrome-c 
reductase, 781 

crystallographic R-factor 

crystallography, 173 

definition, 173 

crystallography 

α helices, 165-67

alternative conformations, 182 

amplitude of the reflection, 153 

amplitude of the structure factor, 
155 

anisotropic thermal parameters, 
176 

aspartyl endopeptidase, 182 

axes, 151 

Bragg spacings, 158 

β structure, 165-67

β turns, 165-67

B value, 175 

calculated amplitudes, 172 

calculated phases, 173 

charge-coupled device, 155 

citrate (si) synthase, 171 

collagenase, 181 

coordinate system, 157 

crystal packing, 170 

crystallization, 50 

crystallographic R-factor, 173 

cubic lattice, 151 

data set, 155 

deoxyribonuclease, 158, 182 

difference maps of electron 
density, 173 

diffraction, 149 

diffraction limit, 154 

dihydrofolate reductase, 180 


distribution of electron density, 156 


free R-factor, 176 
Friedel pair, 152 


fundamental unit cell, 151 
α-glucosidase, 183
hexagonal lattice, 151 
hydrogen atoms, 182 
index, 151-53 
ions, 181 
lattice, 151 
layer line, 150 
lysozyme, 155 
molecular model, 163 
molecular replacement, 182 
monoclinic lattice, 151 
multiple isomorphous 
replacement, 158-61 
native structure, 171 
nitrite reductase, 181 
observed amplitudes, 173 
observed phases, 173 
oligosaccharides, 180 
orthorhombic lattice, 151 
phase of the reflection, 154 
photosynthetic reaction center, 
180-181 
B-phycoerythrin, 181 
reflecting faces, 150 
resolution, 158 
rhombohedral lattice, 151 
ribonuclease T1, 184
secondary structure, 165-67 
solvent flattening, 161 
structure factor, 155 
synchrotron, 156 
tetragonal lattice, 151 
tube of electron density, 162 
unit cell, 150 
unrefined map of electron density, 
161 


CTP synthase
domains, 380
cubic expansion coefficient
water, 193
cubic lattice
crystallography, 151
cyanogen bromide
affinity adsorption, 27
cleavage of polypeptide, 87, 89
cyclic-AMP dependent protein
kinase
domains, 382
fluorescence resonance energy
transfer, 608-609
molecular taxonomy, 396
radius of gyration, 581
transitory heterologous
associations, 513
X-ray scattering, 582, 584
cyclic point group, 466
cyclic symmetry
fluorescence resonance energy
transfer, 608
integral membrane-bound
proteins, 787
cyclin
heterologous associations, 517
yeast two-hybrid assay, 519
cyclin-dependent kinase inhibitor 1
yeast two-hybrid assay, 519
cyclin-dependent protein kinase 2
molecular taxonomy, 396
cyclohexane monooxygenase
growth on cyclohexane, 20
1,2-cyclohexanedione
reagent for covalent modification,
539
cyclosporin synthase
assay, 14
cystathionine -lyase
molecular taxonomy, 396
cystathionine β-synthase
domains, 378
cystathionine γ-synthase
space groups, 464
cysteic acid
cysteine, 82
cysteine
acid dissociation constant, 75
covalent modification, 530, 532,
536-37
cross-linking, 549
cysteic acid, 82
disulfide, 81-82
electronic structure, 80-82
oxidation levels, 80-82
sulfenic acid, 81-82
sulfinic acid, 81-82
sulfonate, 81-82
cystines
acetylcholine receptor, 783
membrane-spanning α helices, 783
posttranslational modification, 122
ribonuclease, 124
stereochemistry of side chains, 271
thermodynamics of folding, 681
thioredoxin, 125
thrombomodulin, 125
tris(2-carboxyethyl)phosphine, 125
ultraviolet absorption spectra,
601


cystines, formation of 


endoplasmic reticulum, 708 
glutathione, 708 

insulin-like growth factor, 708 
kinetics of folding, 708-9 
mixed disulfide, 708 


protein disulfide-isomerase, 125, 
708-09 
reduction potential, 709 
ribonuclease, 708 
thiol:disulfide interchange protein, 
708 
cytidine deaminase 
rotational axes of 
pseudosymmetry, 484 
cytochrome b5
diffusion, 822 
embedded anchor, 774 
purification, 773 
cytochrome Daer 
covalent modification, 546, 550 
cytochrome Die 
water in crystallographic molecular 
models, 293 
cytochrome c 
α helix, 256
aligning amino acid sequences, 
346-49, 351-52, 354-55, 360 
aligning crystallographic molecular 
models, 364-65 
epitopes, 561 
fluorescence resonance energy 
transfer, 609 
folding, 685 
free energy of folding, 677 
frictional ratio, 426 
immune complex, 560 
kinetics of folding, 689, 692, 695, 
697-99 
molten globule, 683-84 
nuclear magnetic resonance, 
617 
proline isomerization, 702 
radius of gyration, 581 
sieving, 424, 428 
cytochrome-c oxidase 
bound phospholipid, 784 
boundary layer of phospholipid, 
786, 805 
circular dichroism, 598 
covalent modification from within 
the bilayer, 797 
crystalline array, 791 
crystallization, 772, 775 
fluorescence resonance energy 
transfer, 609 
heterooligomer, 777 
membrane-spanning α helices,
772, 777, 784, 796 
passageway for cations, 778 
short subunits, 777 
topography of membrane- 
spanning proteins, 802 


cytochrome-c peroxidase 
peptide separation, 91 
cytochrome-c reductase 
translational diffusion coefficient, 
814 
cytochrome d 
kinetics of folding, 704 
cytochrome c1
anchored membrane-bound 
proteins, 764 
circular dichroism, 598-99 
cytochrome c, 
kinetics of folding, 691 
nuclear magnetic resonance, 626 
cytochrome Cosi 
nuclear magnetic resonance, 638 
cytochrome d ubiquinol oxidase 
resonance Raman spectrum, 596 
cytochrome f 
aligning amino acid sequences, 360 
water in crystallographic molecular 
models, 293-94 
cytochrome o ubiquinol oxidase 
crystallization, 772 
membrane-spanning helices, 772 
passageway for cations, 778 
cytochrome P-450 
convergent evolution, 373 
cytoplasm 
protein concentration, 1 
cytoplasmic surface 
of a membrane, 743 
cytosine 
electronic structure, 65 
cytoskeleton, 820-21 
actin, 820 
ankyrin, 821 
band 3 anion transport protein, 
821 
erythrocyte, 820 
glycophorin, 821 
phosphatidylserine, 821 
protein 4.1, 821 
spectrin, 820 


D 
D-5-deamino-5(S)-hydroxyneur- 
aminic acid 
oligosaccharides of glycoproteins, 
128 
D-alanine-D-alanine ligase
domains, 388 
molecular taxonomy, 397 
D-amino-acid oxidase 
convergent evolution, 373 
data set 
crystallography, 155 
definition, 155
databanks, searching 
coverage, 368 
error rate, 368 
N-deacetylheparin N-sulfotransferase 
purification, 29 
dead time 
kinetics of folding, 688 
stopped-flow apparatus, 688 
DEAE cellulose 
chromatography, 9 
decay in the anisotropy 
rotational diffusion coefficient, 812 
decyl β-D-glucoside
detergent, 770
decyl β-D-maltoside
detergent, 770
2-dehydro-3-deoxy-6-phospho- 
gluconate aldolase 
aligning amino acid sequence, 
360 
covalent modification, 549 
molar mass, 418 
peptide map, 437-38 
purification, 29 
quaternary structure, 407 
dehydromerohistidine, 126 
3-dehydroquinate dehydratase 
convergent evolution, 373 
domains, 380 
tetrahedral symmetry, 487 
3-dehydroquinate synthase 
domains, 380 
deleterious mutation 
evolution of proteins, 348 
deletion 
evolution of proteins, 350 
delocalization 
acids and bases, 63 
electronic structure, 56 
denaturant 
definition, 660 
denatured state 
definition, 659 
denaturing proteins 
when sequencing polypeptides, 
87 
δ-endotoxin CryIIIA
packing of α helices, 285
3-deoxy-7-phosphoheptulonate 
synthase 
purification, 25 
2’-deoxyadenosine, 95 
2’-deoxycytidine, 95 
2’-deoxyguanosine, 95 
deoxyhemoglobin 
molecular charge, 34 




deoxyribonuclease 
association of proteins with 
nucleic acid, 316 
crystallography, 158, 182 
endopeptidolytic cleavage, 547 
hydrogen bonds in 
crystallographic molecular 
models, 307-08 
multiple isomorphous 
replacement, 161 
Ramachandran plot, 255 
refinement, 176, 178 
secondary structure, 263 
space groups, 458 
water in crystallographic molecular 
models, 292 
dephospho-CoA kinase 
domains, 391 
deshielding 
hydrogen bond, 209 
desmin 
frictional ratio, 577 
sedimentation velocity, 577 
desmin filaments 
intermediate filaments, 506 
desmosine, 123 
detachable domains 
anchored membrane-bound 
proteins, 773 
domains, 376 
immunoglobulin G, 376, 378 
detergents 
alkyl glycosides, 769 
alkyl oligo(ethylene oxide) ethers, 
768 
CEs, 770 
CoE, 770 
CyoEg, 770 
Cube, 770 
Cy.Es, 770 
C14Eg, 770 
Coen, 770 
3-[(3-cholamidopropyl)dimethy- 
lammonio]-1-propane- 
sulfonate, 770 
decyl β-D-glucoside, 770
decyl β-D-maltoside, 770
N,N-dimethyldecylamine N-oxide, 
770 
N,N-dimethyldodecylamine 
N-oxide, 769-70 
N,N-dimethyloctylamine N-oxide, 
770 
dodecyl β-D-glucoside, 770
dodecyl β-D-maltoside, 770
octyl β-D-glucoside, 770
saponins, 769 


Tritons, 769 
Triton X-100, 770 
dethiobiotin synthase 
electrospray mass spectrometry, 417 
deuterium nuclear magnetic 
resonance spectroscopy 
bilayer of phospholipid, 757 
boundary layer of phospholipid, 
786 
4,6-di(bromomethyl)-3,7-dimethyl- 
1,5-diazabicyclo[3.3.0]octadiene- 
2,7-dione 
reagent for cross-linking, 549 
diacylglycerol kinase 
integral membrane-bound protein, 
766 
2,2-dialkylglycine decarboxylase 
(pyruvate) 
point group, 470-71 
diameter of the particles 
chromatography, 6 
diaminopimelate epimerase 
domains, 383 
diazonium salts 
reagents for covalent modification, 
538 
5-diazonium-1-hydrotetrazole 
reagent for covalent modification, 
538 
diazotized p-[35S]sulfanilic acid
impermeant reagent for covalent 
modification, 799 
dicarboxylate transporter 
immunostaining, 566 
dicarboxylic acids 
intramolecular hydrogen bonds, 
227 
dicyclohexyl carbodiimide 
reagent for covalent modification, 
539 
2’,3’-dideoxynucleotide 
sequencing of DNA, 104 
dielectric relaxation 
hydration of a protein, 297 
hydrophobic effect, 233 
water, 195 
diethyl pyrocarbonate, 550 
reagent for covalent modification, 
536 
difference in pKa
hydrogen bond, 210 
difference maps of electron density 
crystallography, 173 
difference maps of scattering density 
image reconstruction, 502 
differential centrifugation 
cell fractionation, 743 


differential scanning calorimetry 
domains, 388-89 
diffraction limit 
crystallography, 154 
diffraction of X-radiation 
biological membranes, 808 
crystallography, 149 
diffusion 
adenylate cyclase, 817 
β-adrenergic receptor, 817
cytochrome b5, 822
frictional coefficient, 577-78 
frictional ratio, 577-78 
in biological membranes, 811-13 
rhodopsin, 823 
rotational diffusion coefficient, 811 
translational diffusion coefficient, 
811 
diffusion coefficient 
definition, 37 
measurement, 37 
molten globule, 684 
standard, 574 
dihedral angle 
nuclear magnetic resonance, 616 
dihedral angle χ1
stereochemistry of side chains, 267
dihedral angle χ2
stereochemistry of side chains, 269
dihedral angles φ and ψ
secondary structure, 252-54 
dihedral point group 
definition, 470 
dihedral point group 222, 470-72 
dihedral point group 322, 472-73 
dihedral point group 422, 472 
dihedral point group 522, 472 
dihedral symmetry 
gap junction connexon, 787 
dihydrodipicolinate reductase 
proton exchange, 645 
dihydrodipicolinate synthase 
point group, 475 
dihydrofolate reductase 
α helix, 257
circular permutation, 680 
crystallography, 180 
domains, 380 
hydration, 299 
kinetics of folding, 689, 691-692, 
694, 698, 704 
molecular taxonomy, 395 
nuclear magnetic resonance, 621, 
622, 625 
purification, 29 
water in crystallographic molecular 
model, 299 


dihydrofolate reductase-thymidylate 
synthase 
domains, 390 
dihydrolipoyl dehydrogenase 
assembly of oligomers, 715 
domains, 388 
dihydrolipoyllysine-residue 
acetyltransferase 
aligning amino acid sequence, 360 
assembly of oligomers, 715 
domains, 384, 390 
folding, 683 
hydrogen bonds in 
crystallographic molecular 
models, 309 
interfaces, 479 
ionic interactions, 300 
kinetics of folding, 702 
quaternary structure, 490 
space groups, 463 
dihydrolipoyllysine-residue 
(2-methylpropanoyl)transferase 
quaternary structure, 490 
dihydrolipoyllysine-residue 
succinyltransferase 
aligning amino acid sequence, 
360 
mismatched symmetry, 511 
quaternary structure, 490 
space groups, 463 
dihydroneopterin aldolase 
interfaces, 480 
dihydroorotase 
domains, 379 
dilauroyl-N,N-dimethylphosphati-
dylethanolamine
crystallographic molecular model, 
753 
dilauroylphosphatidic acid
crystallographic molecular model, 
753 
dilauroylphosphatidylethanolamine 
crystallographic molecular model, 
753 
dimer 
fundamental unit of quaternary 
structure, 474 
dimer of dimers 
point group 222, 471 
dimers of water 
water, 190 
dimethyl suberimidate 
cross-linking, 440 
6,7-dimethyl-8-ribityllumazine 
synthase 
icosahedral symmetry, 488 
interfaces, 488 


5-(dimethylamino)naphthalene-1-
sulfonyl fluoride
fluorescent reagent for covalent 
modification, 536 
N,N-dimethyldecylamine N-oxide 
detergent, 770 
N,N-dimethyldodecylamine N-oxide 
detergent, 769-770 
N,N-dimethyloctylamine N-oxide 
detergent, 770 
dimyristoylphosphatidylcholine 
crystallographic molecular model, 
753 
dimyristoylphosphatidylglycerol 
crystallographic molecular model, 
753 
dipeptidyl-peptidase IV 
purification, 773 
diphenylethanedione 
reagent for covalent modification, 
539 
diphosphatidylglycerol 
asymmetry of, 810 
phospholipid, 749 
diphtheria toxin 
punching a hole in a membrane, 804 
diphtheria toxin repressor 
dipolar interactions 
electron paramagnetic resonance, 
649 
metalloproteins, 332-33 
disc electrophoresis, 42-44 
disordered water 
water in crystallographic molecular 
models, 290 
distance between dipoles 
fluorescence resonance energy 
transfer, 605 
distance distribution function 
X-ray scattering, 581 
distance measurements 
fluorescence resonance energy 
transfer, 607-9 
distribution coefficient 
molecular exclusion, 12 
distribution of electron density 
crystallography, 156 
distribution of scattering density 
image reconstruction, 501 
β-dystroglycan
heterologous associations, 513 
disulfide 
cysteine, 81-82 
disulfide interchange, 124 
5,5’-dithiobis(2-nitrobenzoate) 
reagent for covalent modification, 
537 
dithiothreitol
use, 124 
divalent metal ions 
ion pairs, 203 
DNA-(apurinic or apyrimidinic site) 
lyase 
metalloproteins, 330 
nucleic acid, association of 
proteins with, 320 
E2 DNA-binding domain 
rotational axes of symmetry, 
468 
DNA-directed DNA polymerase 
cloning of DNA, 97 
domains, 388 
fluorescence resonance energy 
transfer, 608-09 
posttranslational modification, 
115 
purification, 30 
DNA-directed RNA polymerase 
cloning of DNA, 97 
covalent modification, 548 
quaternary structure, 407 
DNA ligase (ATP) 
cloning of DNA, 96 
DNA ligase (NAD+)
cloning of DNA, 96 
DNA polymerase D 
nucleic acid, association of 
proteins with, 315 
DNA topoisomerase 
covalent modification, 546 
dodecyl β-D-glucoside
detergent, 770
dodecyl β-D-maltoside
detergent, 770 
dodecyl sulfate gel electrophoresis 
carbonate dehydratase, 422 
catalase, 422 
catalogue of polypeptides, 431 
chymotrypsinogen, 422 
fructose-bisphosphate aldolase, 
422 
fumarate hydratase, 422 
glutamate dehydrogenase, 422 
glyceraldehyde-3-phosphate 
dehydrogenase, 422 
L-lactate dehydrogenase, 422 
micelle of dodecyl sulfate, 421 
myoglobin, 422 
ovalbumin, 422 
phosphorylase, 422 
retardation coefficient, 422 
serum albumin, 422 
stacking, 422 
unfolded polypeptides, 421 
dolichyl-phosphate
β-D-mannosyltransferase

fluorescence resonance energy 
transfer, 609 

purification, 773 

domains, 376-91 

acetylcholine receptor, 779 

acyl carrier protein, 382 

[acyl-carrier-protein] 
S-acyltransferase, 381 

adenylate cyclase, 817 

aggrecan, 386 

D-alanine-D-alanine ligase, 388

aminodeoxychorismate synthase, 
380 

anchored membrane-bound 
proteins, 764 

anion carrier, 377 

ankyrin domain, 386 

anthranilate synthase, 380 

antifreeze protein, 384 

AraC protein, 378 

armadillo domain, 386 


aspartate carbamoyltransferase, 379 


aspartate kinase, 378 

aspartate kinase-homoserine 
dehydrogenase, 381 

aspartate transaminase, 389 

aspartic endopeptidase, 384 

benzoate 4-monooxygenase, 382 


biotin-dependent carboxylase, 382,
388

btk kinase, 386 

C2 domain, 386 

Ca2+-transporting ATPase, 779

CAD multienzyme complex, 379, 
390 

carbamoyl-phosphate synthase, 
383 

carbamoyl-phosphate synthase 
(ammonia), 379 

carbamoyl-phosphate synthase 
(glutamine hydrolysing), 379 

catalase, 388 

chaperone protein PapD, 390 

cholesterol oxidase, 382 

CLC-0 chloride channel, 389 

coenzymatic domain, 382 

cohesin domain, 386 

collagen VI, 386 

CTP synthase, 380 

cyclic AMP-dependent protein 
kinase, 382 

cystathionine ß-synthase, 378 


3-dehydroquinate dehydratase, 380 


3-dehydroquinate synthase, 380 
dephospho-CoA kinase, 391 


detachable domain, 376 

diaminopimelate epimerase, 383 

differential scanning calorimetry, 
388-89 

dihydrofolate reductase, 380 

dihydrofolate reductase- 
thymidylate synthase, 390 

dihydrolipoyl dehydrogenase, 388 

dihydrolipoyllysine-residue 
acetyltransferase, 384, 390 

dihydroorotase, 379 

DNA-directed DNA polymerase I, 
388 

dot matrix, 384 

dragline silk, 384 

EF hand, 386 

effect on folding, 680 

EGF domain, 386 

endopeptidolytic detachment, 377 

enoyl-[acyl-carrier-protein] 
reductase, 381 

enzymatic domain, 379-82 

epidermal growth factor receptor, 
815 

erythronolide synthase, 381 

evolutionarily shifting domains, 388 

exon shuffling, 386 

extracellular matrix, 386 

Fab fragments, 376 

Factor XII, 386 

fatty-acid synthase, 381 

ferredoxin-NADP+ reductase,
376-77, 388 

fibrinogen, 388-89, 717 

fibronectin domain, 386 

functional domain, 382 

fundamental units of protein 
structure, 390 

galactose oxidase, 382 

gelsolin, 384 

gene fusion, 381 

gene multiplication, 384 

genetic detachment, 377 

glucose oxidase, 382 

glutaminase, 379 

glutamine-fructose-6-phosphate 
transaminase (isomerizing), 
380 

glutamyl endopeptidase, 378 

glutathione reductase, 388-89 

glutathione synthase, 388 

glutathione-disulfide reductase, 382 

GMP synthase, 380 

granulin, 384 

hemagglutinin glycoprotein, 388 

hemocyanin, 384 

hemopexin domain, 386 


hexokinase, 384 

homoserine dehydrogenase, 378 

3-hydroxyacyl-[acyl-carrier- 
protein] dehydratase, 381 

imidazole glycerol phosphate 
synthase, 380, 383 

immunoglobulin domain, 386 

immunoglobulin G, 376, 378, 382, 
390, 477, 555 

independently folding domain, 389 

independently shifting domains, 
388 

indole-3-glycerol-phosphate 
synthase, 377, 384 

initiation factor IF3, 390 

internal duplication, 383 

internally repeating domain, 382 

kinetics of folding, 709 

kringle, 386, 390 

L-lactate dehydrogenase, 382-83 

laminin γ1, 390

leucine-rich repeat, 386 

lysozyme, 388 

mannose-6-phosphate receptor, 
385 

methionyl aminopeptidase, 383 

modular domain, 385 

mosaic eukaryotic protein, 385 

multienzyme complex, 379-82 

NADH peroxidase, 388 

nebulin, 384 

operon, 381 

ovotransferrin, 384 

3-oxoacyl-[acyl-carrier-protein] 
reductase, 381 

3-oxoacyl-[acyl-carrier-protein] 
synthase, 381 

p120 GTPase activator, 386 

pantetheine-phosphate 
adenylyltransferase, 391 

peptidylamidoglycolate lyase, 377 

peptidylglycine monooxygenase, 
377 

perlecan, 386 

6-phosphofructo-2-kinase/ 
fructose-2,6-bisphosphate 
2-phosphatase, 389 

phosphoglycerate kinase, 383 

phosphoinositide phospholipase
Cδ1, 386-87

phospholipase Cγ, 386

1-(5-phosphoribosyl)-5-[(5-phos- 
phoribosylamino)methylidine 
amino]imidazole-4-
carboxamide isomerase, 383 

phosphoribosylamine-glycine 
ligase, 388, 390 


phosphoribosylanthranilate 
isomerase, 377, 384 
phosphoribosylformylglycin- 
amidine cyclo-ligase, 390 
phosphoribosylformylglycin- 
amidine synthase, 380 
phosphoribosylglycinamide 
formyltransferase, 390 
3-phosphoshikimate 1-carboxy- 
vinyltransferase, 380, 382 
placental ribonuclease inhibitor, 
384 
plasminogen, 388-89 
pleckstrin domain, 386 
prepromagainin, 384 
protein-tyrosine kinase ZAP-70, 390 
protein-tyrosine-phosphatase, 381 
pyruvate kinase, 382-83 
pyruvate oxidase, 385 
recurring domain, 382 
regulatory kinases, 386 
retinol-binding protein, 384 
RNA recognition motif, 386 
SAND domain, 386 
separately unfolding domains, 
388-89 
serum albumin, 384 
sex-lethal protein, 390 
SH2 domain, 386 
SH3 domain, 386 
shikimate dehydrogenase, 380 
shikimate kinase, 380 
spectrin, 384-85, 390 
START domain, 386 
structural domain, 387 
sulfite oxidase, 377 
sulfite reductase, 382 
thermolysin, 389 
thioredoxin-disulfide reductase, 
388 
thiosulfate sulfurtransferase, 383 
thrombospondin I, 386 
thymidylate synthase, 380 
titin, 385 
triose-phosphate isomerase, 382 
UDP-glucose 6-dehydrogenase, 
384 
Donnan effect 
osmotic pressure, 410 
serum albumin, 419 
Donnan formalism 
chromatography, 10 
Donnan potential 
osmotic pressure, 411 
donor 
fluorescence resonance energy 
transfer, 604 


donors and acceptors of hydrogen 
bonds 
nucleic acid structure, 315 
dopamine-ß-monooxygenase 
anchored membrane-bound 
proteins, 764 
dot matrix 
aligning amino acid sequences, 
351-52 
domains, 384 
double-helical DNA 
local rotational axis, 467 
nucleic acid, association of 
proteins with, 314 
rotational axis of pseudosymmetry, 
467 
structure of DNA, 95-96 
double-helical hairpin of RNA 
nucleic acid structure, 322 
double-mutant cycle 
hydrogen bonds in crystallographic 
molecular models, 310 
doubly wound, parallel β sheet
recurring structure, 373 
dpx molecular orbitals 
phosphate, 83 
DQF 
nuclear magnetic resonance, 621 
dragline silk 
α helix, 259
domains, 384 
dry weight, measurement of 
molar mass, 419 
dynamics 
nuclear magnetic resonance, 623 
dynein 
microtubules, 730 
dystrophin 
heterologous associations, 513 
immunoadsorbent, 566 


E 
Edman degradation 
sequencing of polypeptides, 86 
EF hand, 386 
effective charge number 
electrophoresis, 39 
effective molarity 
intramolecular processes, 224 
effective sphere 
diffusion, 37 
efficiency of transfer 
fluorescence resonance energy 
transfer, 604 
EGF domain, 386 
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elastase 
aligning crystallographic molecular 
models, 362-63 
elastic scattering 
absorption of light, 593 
electrolyte 
osmotic pressure, 411 
electron density 
bilayer of phospholipid, 750 
electron diffraction 
bacteriorhodopsin, 791 
electron microscopy 
aspartate carbamoyltransferase, 587 
cell fractionation, 744 
coagulation Factor Va, 585 
collagen type XII, 585, 587 
fibrinogen, 585, 587 
fibulin, 586 
image reconstruction, 501 
inversion-specific glycoprotein, 586 
α-macroglobulin, 588
myosin, 586 
negative stain, 585 
nidogen, 586 
phosphorylase kinase, 586-87 
ribosome, 588 
rotary shadowing, 585 
shape of a protein, 585-88 
viral protein coats, 588 
electron nuclear double resonance 
aminocyclopropane carboxylate 
oxidase, 650 
catalase, 650 
electron paramagnetic resonance, 
649-50 
nitric-oxide synthase, 649-50 
photosynthetic reaction center, 650 
ribonucleoside-diphosphate 
reductase, 650 
electron paramagnetic resonance, 
645-50 
amine dehydrogenase, 647 
aminocyclopropane carboxylate 
oxidase, 646 
bacteriorhodopsin, 794 
bilayer of phospholipid, 755-56 
boundary layer of phospholipid, 
785 
CDP-6-deoxy-L-threo-D-glycero-4-
hexulose-3-dehydrase, 646 
dipolar interactions, 649 
electron nuclear double resonance, 
649-50 
formate C-acetyltransferase, 645, 
647-48 
g factor, 648 
hyperfine splitting, 647 
membrane-spanning α helices,
793-95 
myoglobin, 648-49 
nitric-oxide synthase, 649-50 
nitrogenase, 646 
organic radical, 645 
1-oxyl-2,2,5,5-tetramethylpyrrolin- 
3-yl group, 645 
ribonucleoside-diphosphate 
reductase, 645, 649 
spectrometer, 646 
spin quantum number, 646 
spin-spin coupling, 647 
electron paramagnetic resonance 
spectrometer, 646 
electron transfer flavoprotein 
peptide map, 435-36 
electronic energy levels 
absorption of light, 592-95 
electronic structure, 55-61 
acyl oxygen, 60 
adenine, 65 
alanine, 76 
arginine, 80 
aromatic nitrogen heterocycles, 60-61 
aromaticity, 60 
asparagine, 79 
aspartic acid, 79 
bonding electrons, 56 
bonding molecular orbital, 56 
carbonyl oxygen, 60 
cysteine, 80-82 
cytosine, 65 
delocalization, 56 
formal charge, 56 
glutamic acid, 79 
glutamine, 79 
glycine, 76 
guanidinium cation, 80 
guanine, 65 
histidine, 77 
imidazole, 78 
isoleucine, 76 
leucine, 76 
lone pairs of electrons, 56, 59 
lysine, 80 
methionine, 80 
nucleoside bases, 65 
phenylalanine, 76 
phosphate, 83 
π lone pair of electrons, 59
π molecular orbitals, 56
proline, 76 
pyridine, 60 
pyrrole, 61 
serine, 77 
σ bonds, 59
σ lone pair of electrons, 59
σ-π stereochemical representation,
56, 59
σ structure, 59
sulfate, 82 
threonine, 77 
tryptophan, 76 
tyrosine, 77 
uracil, 65 
valence electrons, 56 
valine, 76 
electronically excited state 
absorption of light, 594 
electron-scattering density, map of 
image reconstruction, 793 
electrophilic reagents 
covalent modification, 529 
electrophoresis, 36-45 
[acyl-carrier-protein] 
S-malonyltransferase, 46 
aggregation, 46 
β-lactoglobulin, 42
carbonate dehydratase II, 40 
detection of heterologous 
associations, 515 
effective charge number, 39 
electrophoretic field, 38 
electrotransfer, 565 
equation governing, 40 
free electrophoretic mobility, 38, 41 
fructose-bisphosphate aldolase, 41 
hemoglobin, 41, 45 
Henry’s function, 40 
immunoglobulin G, 42 
immunostaining, 566 
in 8 M urea, 432
ionic double layer, 38 
ionic strength, 39 
isocitrate dehydrogenase, 46 
moving boundary electrophoresis, 
41 
myoglobin, 42 
number of amino acids, estimation 
of, 427 
ovalbumin, 38, 41-42 
ovomucoid, 42 
pepsin, 42 
phosphomevalonate kinase, 46 
polyacrylamide gel, 41 
relative mobility, 429 
retardation coefficients, 41, 426 
ribonuclease, 45 
running gel, 44 
sequencing of DNA, 101-3 
serum albumin, 42, 45 
sieving, 426-30 
stable moving boundaries, 42-43
stacking, 43
stacking gel, 43 
stain for enzymatic activity, 46 
terminal velocity, 37 
trypsin, 41, 45 
electrophoresis of DNA 
compression, 106 
electrophoresis on gels of 
polyacrylamide cast in solutions 
of dodecyl sulfate, 421-23 
electrophoretic field, 38 
electrospray mass spectrometer 
mass spectrometry, 91 
electrospray mass spectrometry 
aldehyde dehydrogenase, 417 
dethiobiotin synthase, 417 
L1 metallo-β-lactamase, 417
molar mass, 416-17 
rusticyanin, 417 
ubiquinol-cytochrome-c 
reductase, 417 
electrostatic repulsion 
bilayers of phospholipid, 754 
electrostatic work 
ionic interactions in 
crystallographic molecular 
models, 301 
electrostriction, 197 
electrotransfer 
electrophoresis, 565 
ellipsoid of revolution 
axial ratio, 574 
hydrodynamic particle, 574 
elongation 
assembly of microtubules, 723 
elongation factor Ts 
heterologous interfaces, 510 
elongation factor Tu 
heterologous interfaces, 510 
elution volume 
definition, 4 
embedded anchor 
anchored membrane-bound 
proteins, 773 
cytochrome b₅, 774
glycophorin A, 774 
HLA histocompatibility antigen, 774 
sucrose α-glucosidase/oligo-1,6-
glucosidase, 774
viral hemagglutinin, 774 
emission of light, 592-613 
fluorescence, 594-95 
phosphorescence, 595 
emission spectrum 
fluorescence, 595 
3’-end 
sequencing DNA, 95
5’-end
sequencing DNA, 95 
end-labeled fluorescent fragments 
sequencing of DNA, 104 
end-labeled fragments 
sequencing of DNA, 103 
end-labeling 
immunostaining, 567 
endo-1,4-β-galactosidase
sequencing oligosaccharides, 135
endo-1,4-β-xylanase
nuclear magnetic resonance, 636
endo-α-sialidase
sequencing oligosaccharides, 135
endoglycosidases
sequencing oligosaccharides, 133
β3 endonexin
yeast two-hybrid assay, 519 
endopeptidase K 
metalloproteins, 328-29 
endopeptidases 
impermeant reagents, 800 
posttranslational modification, 
113 
sequencing polypeptides, 87-88 
topography of membrane- 
spanning proteins, 800 
endopeptidolytic analysis 
of proton exchange, 642 
endopeptidolytic cleavage 
chymotrypsinogen, 547 
deoxyribonuclease, 547 
endopeptidolytic detachment 
domains, 377 
endoplasmic reticulum, 743 
cystines, formation of, 708 
endoplasmic reticulum 
Ca²⁺-transporting ATPase
aligning amino acid sequences, 364 
boundary layer of phospholipid, 
785 
crystallization, 772 
crystallographic molecular model, 
778 
domains, 779 
image reconstruction, 793 
membrane-spanning α helices, 772
passageway for cations, 778 
translational diffusion coefficient, 
813 
endosomes, 743 
energy level 
molecular orbital, 56 
energy levels 
absorption of light, 592-95 
engrailed homeodomain 
kinetics of folding, 702
enoyl-[acyl-carrier-protein] reductase
domains, 381 
enrichment 
definition, 21 
enthalpy change 
hydrophobic effect, 233 
enthalpy of activation 
proline isomerization, 699 
enthalpy of folding 
thermodynamics of folding, 671 
enthalpy of formation 
hydrogen bond, 210-11 
enthalpy of fusion 
water, 190 
enthalpy of hydration 
ion, 200-201 
ion pair, 201 
enthalpy of vaporization 
water, 190 
entropy of approximation 
hydrogen bonds in crystallographic 
molecular models, 309 
intramolecular processes, 224 
entropy of formation 
hydrogen bond, 210 
entropy of hydration 
ion, 202 
entropy of mixing 
standard states, 196 
entropy of molecularity 
intramolecular processes, 225 
entropy of rotational restraint 
intramolecular processes, 224-26 
entropy of transfer 
hydrophobic effect, 231 
enzymatic activity 
assembly of oligomers, 713 
enzymatic domain 
domains, 379-82 
enzymatic method of sequencing 
DNA, 103-5 
epidermal growth factor 
nuclear magnetic resonance, 636 
epidermal growth factor receptor 
cross-linking, 443 
dimerization, 814-15 
domains, 815 
integral membrane-bound protein, 
766, 814 
quantitative cross-linking, 815 
epitope tagging, 567 
epitopes 
conformationally specific, 561-62 
cytochrome c, 561 
definition, 558 
lysozyme, 561 
micrococcal nuclease, 562 
neuraminidase, 562
on antigen, 558 
poliovirus, 561 
rhinovirus, 561 
sequence specific, 561-62 
Eps15 homology domain 
heterologous associations, 514 
equilibrium, 414 
equilibrium constant 
concentration, units of, 197 
equilibrium constant for folding, 662 
cold shock-like protein, 664 
measurement, 666 
ribonuclease, 687 
thermodynamics of folding, 664 
equivalence point 
immunoprecipitate, 564 
erythrocruorin 
aligning crystallographic molecular 
models, 369 
molecular axes of symmetry, 481 
erythrocyte 
cytoskeleton, 820 
erythronate-4-phosphate 
dehydrogenase 
molecular taxonomy, 396 
erythronolide synthase 
domains, 381 
erythropoietin
oligosaccharides of glycoproteins, 
130 
N-(ethoxycarbonyl)-2-ethoxy-1,2-
dihydroquinoline 
reagent for covalent modification, 
541 
ethyl acetoacetate 
tautomers, 70 
N-ethyl[2,3-¹⁴C₂]maleimide, 550
N-ethyl-5-phenylisoxazolium- 
3’-sulfonate 
reagent for covalent modification, 
541 
N-ethylmaleimide 
reagent for covalent modification, 
536-37 
N-ethyl-N’-[3-(dimethylamino)
propyl]carbodiimide
reagent for covalent modification, 
539, 547 
ETS-domain protein Elk-1 
association of proteins with 
nucleic acid, 316 
evolution 
of interface, 455-56, 469 
of quaternary structure, 455 
evolution of proteins 
alternative splicing, 350 
appearance of new protein, 359
calmodulin, 351 
conservative replacement, 349 
crystallin, 350 
deleterious mutation, 348 
deletion, 350 
fibrinogen, 359 
gene duplication, 358 
genetic drift, 348 
glycine, 349 
histone H4, 351 
hydropathy, 366 
insertion, 350 
introns, 350 
isoforms, 358 
L-lactate dehydrogenase, 359 
malate dehydrogenase, 359 
mutation probability, 349-50 
neutral replacement, 348 
orthologues, 358 
paralogues, 358 
point mutation, 350 
positive selection, 358 
protein phosphatase 2A, 351 
splicing of messenger RNA, 350 
start site, 350 
stop site, 350 
tolerance to replacement, 366 
α tubulin, 351
ubiquitin, 351 
evolutionarily shifting domains, 388 
evolutionary distance 
aligning amino acid sequences, 
355, 358 
proton exchange, 643-44 
proton exchange, 643-44 
exact rotational axis of symmetry 
space groups, 461 
excimer 
fluorescence, 610 
excluded volume 
thermodynamics of folding, 681 
exo-α-2,3-sialidase
sequencing oligosaccharides, 134
exo-α-sialidase
rotational axes of symmetry, 787 
sequencing oligosaccharides, 134 
exoglycosidases 
sequencing oligosaccharides, 134 
exon shuffling 
domains, 386 
exopeptidases 
sequencing of polypeptides, 91 
exotoxin A 
molecular charge, 34
expressing DNA
alkaline phosphatase promoter, 109 
5-aminolevulinate synthase, 108 
expression system, 108 
expression vector, 108 
Factor Xa, 109 
fusion proteins, 109 
β-galactosidase, 109
glutathione transferase, 109 
inclusion bodies, 109 
lacZ promoter, 108 
renin, 109 
restriction site, 109 
T3 promoter, 108 
T7 promoter, 108 
tacII promoter, 109
trc promoter, 109 
ubiquitin, 109 
expression 
glucose transporter, 776 
Na⁺/K⁺-exchanging ATPase, 776
opsin, 776 
unspecific monooxygenase, 776 
expression of DNA 
Chinese hamster ovary cells, 110
histidine tails, 110 
insect cells, 109 
mammalian cells, 109 
murine L cells, 110 
polyhedrosis virus, 109 
yeast, 109 
expression system 
expressing DNA, 108 
expression vector 
expressing DNA, 108 
extended polymers 
sieving, 428 
extracellular matrix 
domains, 386 
extracellular matrix protein COMP 
coiled coil of α helices, 283
extracytoplasmic surface 
of a membrane, 743
extrapolation 
free energy of folding, 673-74 


F 
Fab fragments 
domains, 376 
immunoglobulin G, 556 
univalence, 556 
F-actin capping protein 
assembly of actin, 730 
Factor VIII 
sequence of DNA, 106 
Factor IX 
nuclear magnetic resonance, 629
Factor Xa
expressing DNA, 109 
Factor XII 
domains, 386 
Factor D 
molecular taxonomy, 396 
family of domains 
molecular taxonomy, 396 
FASTA, 354 
evaluation of, 368 
fast-atom bombardment 
mass spectrometry, 92 
fatty acid-binding protein 
kinetics of folding, 694 
fatty-acid synthase 
domains, 381 
fatty-acid-binding protein 
β structure, 260
water in crystallographic molecular 
models, 294 
Fc fragment 
immunoglobulin G, 556 
ferredoxin 
convergent evolution, 373 
space groups, 464 
ferredoxin-NADP⁺ reductase
covalent modification, 550 
domains, 376-77, 388 
molecular taxonomy, 393 
water in crystallographic molecular 
models, 293 
ferrichrome-iron receptor 
crystallization, 772 
crystallographic molecular model, 
783 
hydrophobic sheath, 779 
ferritin 
interfaces, 489-90 
octahedral symmetry, 489-90 
quaternary structure, 489-90 
tetrahedral symmetry, 489-90 
fibrillar collagen 
helical cable, 505 
fibrin 
assembly of helical polymers, 
717-21 
rotational axes of symmetry, 719 
fibrin monomer 
fibrinogen, 717 
fibrinogen 
α-helical coiled coil, 282, 717
diffusion coefficient, 577-78 
domains, 388-89, 717 
electron microscopy, 585, 587 
evolution of proteins, 359 
fibrin monomer, 717 
fibrinopeptides, 717
frictional coefficient, 577-78, 588
frictional ratio, 577-78
(αβγ)₂ heterohexamer, 717
immunoelectron microscopy, 567 
mean molar mass of an amino 
acid, 418 
sedimentation coefficient, 577-78 
structure of, 717-20 
fibrinopeptides 
aligning amino acid sequences, 
349 
fibrinogen, 717 
fibronectin 
frictional ratio, 577 
X-ray scattering, 583 
fibronectin domain, 386 
fibulin 
electron microscopy, 586 
filaggrin 
amino acid sequence, 108 
fixed orientation 
fluorescence resonance energy
transfer, 607 
flagellin 
image reconstruction, 502 
flavodoxin 
molecular taxonomy, 395 
recurring structure, 373 
flow rate 
chromatography, 6 
fluid mosaic, 807-24 
fluorescence, 601-10 
absorption spectrum, 595 
collisional quencher, 602 
emission of light, 594-95 
emission spectrum, 595 
excimer, 610 
kinetics of folding, 689 
lifetime of, 602 
lysozyme, 601, 603 
molten globule, 684 
phosphoglycerate kinase, 603 
quantum yield, 602 
quenching of, 602 
ribonuclease, 601 
ribonuclease T₁, 603
tryptophan, 601 
fluorescence resonance energy 
transfer, 603-10 
acceptor, 604 
acetylcholine receptor, 608 
actin, 609 
aspartate carbamoyltransferase, 
608 
ATP-dependent DNA helicase Rep, 
610 
calmodulin, 608
chloramphenicol O-
acetyltransferase, 608
conformational changes, 609
creatine kinase, 608
cyclic AMP-dependent protein
kinase, 608-9
cyclic-AMP dependent protein
kinase, 608
cytochrome c, 609
cytochrome-c oxidase, 609
distance between dipoles, 605
distance measurements, 607-9
DNA-directed DNA polymerase,
608-9
dolichyl-phosphate β-D-
mannosyltransferase, 609
donor, 604
efficiency of transfer, 604
fixed orientation, 607
fluorescent electrophiles, 606
GTP-binding protein Cdc42, 609
high mobility group protein Z, 610
lifetime of the excited state, 605
lysozyme, 609
myosin, 609
Na⁺/K⁺-exchanging ATPase, 609
orientation factor, 606-7
orientational freedom, 607
overlap integral, 606
pancreatic trypsin inhibitor, 608
photo-lyase, 607
RecA protein, 609
rhodopsin, 605
Rho-GDP dissociation inhibitor, 609
steroid Δ-isomerase, 607
transcription factor AP-1, 609
troponin, 609
fluorescent electrophiles
covalent modification, 606
fluorescence resonance energy
transfer, 606
fluorosulfonic acids
reagents for covalent modification,
536
fold of the symmetry
axes of symmetry, 452
folded state, 659
folding
approximation, 681
bovine pancreatic trypsin 
inhibitor, 685 
carbonate dehydratase, 670 
carboxypeptidase C, 679 
chaperone, 705-8 
chaperonin 60, 705-8 
crystallin, 668-69 
cytochrome c, 685 
definition, 659
dihydrolipoyllysine-residue 
acetyltransferase, 683 
domains, effect on, 680 
equilibrium constant for, 662 
heat shock protein 70, 705, 707
hydrogen bonds, contributions to,
680-81
intermediate states, 668
α-lactalbumin, 669, 685
lysozyme, 678 
micrococcal nuclease, 678-79 
nucleation model, 685 
3-oxoacid CoA-transferase, 679 
penicillin amidase, 679 
pH, effect on, 662 
phosphoribosylanthranilate 
isomerase, 668, 679 
reversibility, 663 
ribonuclease, 668, 678-79, 685 
ribonuclease H, 678 
ribonuclease T₁, 663
scanning calorimetry, 664
subtilisin, 679
temperature, effect on, 662
tryptophan synthase, 685 
WW domains, 683 
folding of a polypeptide 
isomerization, 660 
footprinting, 548 
formal charge 
electronic structure, 56 
formate C-acetyltransferase 
electron paramagnetic resonance, 
645, 647-48 
formate dehydrogenase 
molecular rotational axes of 
symmetry, 465 
formate-tetrahydrofolate ligase 
purification, 26 
N-formyl- Kc sulfinylmethionyl 
methylphosphate 
covalent modification, 799 
impermeant reagent for covalent 
modification, 799 
5-formyltetrahydrofolate cyclo-ligase 
purification, 28-9 
forward scattering 
scattering of electromagnetic 
radiation, 579 
Fourier transform nuclear magnetic 
resonance spectrometer 
nuclear magnetic resonance, 614-15 
Fourier-Bessel transform 
image reconstruction, 502 
fractionation factor 
hydrogen bond, 209 
hydrogen bonds in crystallographic
molecular models, 312 
fragment ions 
mass spectrometry, 93 
fragmin 
assembly of actin, 730 
frameshift 
sequence of DNA, 106 
free electrophoretic mobility 
definition, 38 
dodecyl sulfate gel electrophoresis, 
421 
electrophoresis, 41 
free energies of association 
interfaces, 471 
free energies of formation 
hydrogen bonds in 
crystallographic molecular 
models, 310 
free energies of solvation 
hydrophobic effect, 235 
free energy of folding 
chymotrypsin, 674, 677 
chymotrypsin inhibitor, 677 
cold shock protein CspB, 674 
cytochrome c, 677 
extrapolation, 673-74 
immunoglobulin G, 675 
lysozyme, 677 
myoglobin, 677 
pH effect on, 675 
protein L9, 675 
proton exchange, 675 
ribonuclease, 676 
ribonuclease H, 675-77
ribonuclease T₁, 675, 677
site-directed mutation effect on, 675 
thermodynamics of folding, 673 
thioredoxin, 675, 677 
free energy of transfer 
guanidinium, 660 
hydropathy side chains, 274 
hydrophobic effect, 231-38 
tryptophan, 276 
tyrosine, 276 
urea, 660 
free induction decay 
nuclear magnetic resonance, 615 
free R-factor 
crystallography, 176 
free-flow electrophoresis 
cell fractionation, 744 
French pressure cell, 20 
frequency labeling 
nuclear magnetic resonance, 617 
frequency of maximum absorption 
nuclear magnetic resonance, 614
frictional coefficient, 577-78
alcohol dehydrogenase, 578
catalase, 578
diffusion, 577-78
diffusion coefficient, 37
fructose-bisphosphate aldolase, 578
β-galactosidase, 578
hydration of a protein, 299
hydrodynamic particle, 574
in membranes, 812
lysozyme, 578
manganese-stabilizing protein, 578
minimal, 574
prothrombin, 578
sedimentation velocity, 576-78
serum albumin, 578
string of spherical beads, 576
frictional ratio, 574
alcohol dehydrogenase, 578
apoferritin, 426
apolipoprotein(a), 577-78
aspartate carbamoyltransferase, 577
caldesmon, 577
catalase, 578
chymotrypsinogen, 426
collagen, 575, 577
cytochrome c, 426
desmin, 577
diffusion, 577-78
fibrinogen, 577-78
fibronectin, 577
fructose-bisphosphate aldolase, 578
β-galactosidase, 578
glyceraldehyde-3-phosphate
dehydrogenase
(phosphorylating), 426
L-lactate dehydrogenase, 426
lysozyme, 578
manganese-stabilizing protein, 578
myoglobin, 426
ovalbumin, 426
plasminogen, 577
polynucleotide 3’-phosphatase/
5’-kinase, 577
prothrombin, 578
sedimentation velocity, 577-78
serum albumin, 578
sieving, 425
urease, 426
vinculin, 577
Friedel pair
crystallography, 152
definition, 152
fructose 1,6-bisphosphatase
purification, 26
fructose-bisphosphate aldolase
assembly of oligomers, 713
collisional quenching, 603
covalent modification, 547
diffusion coefficient, 578
dodecyl sulfate gel electrophoresis,
422
electrophoresis, 41
frictional coefficient, 578
frictional ratio, 578
heterooligomers, 508
isoforms, 439-40
molar mass, 418, 419
molecular charge, 36
quaternary structure, 439-40
sedimentation coefficient, 578
sieving, 424, 427-28
L-fucose
structure, 129
α-L-fucosidase
sequencing oligosaccharides, 134
L-fuculose-phosphate aldolase
point group, 469
fumarate hydratase
assay, 13, 19
assembly of oligomers, 713
covalent modification, 536
dodecyl sulfate gel electrophoresis,
422
interfaces, 480
purification, 3
sieving, 424
functional domain
domains, 382
fundamental unit cell
crystallography, 151
fusion proteins
expressing DNA, 109


G 
γ turn
secondary structure, 264 
galactonate dehydratase 
assay, 19 
D-galactose 
structure, 129 
galactose oxidase 
domains, 382 
map of electron density, 165 
posttranslational modification, 122 
α-galactosidase
crystallization, 49
β-galactosidase
diffusion coefficient, 578
expressing DNA, 109
frictional coefficient, 578
frictional ratio, 578
sedimentation coefficient, 578
sequencing oligosaccharides, 134
sieving, 424
yeast two-hybrid assay, 518 
galaside 
glycosphingolipid, 748 
ganglioside 
glycosphingolipid, 748 
gap junction connexon 
dihedral symmetry, 787 
rotational axes of symmetry, 787 
gap penalty 
aligning amino acid sequences, 353 
gap percentage 
aligning amino acid sequences, 351 
gap-junction channel 
image reconstruction, 793 
gaps 
aligning amino acid sequences, 
350-51 
aligning crystallographic molecular 
models, 364 
gas-liquid chromatography, 4 
gelsolin 
assembly of actin, 730 
domains, 384 
heterologous associations, 516 
gene duplication 
evolution of proteins, 358 
gene fusion 
domains, 381 
evolution of proteins, 373-74 
gene multiplication 
domains, 384 
general control protein GCN4 
coiled coil of α helices, 283-84
interfaces, 480 
space groups, 464 
genetic code, 98 
aligning amino acid sequences, 349 
genetic detachment 
domains, 377 
genetic drift 
evolution of proteins, 348 
genomic sequences 
sequence of DNA, 108 
geodesic domes 
icosahedral symmetry, 496 
geranylgeranylation 
posttranslational modification, 117 
geranyltranstransferase 
assay, 14 
g factor 
electron paramagnetic resonance, 
648 
γ-glutamyltransferase
covalent modification, 546, 550
glial filaments
intermediate filaments, 506
γ-linolenic acid, 747
global rotational axis of symmetry 
definition, 491 
icosahedral symmetry, 491 
quasi-equivalence, 491 
globins 
aligning amino acid sequences, 356 
aligning crystallographic molecular 
models, 369-72 
globoside 
glycosphingolipid, 748 
glucan 1,4-α-glucosidase
molecular taxonomy, 396
4-α-glucanotransferase
aligning crystallographic molecular 
models, 362 
interface in, 478 
glucarate dehydratase 
aligning amino acid sequences, 361 
D-glucose 
structure, 129 
glucose oxidase 
domains, 382 
interfaces, 480 
glucose transporter 
expression, 776 
glucose-6-phosphate isomerase 
covalent modification, 545 
interfaces, 481 
peptide map, 435, 437-38 
glucose-fructose oxidoreductase 
interfaces, 480 
α-glucosidase
crystallography, 183 
D-glucuronic acid 
structure, 129 
glutamate 
acid dissociation constant, 75 
covalent modification, 539-41 
stereochemistry of side chains, 270 
glutamate dehydrogenase, 418 
dodecyl sulfate gel electrophoresis, 
422 
glutamate-ammonia ligase 
molar mass, 420 
posttranslational modification, 125 
quaternary structure, 475 
glutamate-tRNA ligase 
peptide map, 435 
glutamic acid 
electronic structure, 79 
water in crystallographic molecular 
models, 296 
glutaminase 
domains, 379 
glutamine 
acid dissociation constant, 75 
electronic structure, 79
hydropathy, 276 
stereochemistry of side chains, 270 
water in crystallographic molecular 
models, 296 
glutamine amidotransferase 
dodecyl sulfate gel electrophoresis, 
432 
glutamine γ-glutamyltransferase
topography of membrane- 
spanning proteins, 799 
glutamine-fructose-6-phosphate 
transaminase (isomerizing) 
domains, 380 
glutamine-pyruvate transaminase 
assay, 19 
glutamyl endopeptidase 
cleavage of polypeptide, 88 
domains, 378 
glutamyl-tRNA reductase 
purification, 31 
glutaraldehyde 
cross-linking, 443 
glutaredoxin 2 
nuclear magnetic resonance, 630 
glutathione 
cystines, formation of, 708 
function, 122 
structure, 122 
glutathione-disulfide reductase 
domains, 382, 388-89 
molecular taxonomy, 394-95 
point group, 467 
stereochemistry of side chains, 
267 
glutathione peroxidase 
space groups, 463 
glutathione synthase 
aligning crystallographic molecular 
models, 362 
domains, 388 
molecular taxonomy, 397 
rotational axes of symmetry, 483 
glutathione transferase 
detection of heterologous 
associations, 515 
expressing DNA, 109 
interfaces, 479 
glyceraldehyde-3-phosphate 
dehydrogenase 
(phosphorylating) 
accessible surface area, 273 
aligning crystallographic molecular 
models, 366 
assay, 17 
circular dichroism, 600 
covalent modification, 542 
dodecyl sulfate gel electrophoresis,
422 
frictional ratio, 426 
molar mass, 418 
molecular rotational axes of 
symmetry, 464 
molecular taxonomy, 395-96 
purification, 24, 48 
sieving, 424 
space groups, 463 
glycerate dehydrogenase 
molecular taxonomy, 396 
quaternary structure, 483 
glycerol kinase 
cross-linking, 442 
glycine 
electronic structure, 76 
evolution of proteins, 349 
glycine hydroxymethyltransferase 
circular dichroism, 598 
glycoforms 
oligosaccharides of glycoproteins, 
130 
glycolipids 
lipopolysaccharide, 749 
vectorial insertion into 
membranes, 767 
glycopeptides 
chromatography, 133 
oligosaccharides on glycoproteins, 
133 
glycophorin 
covalent modification from within 
the bilayer, 797 
cytoskeleton, 821 
embedded anchor, 774 
membrane-bound protein, 767 
oligomeric interfaces, 790 
glycoprotein 
definition, 127 
glycoproteins 
vectorial insertion into 
membranes, 767 
N-glycosidic linkage 
oligosaccharides of glycoproteins, 
128 
O-glycosidic linkage 
oligosaccharides of glycoproteins, 
128 
glycosphingolipids, 748 
arthroside, 748 
cerebroside, 748 
galaside, 748 
ganglioside, 748 
globoside, 748 
in rafts, 811 
isogloboside, 748
lactoside, 748
molluside, 748 
mucoside, 748 
neolactoside, 748 
schistoside, 748 
glycosylase MutY 
metalloproteins, 330 
glycosylation 
immune complex, 558 
topography of membrane- 
spanning proteins, 800 
glycosylphosphatidylinositol (GPI)
anchor 
posttranslational modification, 118 
glycosylphosphatidylinositol 
diacylglycerol-lyase 
glycosylphosphatidylinositol- 
linked proteins, 765 
glycosylphosphatidylinositol-linked 
proteins 
acetylcholinesterase, 765 
anchored membrane-bound 
proteins, 765 
carbonate dehydratase, 765 
cell surface glycoprotein a-2, 765 
glycosylphosphatidylinositol 
diacylglycerol-lyase, 765 
receptor for the Fc domain of 
immunoglobulin G, 765
variant surface glycoprotein, 765 
GMP synthase 
domains, 380 
Golgi membranes, 743 
cell fractionation, 744 
synthesis of glycoproteins, 767 
good solvent 
definition, 659 
Gouy-Chapman equation, 754 
G-protein 
adenylate cyclase system, 817 
hydrolysis of GTP, 817 
subunits, 817 
gradient chromatography, 7 
gradient of concentration 
sedimentation equilibrium, 411 
granulin 
domains, 384 
granulocyte-colony-stimulating 
factor 
molecular taxonomy, 399 
growth hormone 
molecular taxonomy, 399 
growth hormone receptor 
asymmetric complex, 816 
dimerization, 816 
integral membrane-bound protein,
816
GTP
assembly of microtubules, 726-29 
GTP-binding protein Cdc42 
fluorescence resonance energy 
transfer, 609 
guanidinium 
denaturant, 660 
free energies of transfer, 660 
preferential solvation, 22, 661 
guanidinium cation 
electronic structure, 80 
guanine 
electronic structure, 65 
guanine and arginine 
hydrogen bonding, 316 
guanine nucleotide-binding protein 
heterologous associations, 517 
guanylate kinase 
aligning crystallographic molecular 
models, 364 
Guinier plot 
X-ray scattering, 581 


H 
H⁺/K⁺-exchanging ATPase
aligning amino acid sequences, 364
topography of membrane-
spanning proteins, 802-3
H⁺-exchanging ATPase
topography of membrane-
spanning proteins, 802-3
H⁺-transporting two-sector ATPase
aligning amino acid sequence, 
360 
halocyanin 
resonance Raman spectrum, 596 
halorhodopsin 
crystallization, 772 
membrane-spanning helices, 772 
hanging drop 
crystallization, 50 
hapten, 563 
antigen, 562 
haptoglobin
aligning amino acid sequence, 360 
hardness 
metal ion, 327 
harsh treatments 
produce heterogeneity, 49 
head group 
phospholipid, 747 
heat capacity 
hydrophobic effect, 231 
kinetics of folding, 703 
water, 191 
heat capacity change of folding 
thermodynamics of folding, 671
heat of fusion
bilayer of phospholipid, 755 
heat shock protein 16.5 
octahedral symmetry, 487-88 
heat shock protein 70 
folding, 705, 707 
heat-labile enterotoxin 
molecular rotational axes of 
symmetry, 465 
point group, 469 
heavy atom 
multiple isomorphous 
replacement, 158 
hedgehog protein 
posttranslational modification, 115 
helical cables 
fibrillar collagen, 505 
intermediate filament, 506 
keratin, 508 
of helical polymers, 502 
helical nets 
packing of side chains, 280 
helical polymers, 499-508 
axes of symmetry, 452 
collagen, 503-6 
helical cables of, 502 
image reconstruction, 501-3 
microtubule, 506 
tRNA-intron endonuclease, 455 
helical surface lattice, 499 
CA protein, 500 
designation of, 499 
flagellin, 499 
microtubule, 722 
radial angle, 500 
T4 bacteriophage, 500 
thick filament of myosin, 731-32 
tobacco mosaic virus, 499-500 
helical wheel 
α helix, 258-59
helix
geometric parameters, 499
3₁₀ helix
β turn, 262
hemagglutinin glycoprotein 
domains, 388 
heme 
metalloproteins, 330 
heme-binding protein 23 
interfaces, 480 
hemocyanin 
domains, 384 
posttranslational modification, 
122 
resonance Raman spectrum, 596 
rotational axes of
pseudosymmetry, 485
hemoglobin
α helix, 259
aligning amino acid sequences, 360 
aligning crystallographic molecular 
models, 369-72 
electrophoresis, 41, 45 
heterooligomers, 508 
hydration of a protein, 298 
infrared spectrum, 595 
interfaces, 471-72 
mean molar mass of an amino 
acid, 418 
molecular taxonomy, 394, 398 
osmotic pressure, 420 
peptide map, 432-33 
quaternary structure, 407, 451 
sieving, 428 
stereochemistry of side chains, 267 
α-hemolysin
β barrel, 776
crystallization, 772 
punching a hole in a membrane, 803 
hemopexin domain, 386 
Henry’s function 
electrophoresis, 40 
heptad repeat 
coiled coil of α helices, 282
intermediate filaments, 506 
heterocycle 
tautomers, 73 
heterocyclic side chain 
hydropathy, 242 
(αβγ)₂ heterohexamer
fibrinogen, 717 
heterologous association 
definition, 508 
heterologous associations 
α-actinin, 513, 516
ankyrin, 516 
annexin II, 514 
CD40 tumor necrosis factor 
receptor, 514 
cyclin, 517 
detection, 515 
β-dystroglycan, 513
dystrophin, 513 
E-cadherin, 516 
Eps15 homology domain, 514 
gelsolin, 516 
guanine nucleotide-binding 
protein, 517 
heterooligomers, 508 
histocompatibility antigen, 513, 517 
importin, 514 
integrin, 516 
interfaces, 513 
laminin, 516 
modular domains, 513
myosin light chain kinase, 517 
nuclear import factor karyopherin
α, 514
nuclear localization signals, 514 
nucleolin, 516 
nucleoporin, 517 
PDZ domains, 514 
protein p11, 514 
protein-tyrosine phosphatase, 517 
proto-oncogene protein c-fos, 519 
proto-oncogene protein-tyrosine 
kinase ABL1, 517 
ribulose-bisphosphate 
carboxylase, 510 
SHC transforming protein, 517 
somatotropin, 519 
somatotropin receptor, 519 
synaptotagmin, 516 
T-cell receptor, 513 
α-thrombin, 513
thrombomodulin, 513 
titin, 513 
transcription factor AP-1, 519 
transcription initiation factor 
TFIID, 517 
transitory, 513 
troponin C, 514 
troponin I, 514 
tumor necrosis factor receptor- 
associated factor 2, 514 
vitronectin, 516 
heterologous interface 
definition, 508 
heterologous interfaces 
aspartate carbamoyltransferase, 
508-10 
charged side chains, 513 
elongation factor Ts, 510 
elongation factor Tu, 510 
heterooligomers, 508 
immunoglobulin G, 513 
interleukin-1, 513 
interleukin-1 receptor, 513 
karyopherin β2, 513
protein G, 513 
ribonuclease, 513 
ribonuclease inhibitor, 513 
ribulose-bisphosphate 
carboxylase, 510 
synaptobrevin-II, 513 
syntaxin-1A, 513 
heteromultimeric protein, 451 
heterooligomers, 508-19 
aspartate carbamoyltransferase, 
508-10 
assembly of oligomers, 713-17 
fructose-bisphosphate aldolase, 508
heterologous association, 508 
heterologous interface, 508 
histocompatibility antigen, 512 
homologous subunits, 508 
laminin, 513 
mismatched symmetry, 511 
modular domains in, 513 
multicatalytic endopeptidase 
complex, 508 
nidogen, 513 
nonstoichiometric ratios of 
subunits, 511 
pseudosymmetry, 508 
steric exclusion, 510 
hexagonal lattice 
crystallography, 151 
hexamers of dimers 
tetrahedral symmetry, 487 
hexokinase 
aligning crystallographic molecular 
models, 363 
axes of symmetry, 454 
domains, 384 
molecular taxonomy, 395 
point group, 467 
purification, 29 
sieving, 427 
hierarchical classification 
molecular taxonomy, 393 
high density lipoprotein 
lipoproteins, 805 
high mobility group protein Z 
fluorescence resonance energy 
transfer, 610 
highest occupied molecular orbital, 58 
high-mannose oligosaccharides 
oligosaccharides on glycoproteins, 
130 
high-pressure liquid 
chromatography, 6 
high-resolution mass spectrum 
posttranslational modification, 119 
histidine 
acid dissociation constant, 75 
covalent modification, 530-31, 536 
electronic structure, 77 
microscopic dissociation 
constants, 79 
nuclear magnetic resonance, 635 
tautomers, 78 
titration curve, 79 
histidine ammonia-lyase 
interfaces, 480 
posttranslational modification, 126 
histidine decarboxylase 
posttranslational modification, 114
quaternary structure, 475
subunits, 436 
histidine tails 
expression of DNA, 110 
histidinol-phosphate transaminase 
binding of ligand, 47 
histocompatibility antigen 
aligning amino acid sequences, 360 
heterologous associations, 513, 517 
heterooligomers, 512 
hydrogen bonds in crystallographic 
molecular models, 306 
packing of side chains, 279 
histone 
association of proteins with 
nucleic acid, 316, 320 
histone H4 
evolution of proteins, 351 
HIV-1 retropepsin 
assembly of oligomers, 713 
HLA histocompatibility antigen 
embedded anchor, 774 
purification, 773 
HLA-linked B-cell antigen 
purification, 773 
HMBC 
nuclear magnetic resonance, 621 
HMQC 
nuclear magnetic resonance, 621 
HNRNP arginine methyltransferase 
quaternary structure, 476 
Hofmeister series, 22 
HOHAHA 
nuclear magnetic resonance, 621 
homeodomain protein MATα2
association of proteins with 
nucleic acid, 320 
homogenization, 1, 20 
homologous subunits 
heterooligomers, 508 
homologues 
aligning amino acid sequences, 346 
homooligomeric proteins 
frequency, 466 
homoserine dehydrogenase 
domains, 378 
HSMQC 
nuclear magnetic resonance, 621 
HSQC 
nuclear magnetic resonance, 621 
Hurler corrective factor 
assay, 19 
hybrid oligomers, 439 
hybridization 
acids and bases, 63 
hybridization of DNA 
cloning of DNA, 100
hydrated effective sphere 
definition, 574 
hydration, 577-78 
bilayer of phospholipid, 751 
definition, 189 
X-ray scattering, 583 
hydration of a protein 
accessible surface area, 299 
chymotrypsin, 298 
chymotrypsinogen, 298 
dielectric relaxation, 297 
dihydrofolate reductase, 299 
frictional coefficient, 299 
hemoglobin, 298 
heterogeneity, 299 
α-lactalbumin, 298
β-lactoglobulin, 298
lysozyme, 298-99 
myoglobin, 298 
oligomeric proteins, 577 
ovalbumin, 298 
pepsin, 298 
preferential solvation, 297 
quantification, 296-300 
ribonuclease, 298-99 
scattering at small angles, 299 
self-diffusion of water, 297 
serum albumin, 298-99 
unfrozen water, 297 
hydrodynamic particle 
definition, 573 
ellipsoid of revolution, 574 
frictional coefficient, 574 
mass of, 573 
volume of, 573 
hydrodynamic radius 
definition, 574 
hydrogen 
scattering length, 583 
hydrogen atoms 
crystallography, 182 
hydrogen bonds, 204-22 
amide in water, 217 
angular dependence, 264-66 
apparent equilibrium constant in 
water, 219 
aromatic ring, 208 
association equilibrium constant, 
210 
bifurcated, 206 
bond angles, 206 
bond length, 205 
competition of water, 220
compressed, 214 
covalency, 215 
definition, 204 
deshielding, 209
difference in pKa, 210
distance between a donor and
acceptor, 213
effect of water, 216-20
electrostatic attraction, 215
enthalpy of formation, 210-11
entropy of formation, 210
fractionation factor, 209
free energy of formation of
low-barrier hydrogen bond, 215
infrared spectrum, 209
in interfaces, 478
integral membrane-bound
proteins, 782
intramolecular, 207, 227
low-barrier, 208-10
lysozyme, 267
membrane-spanning α helices, 775
N-methylacetamide, 217
nuclear magnetic resonance, 209
potential energy of, 213
proton exchange, 641
secondary structure, 264
solvation, 208
strength, 210
stretching frequency, 209
strong, short, 214
strongest possible, 212
sulfur, 207
symmetric, 212
water, 190
water in crystallographic molecular
models, 295
wells of potential energy, 208
zero-point energy, 208
hydrogen bonds in crystallographic
molecular models, 306-14
approximation, 311
aromatic side chain, 306
Bence-Jones protein, 308
buried hydrogen bonds, 306
chymotrypsin, 306, 308
clusters of hydrogen bonds, 306
contributions to folding, 680-81
crystallin, 308
deoxyribonuclease, 307-08
dihydrolipoyllysine-residue
acetyltransferase, 309
donors and acceptors on the side
chains, 306
double-mutant cycle, 310
entropy of approximation, 309
fractionation factor, 312
free energies of formation, 310
frequency, 306
histocompatibility antigen, 306
hydrogen-bond balance, 307
4-hydroxybenzoate 
3-monooxygenase, 309 
immune complex, 559-60 
length of the hydrogen bond, 312 
low-barrier hydrogen bonds, 312 
lysozyme, 311 
micrococcal nuclease, 312 
myoglobin, 306, 309, 312 
omit maps, 309 
penicillopepsin, 309 
phosphocarrier protein HPr, 312 
protein-tyrosine kinase, 312 
ribonuclease, 310, 313 
ribulose-bisphosphate 
carboxylase, 306-7, 306 
stability of a protein, 311 
stereochemistry, 306 
steric effects, 309 
streptococcal protein G, 312 
sulfate-binding protein, 309 
superoxide dismutase, 306 
thermolysin, 311 
transferrin, 312 
troponin C, 312 
trypsin, 309 
tryptophan, 308 
water, 308 
hydrogen bonds in DNA, 230 
hydrogen-bond balance 
hydrogen bonds in 
crystallographic molecular 
models, 307 
hydrogen-bonded nearest neighbors 
water, 193 
hydrogen-carbon bonds 
hydropathy side chains, 274 
hydrophobic effect, 234 
hydropathy, 241-46 
accessibilities to water, 244 
amide, 242 
arginine, 242 
asparagine, 276 
carboxylic acid, 242 
definition, 241 
evolution of proteins, 366 
glutamine, 276 
heterocyclic side chain, 242 
hydroxyl group, 241 
lysine, 242 
peptide bond, 242 
scales of, 244 
sulfur, 241 
transfer between water and the 
gas, 241 
hydropathy of side chains 
aromatic amino acids, 275 
cohesin domain, 275
free energies of transfer, 274
hydrogen-carbon bonds, 274 
hydrophilic amino acids, 275 
lysozyme, 275 
polypeptide backbone, 276 
ribonuclease T1, 275
hydrophilic amino acids 
hydropathy side chains, 275 
hydrophobic clusters 
aligning crystallographic molecular 
models, 369 
hydrophobic collapse
kinetics of folding, 696
hydrophobic effect, 230-41 
clathrates, 233 
compensatory thermodynamic 
changes, 233 
contributions to, 238 
definition, 231 
dielectric relaxation time, 233 
enthalpy change, 233 
entropy of transfer, 231 
free energies of solvation, 235 
free energy of transfer, 231-38 
heat capacity, 231 
hydrogen-carbon bonds, 234 
in interfaces, 479 
integral membrane-bound 
proteins, 783 
neutron scattering, 233 
partition coefficients between gas 
and water, 235-37 
size of the cavity, 234 
surface area, 238 
thermodynamic properties of the 
water, 231 
van der Waals forces, 235-37 
3-hydroxy fatty acids, 749 
N-[2-hydroxy-1,1-bis(hydroxymethyl) 
ethyl] glycine 
buffer, 68 
L-2-hydroxyisocaproate 
dehydrogenase 
molecular rotational axes of 
symmetry, 465 
molecular taxonomy, 396 
4-hydroxy-2-oxoglutarate aldolase 
aligning amino acid sequence, 
360 
2-hydroxy-6-ketonona-2,4-diene-1,9- 
dioic acid 5,6-hydrolase 
assay, 18 
(S)-2-hydroxy-acid oxidase 
space groups, 463 
3-hydroxyacyl-[acyl-carrier-protein] 
dehydratase 
domains, 381
3-hydroxyacyl-CoA dehydrogenase
assay, 17 
4-hydroxybenzoate 
3-monooxygenase 
hydrogen bonds in 
crystallographic molecular 
models, 309 
3-hydroxybutyrate dehydrogenase 
anchored membrane-bound 
proteins, 764 
1-(2-hydroxyethyl)-4-(3-sulfoethyl) 
piperazine 
buffer, 68 
1-(2-hydroxyethyl)-4-(3-sulfopropyl)
piperazine 
buffer, 68 
hydroxyl group 
hydropathy, 241 
hydroxylamine 
cleavage of polypeptide, 90 
hydroxylapatite, 8 
hydroxymethylglutaryl-CoA lyase 
assay, 17 
hydroxymethylglutaryl-CoA 
reductase 
anchored membrane-bound 
proteins, 764 
2-hydroxyphytanoyl-CoA lyase 
assay, 13 
4-hydroxyproline 
collagen, 504 
N-hydroxysuccinimide esters 
reagents for covalent modification, 
535 
hyperfine splitting 
electron paramagnetic resonance, 
647 


I 
ice Ih 
structure, 192 
water, 190 
ICl
reagent for covalent modification, 
538 
icosahedral point group 532, 488 
icosahedral symmetry 
protein coat of satellite tobacco 
necrosis virus, 494-95 
global rotational axis of symmetry, 
491 
hexagonal expansion, 497 
protein coat of a virus, 488-98 
protein coat of satellite panicum 
mosaic virus, 488-89 
protein coat of southern bean
mosaic virus, 494
protein coat of tomato bushy stunt
virus, 494
quasi-equivalence, 491-98 
ideal gas law 
osmotic pressure, 409 
image reconstruction 
acetylcholine receptor, 793 
actin, 503 
amorphous ice, 501, 790 
amplitude of electron diffraction, 
792 
amplitudes, 790 
aquaporin, 793 
Ca²⁺-transporting ATPase, 793
computational methods, 501 
computed Fourier transform, 501 
difference maps of scattering 
density, 502 
distribution of scattering density, 
501 
electron microscope, 501 
electron-scattering density, map 
of, 793 
flagellin, 502 
Fourier-Bessel transform, 502
gap-junction channel, 793 
helical polymers, 501-3 
lattice lines, 791 
light-harvesting chlorophyll 
a/b-protein complex, 793 
membrane-bound proteins, 
790-93 
Na⁺/K⁺-exchanging ATPase, 793
negative staining, 501 
phase of Fourier transform, 792 
phases, 790 
refinement, 793 
sodium/proton antiporter NhaA, 
793 
tilt, 791 
tubulin, 502 
two-dimensional crystalline array, 
790 
imidazole 
electronic structure, 78 
imidazole glycerol phosphate 
synthase 
domains, 380, 383 
imidazoleglycerol-phosphate 
dehydratase 
assay, 17 
iminothiolane 
reagent for covalent modification, 
549 
immune complexes 
between immunoglobulin and
antigen, 558
complementarity-determining
regions in, 559
conformational changes, 561 
cytochrome c, 560 
glycosylation, 558 
hydrogen bonds, 559-60 
interfaces, 559 
lysozyme, 558-61 
micrococcal nuclease, 559-60 
viruses, 561 
water, 560 
immunity protein 
light scattering, 591 
immunity protein Im9 
nuclear magnetic resonance, 639 
immunization 
to elicit immunoglobulins, 555 
immunoadsorbent, 563 
definition, 566 
dystrophin, 566 
protein A, 566 
purifying a peptide, 563 
Shaker S4 K⁺ channel, 566
to follow covalent modification, 545 
voltage-gated chloride channel, 566 
immunoadsorption, 566 
membrane-bound proteins, 773 
immunoblotting, 565 
immunodiffusion, 47, 564 
immunoelectron microscopy, 567-68 
fibrinogen, 567 
α2-macroglobulin, 567
multicatalytic endopeptidase, 567 
ribosome, 567-69 
immunoelectrophoresis, 47 
immunoglobulin A 
structure, 557 
immunoglobulin D 
oligosaccharides of glycoproteins, 
127 
immunoglobulin domain, 386 
immunoglobulin ε receptor
steric exclusion, 510 
immunoglobulin G 
bivalent fragment, 556 
camel, 559 
collisional quenching, 603 
domains, 376, 378, 382, 390, 477, 
555 
electrophoresis, 42 
Fab fragment, 556 
Fc fragment, 556 
free energy of folding, 675 
frictional coefficient, 588 
function, 555-57 
heterologous interfaces, 513 
hinges, 557
impermeant reagents, 800
infrared spectrum, 595 
mean molar mass of an amino 
acid, 418 
sieving, 424, 428 
structure, 555-57 
thermodynamics of folding, 682 
topography of membrane- 
spanning proteins, 800 
immunoglobulin G binding protein G 
kinetics of folding, 694 
nuclear magnetic resonance, 633 
immunoglobulin M 
structure, 557 
X-ray scattering, 584 
immunoglobulins 
antibodies, 555 
binding to antigens, 555 
complementarity-determining 
regions, 558 
function, 555 
in serum, 555 
molecular taxonomy, 397 
monoclonal, 558 
myeloma protein, 557 
packing of β sheets, 285
packing of side chains, 281 
polyclonal, 557 
proline isomerization, 701 
production by lymphocytes, 557 
proton exchange, 645 
purifying a peptide, 563 
screen libraries, 567 
immunoprecipitation, 564 
detection of heterologous 
associations, 515 
equivalence point, 564 
immunostaining 
alkaline phosphatase, 566 
amino terminus, 566 
carboxy terminus, 566 
cross-linking, 566 
dicarboxylate transporter, 566 
electrophoresis, 566 
end-labeling, 567 
NADH dehydrogenase 
(ubiquinone), 565-66 
peroxidase, 566 
IMP dehydrogenase 
point group, 469 
quaternary structure, 475 
impermeant reagents 
covalent modification, 798 
topography of membrane- 
spanning proteins, 798 
impermeant solute 
osmotic pressure, 408
importin
heterologous associations, 514 
included volume 
chromatography, 12 
inclusion bodies 
expressing DNA, 109 
incremental scattering 
light scattering, 415 
independently folding domain, 389 
independently shifting domains, 388 
index 
crystallography, 151-53 
individual domain 
molecular taxonomy, 393 
indole 
tryptophan, 76 
indole-3-glycerol-phosphate 
synthase 
domains, 377, 384 
induction 
acids and bases, 63 
infrared absorption 
water, 195 
infrared light 
absorption of, 594 
infrared spectroscopy 
selection rules, 595 
infrared spectrum 
amide, 595 
hemoglobin, 595 
hydrogen bond, 209 
immunoglobulin G, 595 
of a protein, 595 
ribonuclease, 595 
secondary structure, 596 
inhibitors of peptidases 
purification, 49 
initiation codon 
sequence of DNA, 106 
initiation factor IF3 
domains, 390 
inorganic diphosphatase 
assembly of oligomers, 710 
metalloproteins, 329, 332 
molecular rotational axes of 
symmetry, 465 
insect cells 
expression of DNA, 109 
insertion 
evolution of proteins, 350 
insulin receptor 
subunits, 436 
insulin-like growth factor 
cystines, formation of, 708 
intact cells 
topography of membrane- 
spanning proteins, 798
integral membrane-bound proteins,
766
β-adrenergic receptor, 816
β barrel, 776
boundary layer of phospholipid,
784-86
closed structures, 787
crystallization, 775
cyclic symmetry, 787
diacylglycerol kinase, 766
epidermal growth factor receptor,
766
growth hormone receptor, 816
hydrogen bonds, 782
hydrophobic effect, 783
membrane-spanning α helices, 776
NADH dehydrogenase
(ubiquinone), 766
oligomers, 786
packing of the α helices, 781
quantitative cross-linking, 786 
rotational axes of 
pseudosymmetry, 789 
rotational axes of symmetry, 787 
ryanodine receptor, 766 
spin-labeled phospholipids, 795 
water in the interior of, 781 
integrin 
heterologous associations, 516 
intein 
posttranslational modification, 115 
interdigitation of side chains 
packing of side chains, 278 
interface 
Cro protein, 478 
definition, 455 
evolution of, 455-56, 469 
4-α-glucanotransferase, 478
phosphopyruvate hydratase, 478 
interfaces 
adenylosuccinate lyase, 480 
ADP-ribose diphosphatase, 481 
alcohol dehydrogenase, 480 
arginine in, 478 
carboxylesterase ESTA, 478-80 
catalase, 481 
chloramphenicol 
O-acetyltransferase, 480 
4-chlorobenzoyl-CoA 
dehalogenase, 481 
concanavalin A, 480 
dihydrolipoyllysine-residue
acetyltransferase, 479
dihydroneopterin aldolase, 480 
6,7-dimethyl-8-ribityllumazine 
synthase, 488 
ferritin, 489-90
free energies of association, 471
fumarate hydratase II, 480 
general control protein GCN4, 480 
glucose oxidase, 480 
glucose-6-phosphate isomerase, 481 
glucose-fructose oxidoreductase, 
480 
glutathione transferase, 479 
heme-binding protein 23, 480 
hemoglobin, 471-72 
heterologous associations, 513 
histidine ammonia-lyase, 480 
hydrogen bonds in, 478 
hydrophobic effect in, 479 
immune complex, 559 
interleukin-5, 481 
intimin receptor, 480 
isometric structures, 488 
κ-bungarotoxin, 480
lac repressor, 480 
lectin, 480 
mannose-binding protein, 480 
methyl-accepting chemotaxis 
protein II, 480 
oligomeric integral membrane- 
bound proteins, 790 
oligomeric proteins, 478 
porin, 479 
protein coat of rhinovirus 14, 480 
protein coat of satellite panicum 
mosaic virus, 489 
quasi-equivalence, 492-94 
reversible dissociation, 471 
ribulose-phosphate 3-epimerase, 
480 
structural swapping, 480-81 
structure of, 478 
superoxide dismutase, 480 
urate oxidase, 480 
variant surface glycoprotein, 480 
interfacial denaturation 
chromatography, 3 
interleukin 
molecular taxonomy, 399 
interleukin 1 
heterologous interfaces, 513 
interleukin 1β
aligning crystallographic molecular 
models, 366 
nuclear magnetic resonance, 628 
water in crystallographic molecular 
models, 294 
interleukin 4 
nuclear magnetic resonance, 629, 
639 
interleukin 5 
interfaces, 481
interleukin 6
molten globule, 684 
interleukin 13 
nuclear magnetic resonance, 627 
interleukin-1 receptor 
heterologous interfaces, 513 
intermediate filament 
helical cable, 506 
intermediate filaments, 506-8 
coiled coils of α helices, 506
desmin filaments, 506 
glial filaments, 506 
heptad repeat, 506 
keratin filaments, 506 
neurofilaments, 506 
tonofilaments, 506 
vimentin filaments, 506 
intermediate states 
folding, 668 
molten globule, 683 
intermolecular aggregation 
kinetics of folding, 705 
internal duplication 
domains, 383 
internal duplications 
molecular rotational axes of 
pseudosymmetry, 476 
internally repeating domain 
domains, 382 
interstitial molecules of water, 194 
interstrand hydrogen bond 
collagen, 504 
intersystem crossing, 542 
absorption of light, 595 
intestinal fatty acid binding protein 
kinetics of folding, 697 
intimin receptor 
interfaces, 480 
intracellular membranes, 743 
intramolecular association constant, 
222 
intramolecular hydrogen bonds 
α helices, 228
β structure, 228
β turn, 227
dicarboxylic acids, 227 
intramolecular interference 
scattering of electromagnetic 
radiation, 579 
intramolecular processes, 222-30 
decrease in standard free energy of 
association, 226 
effective molarity, 224 
entropy of approximation, 224 
entropy of molecularity, 225 
entropy of rotational restraint, 
224-26
intramolecular proton transfer, 70
intrinsic viscosity 
collagen, 579 
definition, 578 
molten globule, 684 
random coil, 660 
introns 
evolution of proteins, 350 
structure of DNA, 98 
invariant position 
aligning amino acid sequences, 348 
inversion-specific glycoprotein 
electron microscopy, 586 
iodoacetamide, 546 
reagent for covalent modification, 
530 
specificity, 532 
iodoacetate, 550 
5-[¹²⁵I]iodonaphthyl azide
hydrophobic reagent for covalent 
modification, 797 
ion 
chelation, 203 
enthalpy of hydration, 200-201 
entropy of hydration, 202 
layer of hydration, 200 
self-charging energy, 200 
ion exchange 
media, 9 
ion exchange chromatography, 8 
purification, 23 
ion pairs 
carboxylate-ammonium, 202 
divalent metal ions, 203 
enthalpy of hydration, 201 
ionic interactions in 
crystallographic molecular 
models, 302 
standard enthalpy of formation, 200 
ionic bonds 
metal ion, 327 
ionic double layer 
chromatography, 8 
electrophoresis, 38 
ionic interactions, 199-204 
ionic interactions in crystallographic 
molecular models, 300-306 
acid dissociation constants, 300 
arc repressor, 304 
buried acid-bases, 301 
buried ion pair, 303 
chloramphenicol
O-acetyltransferase, 302
dihydrolipoyllysine-residue 
acetyltransferase, 300 
electrostatic work, 301 
frequency, 306
ion pair, 302
ionized hydrogen bond, 302
α-lytic endopeptidase, 303
neutron diffraction, 303 
pepsinogen, 303 
relative permittivity, 303 
site-directed mutation, 303 
subtilisin, 300 
tautomeric interactions, 300 
titration curve, 300 
xylose isomerase, 302 
ionic radius 
metal ion, 327 
ionic strength, 203 
definition, 39 
electrophoresis, 39 
ionized hydrogen bond 
ionic interactions in 
crystallographic molecular 
models, 302 
ions 
crystallography, 181 
electrostatic repulsion, 203 
ion-trap mass spectrometer 
mass spectrometry, 91 
iron 
metalloproteins, 330 
iron-sulfur cluster 
metalloproteins, 330 
irreversible adsorption 
chromatography, 3 
isethionyl [¹⁴C]acetimidate
impermeant reagent for covalent 
modification, 799 
isoaspartyl peptide bond, 115 
isocitrate dehydrogenase (NAD⁺)
electrophoresis, 46 
purification, 26, 29 
isocitrate lyase 
covalent modification, 544 
isocratic zonal chromatography, 7 
isocyanates 
reagents for covalent modification, 
535 
isoelectric focusing, 47 
isoelectric point 
definition, 33 
isoelectric precipitation 
purification, 23 
isoforms 
definition, 358 
evolution of proteins, 358 
malate dehydrogenase, 359 
isogloboside 
glycosphingolipid, 748 
isoionic point 
definition, 32
isoleucine
electronic structure, 76
stereochemistry of side chains, 268
isometric oligomeric proteins, 485-99
isometric point groups, 486
isometric structures
interfaces, 488
isoprenylation
anchored membrane-bound
proteins, 765
posttranslational modification, 117
isopycnic centrifugation
cell fractionation, 744
isothiocyanates
reagents for covalent modification,
534


J
jumbled amino acid sequences
aligning amino acid sequences, 353


K 
K⁺-transporting ATPase
topography of membrane- 
spanning proteins, 802-3 
karyopherin β2
heterologous interfaces, 513 
KcsA potassium channel 
metalloproteins, 328 
keratin 
coiled coil of α helices, 282
helical cable, 508 
keratin filaments 
distribution in cell, 507 
intermediate filaments, 506 
3-keto fatty acids, 749 
keto-enol tautomers, 70 
α-ketoisocaproate oxygenase
purification, 25 
kinesin 
microtubules, 730 
molecular taxonomy, 393 
kinetic burst 
kinetics of folding, 689 
kinetic dead-ends 
kinetics of folding, 705 
kinetic mechanism 
proton exchange, 643 
kinetics
assembly of microtubules, 724-26
covalent modification, 531
of assembly of oligomers, 711
kinetics of folding, 688-710
activation volume, 703
acyl-CoA-binding protein, 697
apomyoglobin, 689, 693, 695, 697
approach to equilibrium, 666
arc repressor, 703
aspartate kinase-homoserine
dehydrogenase, 709
barstar, 690, 695
cell surface receptor CD2, 690
chymotrypsin inhibitor 2A, 702
circular dichroism, 689
cold shock-like protein, 667, 702
colicin E7 immunity protein, 694
continuous flow, 694
continuum of intermediate states,
704
cystines, 708-9
cytochrome c, 689, 692, 695, 697,
698-99
cytochrome c’, 704
cytochrome c;, 691
dead time, 688
dihydrofolate reductase, 689,
691-92, 694, 698, 704
dihydrolipoyllysine-residue
acetyltransferase, 702
dilution, 688
domains, 709
engrailed homeodomain, 702
fatty acid-binding protein, 694
fluorescence, 689
heat capacity, 703
human acylphosphatase, 702
hydrophobic collapse, 696
immunoglobulin G binding
protein G, 694
intermolecular aggregation, 705
intestinal fatty acid binding
protein, 697
kinetic burst, 689
kinetic dead-ends, 705
β-lactoglobulin, 689, 691, 693-94,
698
lysozyme, 689-91, 694, 698, 704,
710
micrococcal nuclease, 689, 698
molten globule, 703
molten globules, 692-94
multiple steps, 698
myoglobin, 667
D-octopine dehydrogenase, 709
parallel pathways, 704
pH effect on, 703
phosphoglycerate kinase, 690
proline isomerization, 698-702
protein A, 702
protein G, 697
protein L, 696
protein L9, 703
protein S6, 702
proton exchange, 691-92, 698
rapid mixing chamber, 688
rate constant for folding, 667
λ repressor, 702-03
ribonuclease, 695
ribonuclease H, 688-90, 693-94, 697
scattering of X-radiation, 691
site-directed mutation, 703
stopped-flow apparatus, 688
temperature jump, 695
ubiquitin, 702
viscosity effect on, 697, 703
WW domain, 703
kringle 
domains, 386, 390 


L 
α-lactalbumin
aligning amino acid sequences, 360 
covalent modification, 545 
folding, 669, 685 
hydration of a protein, 298 
metalloproteins, 329 
molten globule, 683-84 
thermodynamics of folding, 682 
β-lactamase
topography of membrane- 
spanning proteins, 800 
water in crystallographic molecular 
models, 293 
D-lactate dehydrogenase 
molecular taxonomy, 396 
L-lactate dehydrogenase 
convergent evolution, 373 
cross-linking, 443-45 
dodecyl sulfate gel electrophoresis, 
422 
domains, 382-83 
evolution of proteins, 359 
frictional ratio, 426 
isoforms of, 359 
mean molar mass of an amino 
acid, 418 
molar mass, 418 
molecular taxonomy, 393-95 
osmotic pressure, 419 
packing of side chains, 288 
purification, 29 
quaternary structure, 407 
recurring structure, 373-74 
sieving, 424, 427 
space groups, 463 
L-lactate dehydrogenase 
(cytochrome) 
convergent evolution, 373 
point group, 469-70 
β-lactoglobulin
electrophoresis, 42
hydration of a protein, 298
kinetics of folding, 689, 691, 
693-94, 698 
molar mass, 418 
molecular charge, 33 
osmotic pressure, 419 
sieving, 428 
thermodynamics of folding, 671 
lactoperoxidase 
reagent for covalent modification, 
538 
sieving, 424 
topography of membrane- 
spanning proteins, 799 
lactose 
preferential solvation, 31 
lactose permease 
membrane-spanning α helices, 795
topography of membrane- 
spanning proteins, 800 
lactoside 
glycosphingolipid, 748 
lacZ promoter 
expressing DNA, 108 
ladder 
sequencing of DNA, 102 
lamin A 
aligning amino acid sequence, 360 
laminar flow 
viscosity, 578 
laminin 
heterologous associations, 516 
heterooligomers, 513 
laminin γ1
domains, 390 
large tumor antigen, 562 
large-conductance mechanosensitive 
channel 
crystallization, 772 
membrane-spanning helices, 772 
overexpression, 776 
Larmor frequency 
nuclear magnetic resonance, 613 
lattice 
crystallography, 151 
lattice lines 
image reconstruction, 791 
layer line 
crystallography, 150 
layer of hydration 
ion, 200 
layers of hydration 
repulsion, 202 
lectin 
interfaces, 480 
molecular rotational axes of
symmetry, 465
space groups, 460
lectin IV 
cis peptide bond, 252 
metalloproteins, 330 
left-handed twist 
β structure, 260
leghemoglobin 
aligning amino acid sequence, 360 
aligning crystallographic molecular 
models, 369 
length of a hydrogen bond 
hydrogen bonds in 
crystallographic molecular 
models, 312 
length of a polypeptide 
definition, 407 
leucine 
electronic structure, 76 
leucine-rich repeat, 386 
leucyl aminopeptidase 
sequencing of polypeptides, 91 
Lewis acids, 326 
Lewis bases, 326 
Lewis structure, 56 
library 
cloning of DNA, 99 
licheninase 
molecular taxonomy, 398 
lifetime 
of fluorescence, 602 
lifetime of the excited state 
fluorescence resonance energy 
transfer, 605 
ligands 
metalloproteins, 327 
light scattering 
incremental scattering, 415 
molar mass, 414-16 
ovalbumin, 420 
polarized light, 415 
Rayleigh’s ratio, 416 
refractive index, 415 
serum albumin, 416 
virial coefficient, 416 
Zimm plot, 416 
light-harvesting chlorophyll 
a/b-protein complex 
image reconstruction, 793 
limiting viscosity number, 579 
linoleic acid, 747 
α-linolenic acid, 747
lipid A export ATP-binding protein 
crystallization, 772 
membrane-spanning helices, 772 
lipopolysaccharide 
glycolipid, 749 
lipoproteins, 804-5
apolipoprotein B100, 805
belt around the waist, 805 
chylomicrons, 804 
high density lipoprotein, 805 
lipovitellin, 804 
low density lipoprotein, 804 
phosvitin, 804 
very low density lipoprotein, 804 
vitellogenin, 804 
lipovitellin 
lipoproteins, 804 
liquid water 
water, 190 
local 3-fold rotational axis of 
pseudosymmetry 
quasi-equivalence, 491 
local conformational changes 
ribonuclease H, 678 
local minimum 
refinement, 176 
local rotational axis 
double-helical DNA, 467 
local rotational axis of symmetry 
definition, 491 
palindromic sequence, 467 
lone pair of electrons 
electronic structure, 56, 59 
low density lipoprotein 
lipoproteins, 804 
low-affinity immunoglobulin γ Fc
region receptor 
nonstoichiometric ratio of 
subunits, 512 
low-barrier hydrogen bond, 208-10 
free energy of formation, 215 
hydrogen bonds in 
crystallographic molecular 
models, 312 
lowest unoccupied molecular orbital, 
58 
lymphocytes 
production of immunoglobulins, 
557 
lysine 
acid dissociation constant, 75 
covalent modification, 530-36 
cross-linking, 440 
electronic structure, 80 
hydropathy, 242 
nucleic acid, association of 
proteins with, 315 
water in crystallographic molecular 
models, 296 
lysosomes, 743 
cell fractionation, 744 
lysozyme 
aligning amino acid sequences, 360
aligning crystallographic molecular
models, 363
crystallographic molecular model,
170
crystallography, 155
diffusion coefficient, 578
domains, 388
epitopes, 561
fluorescence, 601, 603
fluorescence resonance energy
transfer, 609
folding, 678
free energy of folding, 677, 687
frictional coefficient, 578
frictional ratio, 578
hydration of a protein, 298-99
hydrogen bonds, 267
hydrogen bonds in crystallographic
molecular models, 311
hydropathy side chains, 275
immune complex, 558-61
ionic interactions, 300
kinetics of folding, 689-91, 694,
698, 704, 710
mean molar mass of an amino
acid, 418
molar mass, 418
molecular taxonomy, 393
packing of side chains, 289
preferential solvation, 661
proton exchange, 645
sedimentation coefficient, 578
sieving, 424
thermodynamics of folding, 682
unfolding, 661
virial coefficients, 419
water in crystallographic molecular
model, 299
water in crystallographic molecular
models, 296
lysyl endopeptidase
cleavage of polypeptide, 88
α-lytic endopeptidase
ionic interactions in
crystallographic molecular
models, 303
stereochemistry of side chains, 268
water in crystallographic molecular
models, 294


M 
α2-macroglobulin
electron microscopy, 588 
immunoelectron microscopy, 567 
macroscopic acid dissociation 
constant 
definition, 71
MADS-box protein MCM1
association of proteins with 
nucleic acid, 320 
magnesium 
metalloproteins, 329, 332 
magnetic flux density 
nuclear magnetic resonance, 613 
magnetogyric ratio 
nuclear magnetic resonance, 613-14 
major cold-shock protein 
nuclear magnetic resonance, 630 
major groove 
nucleic acid structure, 315 
malate dehydrogenase 
assembly of oligomers, 712 
axes of symmetry, 452-53 
evolution of proteins, 359 
isoforms, 359 
molecular rotational axes of 
symmetry, 465 
sieving, 424, 428 
space groups, 461 
malate dehydrogenase (oxaloacetate- 
decarboxylating) (NADP⁺)
purification, 29 
malate synthase 
purification, 30 
maltodextrin binding protein 
aligning crystallographic molecular 
models, 363 
maltoporin 
crystallization, 772 
maltose-binding protein 
molecular taxonomy, 396 
multiple isomorphous 
replacement, 160 
mammalian cells 
expression of DNA, 109 
(S)-mandelate dehydrogenase 
anchored membrane-bound 
proteins, 765 
mandelate racemase 
aligning amino acid sequences, 
361 
aligning crystallographic molecular 
models, 366 
manganese 
metalloproteins, 330 
manganese-stabilizing protein 
diffusion coefficient, 578 
frictional coefficient, 578 
frictional ratio, 578 
sedimentation coefficient, 578 
mannose-6-phosphate receptor 
domains, 385 
mannose-binding protein 
interfaces, 480
map of electron density
calculation of, 157 
galactose oxidase, 165 
MAP protein kinase ERK2 
molecular taxonomy, 396 
maps of electron density, 149-62 
marker enzymes 
cell fractionation, 744 
mass 
of hydrodynamic particle, 573 
mass spectrometer, 550 
mass spectrometry 
α-amylase, 93
electrospray mass spectrometer, 91 
fast-atom bombardment, 92 
fragment ions, 93 
ion-trap mass spectrometer, 91 
matrix-assisted-laser-desorption 
ionization, 92 
peptide map, 433 
posttranslational modification, 
119 
proton exchange, 642 
quadrupole mass spectrometer, 91 
sequencing of polypeptides, 91-93 
sequencing oligosaccharides, 136 
tandem mass spectrometer, 93 
time-of-flight mass spectrometer, 91 
thioredoxin, 92 
matrix-assisted-laser-desorption 
ionization 
mass spectrometry, 92 
maturation-promoting factor 
assay, 19 
maturity-onset diabetes, 508 
mean molar mass of an amino acid 
anion exchanger, 418 
bacteriorhodopsin, 418 
chymotrypsinogen, 418 
fibrinogen, 418 
hemoglobin, 418 
immunoglobulin G, 418 
L-lactate dehydrogenase, 418 
lysozyme, 418 
myosin, 418 
Na⁺/K⁺-exchanging ATPase, 418
parvalbumin, 418 
phosphorylase, 418 
protein coat from R17 virus, 418 
serum albumin, 418 
mean net molecular charge number 
definition, 32 
mean net proton charge number 
definition, 32 
medium-chain acyl-CoA 
dehydrogenase 
assay, 16
melting point
water, 190 
membrane alanyl aminopeptidase 
purification, 773 
membrane in MalG protein 
topography of membrane- 
spanning proteins, 800 
membrane-bound proteins, 763-824 
anchored, 764-65 
assay, 771 
band 3 anion transport protein, 767 
chromatography, 771 
enzymes, 766 
glycophorin, 767 
identification of genes for, 767 
image reconstruction, 790-93 
immunoadsorption, 773 
integral, 766 
myristoylation, 765 
overexpression, 776 
peripheral, 763-64 
photosynthetic reaction center, 765 
prostaglandin-endoperoxide 
synthase, 766 
proteolipid protein, 765 
purification, 768-73 
reconstitution, 771 
site-directed mutation, 773 
snorkeling, 780 
sterol carrier protein, 766 
vectorial insertion, 767 
membranes 
cytoplasmic surface, 743 
diffusion in, 812 
extracytoplasmic surface, 743 
frictional coefficient in, 812 
microsomes, 743 
microviscosity of, 814 
punching holes in, 803-4 
rotational diffusion coefficient in, 
812 
solvent is bilayer of lipids, 807 
translational diffusion coefficient 
in, 812 
viscosity of, 813 
membrane-spanning α helices
acetylcholine receptor, 772, 794 
anchored membrane-bound 
proteins, 774 
aquaporin, 772 
bacteriorhodopsin, 772 
computational assignment, 795-96 
covalent modification, 796 
cross-linking, 803 
cytochrome-c oxidase, 772, 784, 796 
cytochrome o ubiquinol oxidase,
772
cystines, 783 
electron spin resonance, 793-95 
halorhodopsin, 772 
hydrogen bonds, 775 
integral membrane-bound 
proteins, 776 
lactose permease, 795 
large-conductance mechano- 
sensitive channel, 772 
lipid A export ATP-binding protein, 
772 
nuclear magnetic resonance, 793 
periodicities of exposure, 794 
photosynthetic reaction center, 
772, 784, 795 
potassium channel KcsA, 772
prolines, 783 
protein MsbA, 772 
rhodopsin, 772 
succinate dehydrogenase, 772 
synthetic hydrophobic peptide, 774 
tryptophan, 774 
tyrosine, 774 
unspecific monooxygenase, 796 
membrane-spanning segment, 766 
anchored membrane-bound 
protein, 774 
messenger RNA, 98 
metal ion 
covalent bonds, 327 
hardness, 327 
ionic bonds, 327 
ionic radius, 327 
softness, 327 
metallic cations 
metalloproteins, 326 
L1 metallo-β-lactamase
metallopeptidases, 49 
metalloproteins, 326-32 
alanine-tRNA ligase, 331 
aldehyde:ferredoxin 
oxidoreductase, 330 
arginase, 326 
aspartate carbamoyltransferase, 
326, 332 
α-thrombin, 328
azurin, 331 
calcium, 328-29 
cobalt, 330 
copper, 331 
diphtheria toxin repressor, 332-33 
DNA-(apurinic or apyrimidinic 
site) lyase, 330 
endopeptidase K, 328-29 
glycosylase MutY, 330 
heme, 330 
inorganic diphosphatase, 329, 332
iron, 330 
iron-sulfur cluster, 330 
KcsA potassium channel, 328 
α-lactalbumin, 329
lectin IV, 330 
ligands, 327 
magnesium, 329, 332 
manganese, 330 
metallic cations, 326 
molybdenum, 330 
myoglobin, 326 
nickel, 330 
nitrate reductase, 330 
nitrile hydratase, 330 
pentacoordinate zinc, 331 
phosphoribosylaminoimidazole- 
carboxamide formyltrans- 
ferase, 328 
plastocyanin, 331 
potassium, 328 
sodium, 328 
sulfenic acid, 330 
sulfinic acid, 330 
thermitase, 329 
tungsten, 330 
urease, 330 
UTP-hexose-1-phosphate 
uridylyltransferase, 330-31 
vanadium, 330 
zinc, 331-32 
zinc finger, 326, 331 
zinc-binding protein TroA, 331-32 
methanol dehydrogenase 
β propeller, 264
methionine 
covalent modification, 530, 532, 536 
electronic structure, 80 
stereochemistry of side chains, 270 
sulfone, 81-82 
sulfoxide, 81-82 
methionine adenosyltransferase 
rotational axis of symmetry, 480 
methionyl aminopeptidase 
aligning crystallographic molecular 
models, 362-63 
domains, 383 
molecular rotational axes of 
pseudosymmetry, 476 
methyl acetimidate, 547 
reagent for covalent modification, 
532-34 
methylamine-glutamate 
N-methyltransferase 
assay, 14 
methyl-accepting chemotaxis protein 
coiled coil of α helices, 284
interfaces, 480
methylcrotonyl-CoA carboxylase 
purification, 25 
2-methyleneglutarate mutase 
assay, 16 
methyl group 
stereochemistry of side chains, 270 
methyl group of thymine 
nucleic acid, association of 
proteins with, 318 
methylation 
posttranslational modification, 115 
sequencing oligosaccharides, 135 
methylmalonyl-CoA 
carboxytransferase 
peptide map, 435 
subunits, 435-36 
methylmalonyl-CoA decarboxylase 
aligning amino acid sequences, 360 
methylmalonyl-CoA mutase 
aligning amino acid sequence, 360 
binding of ligands, 47 
micelle of dodecyl sulfate 
dodecyl sulfate gel electrophoresis, 
421 
micrococcal nuclease 
β structure, 261
epitopes, 562 
folding, 678-79 
hydrogen bonds in crystallographic 
molecular models, 312 
immune complex, 559-60 
kinetics of folding, 689, 698 
proton exchange, 645 
purification, 26 
β2-microglobulin
aligning amino acid sequences, 
353, 360 
microheterogeneity 
oligosaccharides of glycoproteins, 
129 
microscopic acid dissociation 
definition, 62 
microscopic acid dissociation 
constants, 62 
histidine, 79 
microsin 
posttranslational modification, 114 
microsomes 
membranes, 743 
microtubules, 721-29 
dynein, 730 
helical polymer, 506 
helical surface lattice, 722 
kinesin, 730 
polarity, 723 
seam, 723 
structure, 722 
microtubule-associated proteins
assembly of microtubules, 730 
microviscosity 
cholesterol effect on, 814 
of membranes, 814
mini thick filaments of myosin, 732 
minimal mutational distance 
aligning amino acid sequences, 356 
minor groove 
nucleic acid structure, 315 
nucleic acid, association of 
proteins with, 318 
mismatch repair protein MutS 
nucleic acid, association of 
proteins with, 321 
mismatched symmetry 
dihydrolipoyl dehydrogenase, 511 
dihydrolipoyllysine-residue 
succinyltransferase, 511 
heterooligomers, 511 
oxoglutarate dehydrogenase 
(succinyl-transferring), 511 
2-oxoglutarate dehydrogenase 
complex, 511 
mitochondria, 743 
cell fractionation, 744 
topography of membrane- 
spanning proteins, 798 
mitochondrial H+-transporting two-
sector ATPase
space groups, 461 
mixed disulfide 
cystines, formation of, 708 
mobile phase 
definition, 2 
model compound for an amino acid, 
74 
modular domains 
domains, 385 
heterologous associations, 513 
in heterooligomers, 513 
molar ellipticity 
circular dichroism, 598 
molar mass, 408-21 
apoferritin, 418 
aspartate carbamoyltransferase, 
418-419 
aspartate kinase I-homoserine 
dehydrogenase I, 418, 420 
catalase, 418 
chymotrypsinogen, 418
definition, 408 
2-dehydro-3-deoxyphospho- 
gluconate aldolase, 418 
dry weight, measurement of, 419 
electrospray mass spectrometry, 
416-17 
fructose-bisphosphate aldolase,
418-19 
glutamate dehydrogenase, 418 
glutamate-ammonia ligase, 420 
glyceraldehyde-3-phosphate 
dehydrogenase, 418 
β-lactoglobulin, 418
L-lactate dehydrogenase, 418 
light scattering, 414-16 
lysozyme, 418 
osmotic pressure, 408-11 
pepsin, 418 
phosphorylase, 418 
ribonuclease, 418-19 
sedimentation equilibrium, 411-14 
serum albumin, 412, 418 
molar volume, 197 
water, 192 
molecular asymmetric unit 
definition, 472 
molecular rotational axes of 
symmetry, 472 
molecular axes of symmetry 
definition, 461 
erythrocruorin, 481 
space groups, 461 
molecular charge, 32-36 
β-lactoglobulin, 33
bound ions, 33 
deoxyhemoglobin, 34 
exotoxin A, 34 
fructose-bisphosphate aldolase, 
36 
loosely bound ions, 34 
plasminogen activator inhibitor 1, 
34 
ribonuclease, 35 
tryptophanase, 34 
molecular dynamics 
refinement, 177 
molecular exclusion chromatography 
chromatography, 11 
distribution coefficient, 12 
hydrophilic media, 11 
included volume, 12 
purification, 24 
sieving, 423 
molecular mass 
definition, 408 
molecular model, 162-72 
crystallography, 163 
nuclear magnetic resonance, 631 
molecular orbital 
antibonding, 57 
bonding, 57 
energy level, 56 
node, 56
nonbonding, 57 
phase, 56 
molecular replacement 
crystallography, 182 
molecular rotational axes of 
pseudosymmetry 
arabinose binding protein, 476 
5-carboxymethyl-2-hydroxy-
muconate Δ-isomerase, 477
chymotrypsinogen, 476 
internal duplications, 476 
methionyl aminopeptidase, 476 
4-oxalocrotonate tautomerase, 477 
phaseolin, 477 
pyruvate oxidase, 477 
sulfite reductase, 476 
thiosulfate sulfurtransferase, 476-77 
molecular rotational axes of 
symmetry 
crystal packing, 465 
formate dehydrogenase, 465 
glyceraldehyde-3-phosphate 
dehydrogenase 
(phosphorylating), 464 
heat-labile enterotoxin, 465 
L-2-hydroxyisocaproate 
dehydrogenase, 465 
inorganic diphosphatase, 465 
lectin, 465 
malate dehydrogenase, 465 
molecular asymmetric unit, 472 
ribulose-bisphosphate
carboxylase, 465 
self-rotation function, 465 
superposed α carbons, 465
triose-phosphate isomerase, 465 
molecular surface 
definition, 277 
molecular taxonomy, 392-99 
adenosine kinase, 396 
adenosylmethionine-8-amino-7- 
oxononanate transaminase, 
396 
adenylate kinase, 395 
D-alanine-D-alanine ligase, 397
alcohol dehydrogenase, 395 
alkylhalidase, 396 
arabinose-binding protein, 395 
architecture, 396 
asparagine synthase, 393 
aspartate transaminase, 396 
aspartate-semialdehyde 
dehydrogenase, 396 
aspartate-tRNA ligase, 393 
benzoylformate decarboxylase, 396 
biotin carboxylase, 397 
carbonate dehydratase, 393, 395
carboxymethylenebutenolidase, 396
carboxypeptidase, 395
carboxypeptidase D, 396
cathepsin K, 393
cohesin domain, 397
coincident structure, 396
common fold, 393
cyclic-AMP dependent protein
kinase, 396
cyclin-dependent protein kinase 2,
396
cystathionine β-lyase, 396
dihydrofolate reductase, 395
erythronate-4-phosphate
dehydrogenase, 396
Factor D, 396
family of domains, 396
ferredoxin-NADP+ reductase, 393
flavodoxin, 395
glucan 1,4-α-glucosidase, 396
glutathione synthase, 397
glutathione-disulfide reductase,
394-95
glyceraldehyde 3-phosphate
dehydrogenase, 395-96
glycerate dehydrogenase, 396
granulocyte-colony-stimulating
factor, 399
growth hormone, 399
hemoglobin, 394, 398
hexokinase, 395
hierarchical classification, 393
L-2-hydroxyisocaproate
dehydrogenase, 396
immunoglobulin, 397
interleukin, 399
kinesin, 393
D-lactate dehydrogenase, 396
L-lactate dehydrogenase, 393-95
licheninase, 398
lysozyme, 393
maltose-binding protein, 396
MAP protein kinase ERK2, 396
myohemerythrin, 394, 398
myosin, 393
newer proteins, 392
ornithine decarboxylase, 396
papain, 393-94, 398
phosphofructokinase, 395
phosphoglycerate dehydrogenase,
396
phosphoglycerate kinase, 395
phosphoglycerate mutase, 395
phosphopyruvate hydratase, 397
phosphoribosylamine-glycine
ligase, 397
phosphorylase, 395
phosphoserine transaminase, 396 
primordial proteins, 392 
protein coat of satellite tobacco 
necrosis virus, 393 
protein coat of tomato bushy stunt 
virus, 397 
pyruvate decarboxylase, 396 
pyruvate kinase, 394-95, 397 
ribokinase, 396 
ribonucleoside-diphosphate 
reductase, 397 
speciation of proteins, 393 
species of domains, 393 
subtilisin, 395 
succinate-CoA ligase, 395 
superfamily, 396 
synapsin Ia, 397 
thiamine pyridinylase, 396 
thioredoxin, 395 
thioredoxin-disulfide reductase, 393 
thiosulfate sulfurtransferase, 395 
triosephosphate isomerase, 394 
trypsin, 396 
tumor necrosis factor, 393 
tyrosine phenol-lyase, 396 
molecularity, 222-30 
entropy of, 225 
molluside 
glycosphingolipid, 748 
molten globules, 683-84 
a-lactalbumin, 683-84 
apomyoglobin, 684 
assembly of oligomers, 712 
carbonate dehydratase, 683 
circular dichroism, 684 
configurational entropy, 683 
cytochrome c, 683-84 
definition, 683 
diffusion coefficient, 684 
fluorescence, 684 
interleukin 6, 684 
intermediate states, 683 
intrinsic viscosity, 684 
kinetics of folding, 692-94, 703 
neutron scattering, 684 
nuclear magnetic resonance 
spectrum, 684 
proton exchange, 684 
rotational relaxation time, 684 
stable intermediates, 683 
ultrasound, 684 
molybdenum 
metalloproteins, 330 
monoclinic lattice 
crystallography, 151 
monoclonal immunoglobulin, 558 
monodisperse solution
definition, 408
monolayer of lipids 
air and water interface, 761 
surface area at zero pressure, 761 
surface pressure, 761 
monophenol monooxygenase 
assay, 18 
posttranslational modification, 122 
mosaic eukaryotic protein 
domains, 385 
MotA protein 
topography of membrane- 
spanning proteins, 799 
moving boundary electrophoresis, 41 
MQF 
nuclear magnetic resonance, 621 
mucin MUC2 
length, 85 
mucins, 132 
mucoside 
glycosphingolipid, 748 
multicatalytic endopeptidase 
complex 
heterooligomers, 508 
immunoelectron microscopy, 567 
multienzyme complex 
domains, 379-82 
multifunctional endopeptidase 
quaternary structure, 475 
multimeric protein 
definition, 407 
multiple isomorphous replacement, 
155 
alcohol dehydrogenase, 161 
anomalous dispersion, 161 
apoferritin, 161 
chloramphenicol O-acetyltrans- 
ferase, 160 
crystallography, 158-61 
deoxyribonuclease, 161 
heavy atom, 158 
maltose binding protein, 160 
trimethylamine-N-oxide reductase 
(cytochrome c), 160 
vector equations, 159 
murine L cells 
expression of DNA, 110 
mutagenic oligonucleotide 
site-directed mutation, 110 
mutation probability 
definition, 349 
evolution of proteins, 349-50 
myeloma protein 
immunoglobulins, 557 
myoglobin 
aligning amino acid sequences, 
345, 360 
aligning crystallographic molecular
models, 369-72 
crystallographic molecular model, 
170 
dodecyl sulfate gel electrophoresis, 
422 
electron paramagnetic resonance, 
648-49 
electrophoresis, 42 
free energy of folding, 677 
frictional ratio, 426 
hydration of a protein, 298 
hydrogen bonds in 
crystallographic molecular 
models, 306, 309, 312 
kinetics of folding, 667 
metalloproteins, 326 
nuclear magnetic resonance, 635 
proton exchange, 642 
radius of gyration, 581 
sieving, 424 
thermodynamics of folding, 673 
X-ray scattering, 583 
myohemerythrin 
molecular taxonomy, 394, 398 
myo-inositol, 747 
myosin, 730-32 
coiled coil of α helices, 282, 730
covalent modification, 544 
cross-linking, 549 
electron microscopy, 586 
fluorescence resonance energy 
transfer, 609 
frictional coefficient, 589 
globular heads, 730 
light scattering, 590 
mean molar mass of an amino 
acid, 418 
molecular taxonomy, 393 
structure, 730 
[myosin-light-chain] kinase 
heterologous associations, 517 
proton exchange, 645 
myosin subfragment 1 
assay, 17 
myristoylation 
membrane-bound proteins, 765 


N 

Na+/K+-exchanging ATPase
aligning amino acid sequences, 364
circular dichroism, 600
covalent modification, 546
covalent modification from within
the bilayer, 797
cross-linking, 444-45
expression, 776
fluorescence resonance energy
transfer, 609
image reconstruction, 793
immunochemistry, 568
mean molar mass of an amino
acid, 418
peptide map, 433
purification, 768
topography of membrane-
spanning proteins, 799, 802
NADH dehydrogenase (ubiquinone)
immunostaining, 565-66
integral membrane-bound protein,
766
NADH peroxidase
domains, 388
N-O acyl migration
posttranslational modification, 114
native state 
definition, 659 
native structure 
crystallography, 171 
natural selection 
quaternary structure, 455 
nebulin 
assembly of actin, 730 
domains, 384 
negative staining 
electron microscopy, 585 
image reconstruction, 501 
neolactoside 
glycosphingolipid, 748 
net magnetization 
nuclear magnetic resonance, 613 
D-neuraminic acid 
oligosaccharides of glycoproteins, 
128 
neuraminidase 
epitope, 562 
neurofilaments 
intermediate filaments, 506 
neutral replacement 
evolution of proteins, 348 
neutron diffraction 
bilayer of phospholipid, 750, 754 
ionic interactions in 
crystallographic molecular 
models, 303 
proton exchange, 645 
neutron scattering, 583-84 
hydrophobic effect, 233 
molten globule, 684 
ribosome, 583-84 
neutron scattering density 
bilayer of phospholipid, 755 
newer proteins 
molecular taxonomy, 392
nickel 
metalloproteins, 330 
nicotinate-nucleotide 
diphosphorylase 
crystallization, 49 
nidogen 
electron microscopy, 586 
heterooligomers, 513 
nitrate reductase 
metalloproteins, 330
water in crystallographic molecular
models, 293
nitrene 
reagent for covalent modification, 
542 
singlet, 542 
triplet, 542 
nitric-oxide synthase 
convergent evolution, 373
electron nuclear double resonance,
649-50
electron paramagnetic resonance, 
649-50 
rotational axis of symmetry, 480 
nitrile hydratase 
metalloproteins, 330 
nitrite reductase 
crystallography, 181 
X-ray scattering, 583 
2-nitro-5-thiocyanatobenzoate 
cleavage of polypeptides, 87, 89 
nitrogenase 
electron paramagnetic resonance, 
646 
peptide map, 435
2-[(2-nitrophenyl)sulfenyl]-3-methyl-
3’-bromoindolenine
reagent for covalent modification, 
539 
2-(p-nitrophenyl)-3-(3-carboxy-
4-nitrophenyl)thio-1-propene
cross-linking reagent, 548 
p-nitrophenylethanedione 
reagent for covalent modification, 
539 
nitrotyrosine 
ultraviolet absorption spectra, 
601 
nitroxyl fatty acid 
bilayer of phospholipid, 755-56 
nitroxylphosphatidylcholine 
boundary layer of phospholipid, 
785 
N-methylacetamide 
hydrogen bond, 217 
node 
molecular orbital, 56
nonbonding 
molecular orbital, 57 
nonionic detergents 
for purification of membrane- 
bound proteins, 768-71 
nonstoichiometric ratio of subunits 
low-affinity immunoglobulin γ Fc
region receptor, 512
pyruvate dehydrogenase complex, 
512 
nonstoichiometric ratios of subunits 
heterooligomers, 511 
n-tetradecanoyl amide 
posttranslational modification, 117 
nuclear import factor karyopherin α
heterologous associations, 514 
nuclear localization signals 
heterologous associations, 514 
nuclear magnetic resonance, 613-40 
α helix, 631
α-amylase inhibitor HOE-467A, 631
acid dissociation constants, 635-38 
acid-base titration curve, 635 
acrosin inhibitor IIA, 626, 629, 631 
ADA regulatory protein, 632 
amplitude modulation, 619 
assignments, 621-28 
β structure, 631
calmodulin, 624 
carbonate dehydratase, 635 
carrier frequency, 615 
chemical shift, 614 
connected hydrogens, 630
connections among nuclei, 622 
continuous wave nuclear magnetic 
spectrometers, 614 
correlated spectrum, 619 
coupling constant, 615 
CRINEPT, 621 
cytochrome c, 617 
cytochrome c, 626 
cytochrome Ga, 638 
dihedral angle, 616 
dihydrofolate reductase, 621-22, 
625 
DQF, 621
dynamics, 623 
endo-1,4-β-xylanase, 636
epidermal growth factor, 636 
factor IX, 629 
Fourier transform nuclear 
magnetic resonance 
spectrometer, 614-15 
free induction decay, 615 
frequency labeling, 617 
frequency of maximum
absorption, 614
glutaredoxin 2, 630
histidine, 635
HMBC, 621
HMQC, 621
HOHAHA, 621
HSMQC, 621
HSQC, 621
hydrogen bond, 209
immunity protein Im9, 639
immunoglobulin G binding protein
G, 633
interleukin 1, 628
interleukin 4, 629, 639
interleukin 13, 627
Larmor frequency, 613
magnetic flux density, 613
magnetogyric ratio, 613, 614
major cold-shock protein, 630
membrane-spanning α helices, 793
molecular model, 631
molten globule, 684
MQF, 621
myoglobin, 635
net magnetization, 613
nuclear Overhauser effect, 616
nuclear Overhauser enhanced
spectrum, 625-31
nuclear spin, 613
off-diagonal cross-peaks, 619
order parameter, 623
pancreatic trypsin inhibitor, 620
phosphoglycerate mutase, 630
proton exchange, 642
PS, 621
pyruvate dehydrogenase complex,
638
random meander, 632
rate of relaxation, 614
refinement, 632
regulatory protein GAL4, 633
relaxation, 614
resonance, 613
ribonuclease, 635, 687
ribonuclease H, 636-37
ribosomal protein S17, 629
ring current, 617
saturation, 614
SBC, 621
sequencing oligosaccharides, 136
spin diffusion, 617
spin quantum number, 613-614
spin states, 613
spin-spin coupling, 615
subtilisin, 635
three-dimensional spectroscopy,
617-38
threonine, 636
time of mixing, 626 
TOCSY, 621 
transfer of saturation, 616 
transforming growth factor β1, 627
TROSY, 621 
tryptophan, 636 
two-dimensional spectroscopy, 
617-38 
water, 632 
nuclear magnetic resonance 
molecular model, 631 
nuclear Overhauser effect 
nuclear magnetic resonance, 616 
nuclear Overhauser enhanced 
spectrum 
nuclear magnetic resonance, 625-31 
nuclear spin 
nuclear magnetic resonance, 613 
P1 nuclease 
aligning crystallographic molecular 
models, 366 
nucleation 
assembly of actin, 730 
assembly of microtubules, 723 
nucleic acid structure 
base stacking, 323 
bulges, 323 
donors and acceptors of hydrogen 
bonds, 315 
double-helical hairpin of RNA, 322 
major groove, 315 
minor groove, 315 
pairs of bases, 314 
phosphoryl oxygens, 314 
tertiary structure, 323 
tetraloop, 322 
transfer RNA, 322-23 
nucleic acid, association of proteins 
with, 314-25 
arginine, 315 
BamHI site-specific 
deoxyribonuclease, 321 
catabolite gene activator protein, 
320 
conformational changes, 321 
deoxyribonuclease, 316 
DNA-(apurinic or apyrimidinic 
site) lyase, 320 
DNA polymerase β, 315
double helix of DNA, 314 
ETS-domain protein Elk-1, 316 
histones, 316, 320 
homeodomain protein MAT α2, 320
lysine, 315 
MADS-box protein MCM1, 320 
methyl group of thymine, 318 
minor groove, 318 
mismatch repair protein MutS, 321
packing, 319 
phosphodiesters, 315 
positively charged amino acids, 315 
protein gp 45, 321 
π systems of side chains, 322
purine repressor, 320 
regulatory protein Cro, 315-16 
replication protein A, 322 
replication terminator, 316 
arc repressor, 316 
λ repressor, 316
met repressor, 316 
trp repressor, 319, 321 
repressor protein CI, 321 
ribonucleoproteins, 323 
ribosome, 324 
ring of protein, 321 
shape of the surface of the DNA, 319 
single-stranded DNA, 321-22 
single-stranded DNA binding 
protein, 322 
site-specific deoxyribonuclease, 318 
TATA-box-binding protein, 320 
telomere end-binding protein, 322 
topoisomerase I, 316, 321 
transcription factor IIIA, 324 
transcription factor AP-1, 316-17 
transcription factor AREA, 319 
transcription factor Rob, 317 
U1 small nuclear 
ribonucleoprotein, 324 
water, 317 
zinc finger, 324-25 
zinc finger protein GLI1, 324
nucleolin 
heterologous associations, 516 
nucleoporin 
heterologous associations, 517 
nucleoside 5’-monophosphates 
sequencing DNA, 95 
nucleoside bases 
acids and bases, 65 
electronic structure, 65 
nucleoside-diphosphate kinase 
point group, 474 
nucleotide 
definition, 95 
nucleus, 743 
number concentration 
assembly of microtubules, 724 
of polymer, 724 
number of amino acids in a protein 
sieving, 424 
number of amino acids, estimation 
of 
electrophoresis, 427 


O

observed amplitudes
crystallography, 173 
observed phases 
crystallography, 173 
octahedral point group 432, 487-88 
octahedral symmetry 
heat shock protein 16.5, 487-88 
D-octopine dehydrogenase 
kinetics of folding, 709 
octyl β-D-glucoside
detergent, 770 
off-diagonal cross-peaks 
nuclear magnetic resonance, 619 
oleic acid, 745 
oligomer 
thermodynamics of folding, 670 
oligomeric integral membrane- 
bound proteins 
interfaces, 790 
oligomeric interfaces 
glycophorin, 790 
oligomeric protein 
definition, 407, 455 
oligomeric proteins, 466-85 
hydration, 577 
integral membrane-bound 
proteins, 786 
interfaces, 478 
point groups, 466 
oligosaccharides of glycoproteins, 
126-38 
α1-acid glycoprotein, 131
branching, 128 
colonic mucin, 128, 130 
complex N-linked 
oligosaccharides, 131 
crystallography, 180 
9-5-deamino-5(S)-hydroxy- 
neuraminic acid, 128 
definition, 127 
glycoforms, 130 
glycopeptides, 133 
glycosidic linkage, 128 
N-glycosidic linkage, 128 
O-glycosidic linkage, 128 
high-mannose oligosaccharides, 130 
immunoglobulin D, 127 
microheterogeneity, 129 
D-neuraminic acid, 128
O-linked oligosaccharides, 131 
phytohemagglutinin, 137 
proteoglycans, 132-33 
sequence of monosaccharides, 129, 
133 
sialic acids, 128 
thyroglobulin, 138
O-linked oligosaccharides 
oligosaccharides on glycoproteins, 
131 
omit maps of difference electron 
density 
hydrogen bonds in crystallographic 
molecular models, 309 
refinement, 178 
open reading frame 
sequence of DNA, 106 
open structure 
definition, 455 
operon 
domains, 381 
opsin 
expression, 776 
optical constant 
scattering of electromagnetic 
radiation, 579 
optical rotation 
circular dichroism, 598 
optical rotatory dispersion, 598 
orbitals, 55 
order parameter 
bilayer of phospholipid, 756-57 
nuclear magnetic resonance, 623 
organic radical 
electron paramagnetic resonance, 
645 
orientation factor 
fluorescence resonance energy 
transfer, 606 
fluorescence resonance energy
transfer, 607 
orientational freedom 
fluorescence resonance energy
transfer, 607 
oriented bilayers 
phospholipid, 750 
oriented helical polymeric proteins 
X-ray diffraction, 502 
oriented, sealed vesicles 
topography of membrane- 
spanning proteins, 798 
ornithine carbamoyltransferase 
tetrahedral symmetry, 487 
ornithine decarboxylase 
molecular taxonomy, 396 
purification, 29 
orthogonal β sheets
packing of ß structure, 285 
orthogonality, 61 
orthologues 
definition, 358 
evolution of proteins, 358 
orthorhombic lattice 
crystallography, 151
osmotic pressure 
bovine serum albumin, 419 
chemical potential, 408 
concentration of the protein, 409 
Donnan effect, 410 
Donnan potential, 411 
electrolyte, 411 
hemoglobin, 420 
ideal gas law, 409 
impermeant solute, 408 
L-lactate dehydrogenase, 419 
β-lactoglobulin, 419
molar mass, 408-11 
semipermeable membrane, 408 
serum albumin, 410 
virial coefficients, 409 
outer membrane, 743 
bacteria, 749 
outer membrane protein A 
crystallization, 772 
outer membrane protein F 
crystallization, 772, 775 
overexpression, 776 
outer membrane protein TolC 
crystallization, 772, 775 
ovalbumin 
aligning amino acid sequences, 360 
dodecyl sulfate gel electrophoresis, 
422 
electrophoresis, 38, 41-42 
frictional ratio, 426 
hydration of a protein, 298 
light scattering, 420 
sieving, 424, 427-28 
X-ray scattering, 583 
overexpression 
large-conductance mechano- 
sensitive channel, 776 
membrane-bound proteins, 776 
outer membrane protein F, 776 
overlap integral 
fluorescence resonance energy 
transfer, 606 
ovomucoid 
electrophoresis, 42 
ovomucoid inhibitor 
stereochemistry of side chains, 268 
ovotransferrin 
domains, 384 
sieving, 427 
4-oxalocrotonate tautomerase 
molecular rotational axes of 
pseudosymmetry, 477 
oxazolines 
posttranslational modification, 114 
oxidation levels 
cysteine, 80-82
oxidative cleavage 
covalent modification, 544 
4-(oxoacetyl) phenoxyacetic acid 
reagent for covalent modification, 
539 
3-oxoacid CoA-transferase 
folding, 679 
3-oxoacyl-[acyl-carrier-protein] 
reductase 
domains, 381 
3-oxoacyl-[acyl-carrier-protein] 
synthase 
domains, 381 
oxoglutarate dehydrogenase 
(succinyl-transferring) 
mismatched symmetry, 511 
2-oxoglutarate dehydrogenase 
complex 
mismatched symmetry, 511 
oxygen 
quenching fluorescence, 603 
1-oxyl-2,2,5,5-tetramethylpyrrolin-3- 
yl group 
electron paramagnetic resonance, 
645 


P 
p120 GTPase activator 
domains, 386 
packing 
in space groups, 457 
nucleic acid, association of 
proteins with, 319 
packing of α helices
carboxypeptidase A, 282
δ-endotoxin CryIIIA, 285
integral membrane-bound 
proteins, 781 
photosynthetic reaction center, 782 
ribonucleoside-diphosphate 
reductase, 285 
packing of β sheets
immunoglobulin, 285 
penicillopepsin, 287 
packing of β structure
β barrel, 285
β sheets, 285
orthogonal β sheets, 285
ribonucleoside-diphosphate 
reductase, 286 
packing of side chains, 277-90 
α helices, 279-85
carboxypeptidase A, 279 
cavities, 289 
coiled coil of α helices, 279-85
compressibility, 278 
concanavalin A, 281
elasticity, 289 
helical nets, 280 
histocompatibility antigen, 279 
immunoglobulin, 281 
interdigitation of side chains, 278 
L-lactate dehydrogenase, 288 
lysozyme, 289 
minimization of molecular 
volume, 278 
plastocyanin, 289 
superoxide dismutase, 281 
volume of a molecule of protein, 278 
pairs of bases 
nucleic acid structure, 314 
palindromic sequence 
local rotational axis of symmetry, 
467 
palmitic acid, 745 
palmitoleic acid, 745 
pancreatic trypsin inhibitor 
fluorescence resonance energy 
transfer, 608 
nuclear magnetic resonance, 620 
pantetheine-phosphate 
adenylyltransferase 
domains, 391 
(S)-pantolactone dehydrogenase 
assay, 18 
papain 
cleavage of polypeptide, 88 
covalent modification, 546, 551 
molecular taxonomy, 393-94, 398 
paper chromatography, 4 
parallel pathways 
kinetics of folding, 704 
paralogues 
definition, 358 
evolution of proteins, 358 
paramagnetic ion 
electron paramagnetic resonance, 
645 
partial molar volume 
calculation of, 197 
definition, 197 
partial specific volume 
sedimentation equilibrium, 412 
partition coefficient 
chromatography, 4 
concentration, units of, 198 
definition, 198 
partition coefficients between gas 
and water 
hydrophobic effect, 235-37 
parvalbumin 
aligning amino acid sequences, 360 
mean molar mass of an amino 
acid, 418 
PDZ domains
heterologous associations, 514 
penicillin amidase 
folding, 679 
penicillopepsin 
crystallographic molecular model, 
167-70 
hydrogen bonds in 
crystallographic molecular 
models, 309 
packing of β sheets, 287
stereochemistry of side chains, 268 
water in crystallographic molecular 
models, 293, 295 
pentacoordinate zinc 
metalloproteins, 331 
pepsin 
electrophoresis, 42 
hydration of a protein, 298 
molar mass, 418 
sieving, 424 
pepsinogen 
ionic interactions in 
crystallographic molecular 
models, 303 
peptidases 
produce heterogeneity, 48 
peptide bonds, 74 
hydropathy, 242 
planarity, 251 
secondary structure, 251 
peptide maps, 432-35 
actin, 432 
ankyrin, 434 
chromatography, 433 
collagen type XIV, 434 
definition, 432 
2-dehydro-3-deoxy-phospho- 
gluconate aldolase, 437-38 
electron transfer flavoprotein, 
435-36 
glucose-6-phosphate isomerase, 
435, 437-38 
glutamate-tRNA ligase, 435 
hemoglobin, 432-33 
mass spectrometry, 433 
methylmalonyl-CoA 
carboxytransferase, 435 
Na+/K+-exchanging ATPase, 433
nitrogenase, 435 
phosphoglycerate dehydrogenase, 
434 
tryptic digests, 432 
two or more polypeptides, 434 
tyrosines, 433 
peptide separation 
chromatography, 90 
cytochrome c peroxidase, 91
phosphoglycerate kinase, 90 
peptides 
circular dichroism, 601 
peptidylamidoglycolate lyase 
domains, 377 
peptidyl-Asp metalloendopeptidase 
cleavage of polypeptide, 88 
peptidylglycine monooxygenase 
domains, 377 
peptidylprolyl isomerases 
proline isomerization, 701 
percentage of identity 
aligning amino acid sequences, 351 
periodic acid 
sequencing oligosaccharides, 134, 
136 
peripheral membrane-bound 
proteins, 763-64 
actin, 764 
ankyrin, 821 
annexin, 764 
choline-phosphate cytidylyltrans- 
ferase, 764 
phospholipase C, 764 
protein kinase C, 764 
protein Z, 764 
prothrombin, 764 
spectrin, 764 
perlecan 
domains, 386 
peroxidase 
immunostaining, 566 
peroxiredoxin 
point group, 472, 475 
peroxisomes, 743 
cell fractionation, 744 
pH 
effect on assay, 13 
effect on covalent modification, 531 
effect on folding, 662 
effect on free energy of folding, 675 
effect on kinetics of folding, 703 
effect on proton exchange, 644 
phage display 
detection of heterologous 
associations, 515, 518 
phase 
molecular orbital, 56 
phase of the reflection 
crystallography, 154 
phase transitions 
biological membranes, 808 
phaseolin 
molecular rotational axes of 
pseudosymmetry, 477 
tetrahedral symmetry, 487
phases 
image reconstruction, 790 
phenylalanine 
circular dichroism, 598 
electronic structure, 76 
ultraviolet absorption spectra, 601 
phosphate 
dpr molecular orbitals, 83 
electronic structure, 83 
phosphatidic acid 
phospholipid, 747 
phosphatidylcholine 
asymmetry of, 810 
phospholipid, 745 
phosphatidylethanolamine 
active transport, 809 
asymmetry of, 810 
phospholipid, 745 
phosphatidylglycerol 
asymmetry of, 810 
phospholipid, 749 
phosphatidylinositol 
asymmetry of, 810 
phospholipid, 747 
phosphatidylserine 
active transport, 809 
asymmetry of, 810 
cytoskeleton, 821 
phospholipid, 747 
phosphocarrier protein HPr 
free energy of folding, 677 
hydrogen bonds in 
crystallographic molecular 
models, 312 
phosphodiester backbone 
association of proteins with 
nucleic acid, 315 
nucleic acid structure, 314 
phosphoenolpyruvate carboxykinase 
(GTP) 
covalent modification, 543 
crystallization, 49 
6-phosphofructo-2-kinase 
aligning crystallographic molecular 
models, 364 
6-phosphofructo-2-kinase/fructose- 
2,6-bisphosphate 2-phosphatase 
domains, 389 
phosphofructokinase 
assay, 19 
molecular taxonomy, 395 
phosphoglycerate dehydrogenase 
molecular taxonomy, 396 
peptide map, 434 
purification, 49 
phosphoglycerate kinase 
domains, 383 


fluorescence, 603 
kinetics of folding, 690 
molecular taxonomy, 395 
peptide separation, 90 
purification, 24 
recurring structure, 373 
phosphoglycerate mutase 
assembly of oligomers, 710-11, 713 
molecular taxonomy, 395 
nuclear magnetic resonance, 630 
purification, 24 
phosphoinositide phospholipase Cδ1 
domains, 386-87 
phospholipase 
aligning crystallographic molecular 
models, 372 
phospholipase C 
aligning crystallographic molecular 
models, 366 
peripheral membrane-bound 
proteins, 764 
phospholipase Cγ 
domains, 386 
phospholipid scramblase 
asymmetry of phospholipids, 809 
phospholipids 
asymmetric distribution of, 808-10 
archaebacterial isopranylether 
lipid, 748 
bilayer of lipids, 745 
diphosphatidylglycerol, 749 
ether linkage, 747 
flip flop, 809 
sn-glycerol 3-phosphate, 745 
head group, 747 
oriented bilayers, 750 
phosphatidic acid, 747 
phosphatidylcholine, 745 
phosphatidylethanolamine, 745 
phosphatidylglycerol, 749 
phosphatidylinositol, 747 
phosphatidylserine, 747 
plasmalogen, 747 
saturated fatty acids, 745 
sphingomyelin, 748 
translational diffusion coefficient, 814 
unsaturated fatty acids, 745 
phospholipid-translocating ATPase 
asymmetry of phospholipid, 809 
phosphomevalonate kinase 
assay, 17 
electrophoresis, 46 
phosphopyruvate hydratase 
aligning crystallographic molecular 
models, 366 
interface, 478 
molecular taxonomy, 397 


phosphorescence 
emission of light, 595 
1-(5-phosphoribosyl)-5-[(5-phospho- 
ribosylamino)methylideneamino] 
imidazole-4-carboxamide 
isomerase 
domains, 383 
phosphoribosylamine-glycine ligase 
aligning crystallographic molecular 
models, 362 
domains, 388, 390 
molecular taxonomy, 397 
phosphoribosylaminoimidazole- 
carboxamide formyltransferase 
metalloproteins, 328 
phosphoribosylanthranilate 
isomerase 
domains, 377, 384 
folding, 668, 679 
phosphoribosylformylglycinamidine 
cyclo-ligase 
domains, 390 
phosphoribosylformylglycinamidine 
synthase 
domains, 380 
phosphoribosylglycinamide 
formyltransferase 
domains, 390 
phosphoribulokinase 
point group, 472, 474 
phosphoryl oxygens 
nucleic acid structure, 314 
phosphorylase 
dodecyl sulfate gel electrophoresis, 
422 
mean molar mass of an amino 
acid, 418 
molar mass, 418 
molecular taxonomy, 395 
recurring structure, 373 
phosphorylase b 
space groups, 463 
phosphorylase kinase 
electron microscopy, 586-87 
phosphoserine, 82 
phosphoserine transaminase 
molecular taxonomy, 396 
3-phosphoshikimate 1-carboxy- 
vinyltransferase 
domains, 380, 382 
phosphothreonine, 82 
phosphotyrosine, 82 
phosvitin 
lipoproteins, 804 
photo-lyase 
fluorescence resonance energy 
transfer, 607 


photolytic reactions 
covalent modification, 541-44 
photosynthetic reaction center 
bound phospholipid, 784 
crystallization, 772 
crystallographic molecular model, 
780 
crystallography, 180-81 
electron nuclear double resonance, 
650 
hydrophobic sheath, 780 
membrane-bound proteins, 765 
membrane-spanning α helices, 
772, 784, 795 
packing of the α helices, 782 
rotational axis of pseudosymmetry, 
777, 787 
phthalate-dioxygenase reductase 
water in crystallographic molecular 
models, 293 
B-phycoerythrin 
crystallography, 181 
phylogenetic tree 
aligning amino acid sequences, 
354-55 
phytohemagglutinin 
oligosaccharides of glycoprotein, 
137 
π character of a lone pair, 63 
π helix 
arachidonate 15-lipoxygenase, 260 
secondary structure, 260 
π lone pair of electrons 
electronic structure, 59 
π molecular orbitals 
electronic structure, 56 
placental ribonuclease inhibitor 
domains, 384 
plane-polarized light 
circular dichroism, 597 
plaques 
cloning of DNA, 99 
plasma membrane 
cell fractionation, 744 
definition, 743 
eubacteria, 749 
fungi, 748 
lipid composition, 748 
plasmalogen 
phospholipid, 747 
plasmid 
cloning of DNA, 99 
plasminogen 
diffusion coefficient, 577-78 
domains, 388-89 
frictional coefficient, 577-78 
frictional ratio, 577 
purification, 29 
sedimentation coefficient, 577-78 
plasminogen activator inhibitor 1 
molecular charge, 34 
plastocyanin 
aligning amino acid sequences, 360 
metalloproteins, 331 
packing of side chains, 289 
pleckstrin domain, 386 
point group 
acetylcholine-binding protein, 469 
κ-bungarotoxin, 466-67 
chloramphenicol O-acetyltrans- 
ferase, 468-69 
cyclic, 466 
definition, 466 
2,2-dialkylglycine decarboxylase 
(pyruvate), 470-71 
dihydrodipicolinate synthase, 475 
L-fuculose-phosphate aldolase, 469 
glutathione-disulfide reductase, 467 
heat-labile enterotoxin, 469 
hexokinase, 467 
IMP dehydrogenase, 469 
L-lactate dehydrogenase 
(cytochrome), 469-70 
nucleoside-diphosphate kinase, 474 
oligomeric proteins, 466 
peroxiredoxin, 472, 475 
phosphoribulokinase, 472, 474 
regular polyhedra, 486 
replicative DNA helicase, 469 
ribulose-phosphate 3-epimerase, 
472-73, 475 
L-ribulose-phosphate 4-epimerase, 
469 
serum amyloid P component, 469 
small nuclear ribonucleoprotein, 
470 
sulfate adenylyltransferase, 474 
superoxide dismutase, 475 
transcriptional activator NTRC1, 
470 
transitional endoplasmic reticulum 
ATPase, 469 
point group 11, 470 
point group 2, 466 
point group 222, 470-72 
dimer of dimers, 471 
point group 23, 486-87 
point group 3, 468-69 
point group 322, 472-73 
point group 4, 469 
point group 422, 472 
point group 432, 487-88 
point group 5, 469 
point group 522, 472 
point group 532, 488 
point group 6, 469 
point group 7, 469 
point mutation 
evolution of proteins, 350 
polarized light 
light scattering, 415 
poliovirus 
epitopes, 561 
poly(ethylene glycol) precipitation 
purification, 23 
polyacrylamide gel 
electrophoresis, 41 
polyamide backbone 
posttranslational modification, 114 
polyclonal immunoglobulin, 557 
polyhedrosis virus 
expression of DNA, 109 
polymerase chain reaction 
cloning of DNA, 100 
polymeric protein 
definition, 407, 455 
polynucleotide 
sequencing of DNA, 96 
polynucleotide 5’-hydroxyl-kinase 
sequencing of DNA, 96 
polynucleotide 3’-phosphatase/ 
5’-kinase 
frictional ratio, 577 
polypeptide 
definition, 74 
polypeptide backbone 
hydropathy, 276 
polyproline helix 
benzoylformate decarboxylase, 259 
secondary structure, 259 
polysaccharides 
storage, 127 
structural, 126 
polyvalent antigen, 564 
porin 
crystallization, 772 
interfaces, 479 
space groups, 461, 463 
porin OmpF 
crystallographic molecular model, 
782 
water-filled channel, 778 
porphine, 61 
porphobilinogen synthase 
assembly of oligomers, 713 
porphyrin, 61 
positive selection 
evolution of proteins, 358 
positively charged amino acids 
nucleic acid, association of 
proteins with, 315 


posttranslational modification, 113-26 
N⁴-(β-N-acetylglucosaminyl)- 
L-asparaginase, 114 
S-adenosylmethionine 
decarboxylase, 114 
amino terminus, 117 
aspartate 1-decarboxylase, 114 
carboxy terminus, 117 
coenzyme-b sulfoethylthiotrans- 
ferase, 113 
concanavalin A, 116 
cross-link, 119 
cystine, 122 
DNA polymerase, 115 
endopeptidases, 113 
galactose oxidase, 122 
geranylgeranylation, 117 
glutamate-ammonia ligase, 125 
glycosylphosphatidylinositol (GPI) 
anchor, 118 
hedgehog protein, 115 
hemocyanin, 122 
high-resolution mass spectrum, 119 
histidine ammonia-lyase, 126 
histidine decarboxylase, 114 
intein, 115 
isoprenylation, 117 
mass spectrometry, 119 
methylation, 115 
microcin, 114 
monophenol monooxygenase, 122 
N→O acyl migration, 114 
n-tetradecanoyl amide, 117 
oxazolines, 114 
polyamide backbone, 114 
proinsulin, 114 
pyroglutamate, 117 
N-2-pyruvylation, 117 
RecA protein, 115 
red fluorescent protein, 126 
self cleavage, 114 
signal sequences, 114 
thiazolines, 114 
vacuolar adenosinetriphosphatase, 
115 
potassium 
metalloproteins, 328 
potassium channel KcsA 
crystallization, 772 
crystallographic molecular model, 
779 
membrane-spanning helices, 772 
potassium thiocyanate 
preferential solvation, 22 
potential energy 
of hydrogen bond, 213 
refinement, 175 


precipitation 
cell fractionation, 744 
precipitation of proteins 
purification, 22 
preferential solvation, 22 
guanidinium, 22, 661 
hydration of a protein, 297 
lactose, 31 
potassium thiocyanate, 22 
sulfate, 22 
urea, 661 
preparative electrophoresis 
cloning of DNA, 101 
prepromagainin 
domains, 384 
primer 
polymerase, 97 
primordial proteins 
molecular taxonomy, 392 
probe 
cloning of DNA, 100 
procollagen-proline dioxygenase 
purification, 29 
progressive alignment 
aligning amino acid sequences, 
356-57 
proinsulin 
posttranslational modification, 114 
proline 
α helix, 257 
electronic structure, 76 
Cγ-endo conformations, 270 
Cγ-exo conformations, 270 
membrane-spanning α helices, 783 
stereochemistry of side chains, 270 
proline isomerization 
cis peptide bond, 698 
collagen, 701 
cytochrome c, 702 
enthalpy of activation, 699 
immunoglobulin, 701 
kinetics of folding, 698-702 
nativelike conformations, 700 
peptidylprolyl isomerases, 701 
rate constants, 699, 701 
ribonuclease, 699-701 
ribonuclease T1, 700-01 
slowly folding isomers, 699 
T3 promoter 
expressing DNA, 108 
T7 promoter 
expressing DNA, 108 
tacII promoter 
expressing DNA, 109 
promoter-specific transcription 
factor Sp1 
purification, 28 


prostaglandin-endoperoxide 
synthase 
membrane-bound proteins, 766 
protection factor 
proton exchange, 644 
protein 
infrared spectrum, 595 
protein 4.1 
cytoskeleton, 821 
protein A 
immunoadsorbent, 566 
kinetics of folding, 702 
protein CDC20 
yeast two-hybrid assay, 519 
protein coat of a virus, 488-98 
protein coat of adenovirus 
T = 25 icosahedral symmetry, 498 
protein coat of bacteriophage φX174 
common ancestor, 495 
protein coat of bacteriophage MS2, 
496 
structural swapping, 480 
protein coat of bacteriophage P22 
T = 7 icosahedral symmetry, 497 
protein coat of beanpod mottle virus 
common ancestor, 495 
quasi-equivalence, 492 
protein coat of black beetle 
nodavirus 
common ancestor, 495 
quasi-equivalence, 492 
protein coat of Bluetongue virus 
T = 13 icosahedral symmetry, 498 
protein coat of canine parvovirus 
common ancestor, 495 
protein coat of cowpea mosaic virus 
common ancestor, 495 
quasi-equivalence, 492 
protein coat of foot-and-mouth 
disease virus 
common ancestor, 495 
quasi-equivalence, 494 
protein coat of Hepatitis B virus, 610 
circular dichroism, 612 
sedimentation coefficient, 611 
sedimentation equilibrium, 610 
protein coat of herpes simplex virus 
T = 16 icosahedral symmetry, 498 
protein coat of Mengo virus 
common ancestor, 495 
quasi-equivalence, 494 
protein coat of Nudaurelia 
ω Capensis virus 
T = 4 icosahedral symmetry, 496 
protein coat of poliovirus 
common ancestor, 495 
quasi-equivalence, 494 


protein coat of polyoma virus 
T = 7 icosahedral symmetry, 497 
protein coat of primate calicivirus 
quasi-equivalence, 492 
protein coat from R17 virus 
mean molar mass of an amino 
acid, 418 
protein coat of reovirus 
T = 13 icosahedral symmetry, 498 
protein coat of rhinovirus 
common ancestor, 495 
interfaces, 480 
quasi-equivalence, 494 
protein coat of satellite panicum 
mosaic virus 
common ancestor, 495 
icosahedral symmetry, 488-89 
interfaces, 489 
protein coat of satellite tobacco 
necrosis virus 
common ancestor, 495 
icosahedral symmetry, 494-95 
molecular taxonomy, 393 
protein coat of simian virus 40 
T = 7 icosahedral symmetry, 497 
protein coat of Sindbis virus 
T = 4 icosahedral symmetry, 496 
protein coat of southern bean mosaic 
virus 
common ancestor, 495 
icosahedral symmetry, 494 
quasi-equivalent interfaces, 
492-94 
protein coat of tomato bushy stunt 
virus 
common ancestor, 495 
icosahedral symmetry, 494 
molecular taxonomy, 397 
octahedral symmetry, 492 
quasi-equivalence, 492 
protein coat of turnip yellow mosaic 
virus 
quasi-equivalence, 492 
protein disulfide-isomerase, 710 
cystine 
cystines, formation of, 125, 708-09 
reduction potential, 709 
protein G 
heterologous interfaces, 513 
kinetics of folding, 697 
protein geranylgeranyltransferase 
purification, 29 
protein gp 45 
nucleic acid, association of 
proteins with, 321 
protein HPr 
space groups, 465 
Protein Information Resource 
aligning amino acid sequences, 354 
protein kinase C 
peripheral membrane-bound 
proteins, 764 
protein kinase N 
purification, 26 
protein L 
kinetics of folding, 696 
protein L9 
free energy of folding, 675 
kinetics of folding, 703 
protein MsbA 
crystallization, 772 
membrane-spanning helices, 772 
protein p11 
heterologous associations, 514 
protein phosphatase 2A 
evolution of proteins, 351 
protein S6 
kinetics of folding, 702 
protein Z 
peripheral membrane-bound 
proteins, 764 
protein-glutamine γ-glutamyltrans- 
ferase 
assembly of fibrin, 721 
posttranslational modification, 
122 
protein-tyrosine kinases 
hydrogen bonds in crystallographic 
molecular models, 312 
in rafts, 811 
protein-tyrosine kinase ZAP-70 
domains, 390 
protein-tyrosine phosphatase 
domains, 381 
heterologous associations, 517 
proteoglycans 
oligosaccharides on glycoproteins, 
132-33 
proteolipid protein 
membrane-bound proteins, 765 
prothrombin 
diffusion coefficient, 578 
frictional coefficient, 578 
frictional ratio, 578 
peripheral membrane-bound 
proteins, 764 
sedimentation coefficient, 578 
protocatechuate 3,4-dioxygenase 
assay, 16 
axes of symmetry, 454 
protofibril 
assembly of fibrin, 720 
protofilament 
assembly of microtubules, 721 


protomer 
definition, 451 
proton exchange, 640-45 
α-amylase inhibitor HOE-467A, 644 
amido protons, 640 
basic trypsin inhibitor, 642-44 
buried hydrogen bonds, 641 
calmodulin, 645 
cooperative unfoldings, 645 
dihydrodipicolinate reductase, 645 
effect of pH, 644 
endopeptidolytic analysis, 642 
EX1 limit, 643 
EX2 limit, 643 
free energy of folding, 675 
hydrogen bonds, 641 
immunoglobulin, 645 
kinetic mechanism, 643 
kinetics of folding, 691-92, 698 
local vibrational modes, 645 
lysozyme, 645 
mass spectrometry, 642 
micrococcal nuclease, 645 
molten globule, 684 
myoglobin, 642 
[myosin-light-chain] kinase, 645 
neutron diffraction, 645 
nuclear magnetic resonance, 642 
protection factor, 644 
rate constants, 641 
ribonuclease, 645 
specific acid catalysis, 641 
specific base catalysis, 641 
trypsin, 645 
tryptophan, 640 
proto-oncogene protein c-fos 
heterologous associations, 519 
proto-oncogene protein-tyrosine 
kinase ABL1 
heterologous associations, 517 
PS 
nuclear magnetic resonance, 621 
pseudosymmetric, trimeric 
protomers 
quasi-equivalence, 492 
pseudosymmetry 
heterooligomers, 508 
purification of membrane-bound 
proteins 
nonionic detergents for, 768-71 
purification of peptides 
immunoadsorbents, 563 
purification of proteins, 20-32 
acetone powder, 23 
acetyl-CoA carboxylase, 49 
N-acetylgalactosaminidemucin-β 
1,3-galactosyltransferase, 29 


N-acetylglucosamine kinase, 29 

[acyl-carrier-protein] 
S-malonyltransferase, 46 

adenylate cyclase, 29 

a,-adrenergic receptor, 29 

β-adrenergic receptor, 29, 816 

adsorption chromatography, 25 

affinity adsorption, 26 

affinity elution, 26 

α-ketoisocaproate oxygenase, 25 

ammonium sulfate precipitation, 
23 

aryl-acylamidase, 21 

ATP diphosphatase, 773 

cathepsin D, 29 

choline O-acetyltransferase, 29 

coproporphyrinogen oxidase, 26 

cytochrome b5, 773 

N-deacetylheparin N-sulfotrans- 
ferase, 29 

2-dehydro-3-deoxyphospho- 
heptonate aldolase, 29 

3-deoxy-7-phosphoheptulonate 
synthase, 25 

dihydrofolate reductase, 29 

dipeptidyl-peptidase IV, 773 

DNA polymerase, 30 

dolichyl-phosphate β-D-mannosyl- 
transferase, 773 

enrichment, 21 

formate-tetrahydrofolate ligase, 26 

5-formyltetrahydrofolate cyclo- 
ligase, 28-9 

fructose 1,6-bisphosphatase, 26 

glutamyl-tRNA reductase, 26, 31 

glyceraldehyde-3-phosphate 
dehydrogenase, 24, 48 

hexokinase, 29 

HLA histocompatibility antigen, 
773 

HLA-linked B-cell antigen, 773 

homogenization, 20 

inhibitors of peptidases, 49 

ion exchange chromatography, 23 

isocitrate dehydrogenase, 26 

isocitrate dehydrogenase (NAD+), 
29 

isoelectric precipitation, 23 

L-lactate dehydrogenase, 29 

malate dehydrogenase 
(oxaloacetate-decarboxylating) 
(NADP+), 29 

malate synthase, 30 

membrane alanyl aminopeptidase, 
773 

membrane-bound proteins, 
768-73 


methylcrotonyl-CoA carboxylase, 25 

micrococcal nuclease, 26 

molecular exclusion 
chromatography, 24 

Na+/K+-exchanging ATPase, 768 

ornithine decarboxylase, 29 

phosphoglycerate dehydrogenase, 
49 

phosphoglycerate kinase, 24 

phosphoglycerate mutase, 24 

plasminogen, 29 

poly(ethylene glycol) precipitation, 
23 


precipitation of proteins, 22 
procollagen-proline dioxygenase, 
29 
promoter-specific transcription 
factor Sp1, 28 
protein geranylgeranyltransferase, 
29 
protein kinase N, 26 
specific activity, 21 
streptomycin sulfate precipitation, 
23 
sucrose α-glucosidase/oligo-1,6- 
glucosidase, 773 
total activity, 21 
transketolase, 26 
trimethylamine oxide 
precipitation, 23 
UDP glucose 4-epimerase, 29 
unspecific monooxygenase, 773 
viral hemagglutinin, 773 
yield of activity, 21 
purine repressor 
nucleic acid, association of 
proteins with, 320 
purine-nucleoside phosphorylase 
space groups, 463 
pyridine 
electronic structure, 60 
pyroglutamate 
posttranslational modification, 117 
pyrrole 
electronic structure, 61 
pyruvate carboxylase 
assay, 17 
pyruvate decarboxylase 
molecular taxonomy, 396 
pyruvate dehydrogenase (acetyl- 
transferring) 
assembly of oligomers, 715 
pyruvate dehydrogenase complex 
assembly of oligomers, 715 
nonstoichiometric ratio of 
subunits, 512 
nuclear magnetic resonance, 638 


pyruvate kinase 
aligning crystallographic molecular 
models, 375 
domains, 382-83 
molecular taxonomy, 394-95, 397 
sieving, 427 
pyruvate oxidase 
domains, 385 
molecular rotational axes of 
pseudosymmetry, 477 
N-2-pyruvylation 
posttranslational modification, 117 


QAE cellulose 
chromatography, 9 
quadrupole mass spectrometer 
mass spectrometry, 91 
quantitative cross-linking, 443-45 
assembly of oligomers, 710-11, 713 
epidermal growth factor receptor, 
815 
integral membrane-bound 
proteins, 786 
quantum yield 
fluorescence, 602 
quasi-equivalence 
global rotational axis of symmetry, 
491 
icosahedral symmetry, 491-98 
interfaces, 492-94 
local 3-fold rotational axis of 
pseudosymmetry, 491 
protein coat of beanpod mottle 
virus, 492 
protein coat of black beetle 
nodavirus, 492 
protein coat of cowpea mosaic 
virus, 492 
protein coat of foot-and-mouth 
disease virus, 494 
protein coat of Mengo virus, 494 
protein coat of poliovirus, 494 
protein coat of primate calicivirus, 
492 
protein coat of rhinovirus, 494 
protein coat of tomato bushy stunt 
virus, 492 
protein coat of turnip yellow 
mosaic virus, 492 
pseudosymmetric, trimeric 
protomers, 492 
quasi-equivalent interfaces 
protein coat of southern bean 
mosaic virus, 492-94 
quaternary structure 
acetylcholine receptor, 407 


chaperonin 60, 475 

closed structure, 454 

complementary faces, 455 

definition, 451 

3-deoxy-phosphogluconate 
aldolase, 407 

dihydrolipoyllysine-residue 
(2-methylpropanoyl) 
transferase, 490 

dihydrolipoyllysine-residue 
acetyltransferase, 490 

dihydrolipoyllysine-residue 
succinyltransferase, 490 

DNA-directed RNA polymerase, 407 

evolution of, 455 

ferritin, 489-90 

glutamate-ammonia ligase, 475 

glycerate dehydrogenase, 483 

hemoglobin, 407 

histidine decarboxylase, 475 

HNRNP arginine methyltrans- 
ferase, 476 

IMP dehydrogenase, 475 

interface, 455 

L-lactate dehydrogenase, 407 

multifunctional endopeptidase, 475 

natural selection, 455 

oligomeric protein, 455 

open structure, 455 

polymeric protein, 455 

λ repressor, 476 

ribonucleoside-diphosphate 
reductase, 475 

serum amyloid P component, 475 

small heat shock proteins, 490 

transketolase, 482 

quenching 
of fluorescence, 602 
serum albumin, 603 


R 
radial angle 
helical surface lattice, 500 
radial molecular correlation function 
water, 193-94 
radiationless decay, 602 
radius of gyration 
cyclic AMP-dependent protein 
kinase, 581 
cytochrome c, 581 
myoglobin, 581 
scattering of electromagnetic 
radiation, 580 
rafts 
caveolin, 811 
glycosphingolipids in, 811 
in biological membranes, 811 
protein-tyrosine kinases in, 811 
saturated fatty acyl groups in, 811 
sphingomyelin in, 811 
Ramachandran plot 
deoxyribonuclease, 255 
glycines, 255 
regions of lowest energy, 254 
secondary structure, 254 
Raman effect 
absorption of light, 593 
Raman infrared spectrum, 595-96 
resonance Raman infrared 
spectrum, 596 
ribonuclease, 596 
secondary structure, 597 
random coil 
acid-base titration curve, 660 
amido proton exchange, 660 
circular dichroic spectrum, 660 
definition, 659 
intrinsic viscosity, 660 
ultraviolet spectrum, 660 
random meander 
circular dichroism, 599 
crystallographic molecular model, 
170 
nuclear magnetic resonance, 632 
secondary structure, 264 
randomly coiled polypeptides 
sieving, 428 
Raoult’s law, 197 
rapid mixing chamber 
kinetics of folding, 688 
rate constant for folding 
kinetics of folding, 667 
rate constants 
proline isomerization, 699, 701 
proton exchange, 641 
rate of relaxation 
nuclear magnetic resonance, 614 
rate sedimentation 
cell fractionation, 743 
Rayleigh’s ratio 
light scattering, 416, 579 
reading frames, 98 
RecA protein 
fluorescence resonance energy 
transfer, 609 
posttranslational modification, 
115 
recent speciation 
aligning amino acid sequences, 
357 
receptor for the Fc domain of 
immunoglobulin G 
glycosylphosphatidylinositol- 
linked proteins, 765 
receptor Tom 20 
anchored membrane-bound 
proteins, 764 
receptors 
assay, 15 
reconstitution 
assay, 773 
membrane-bound proteins, 771 
record of intolerance 
aligning amino acid sequences, 348 
recovery of fluorescence following 
photobleaching 
translational diffusion coefficient, 
813 
recurring domain 
domains, 382 
recurring structure 
aligning crystallographic molecular 
models, 373 
red fluorescent protein 
β barrel, 287 
posttranslational modification, 126 
refinement 
amino acid sequence, 178 
coenzymes, 180 
constraints, 174-75 
deoxyribonuclease, 176, 178 
global potential energy function, 
177 
image reconstruction, 793 
local minimum, 176 
molecular dynamics, 177 
nuclear magnetic resonance, 632 
omit maps of difference electron 
density, 178 
potential energy, 175 
simulated annealing, 177 
variant surface glycoprotein, 179 
water, 178 
refinement of crystallographic 
molecular models, 172-85 
reflecting faces 
crystallography, 150 
refractive index 
light scattering, 415 
scattering of electromagnetic 
radiation, 579 
regular polyhedra 
point groups, 486 
regulatory kinases 
domains, 386 
regulatory protein Cro 
association of proteins with 
nucleic acid, 315-16 
regulatory protein GAL4 
nuclear magnetic resonance, 633 
yeast two-hybrid assay, 518 


rehybridization 
acids and bases, 63 
relative mobility 
definition, 4 
electrophoresis, 429 
relative permittivity 
ionic interactions in 
crystallographic molecular 
models, 303 
water, 190 
relaxation 
nuclear magnetic resonance, 614 
renin 
expressing DNA, 109 
replication protein A 
association of proteins with 
nucleic acid, 322 
replication terminator 
association of proteins with 
nucleic acid, 316 
replicative DNA helicase 
point group, 469 
reporter group 
ultraviolet absorption spectra, 601 
lac repressor 
interfaces, 480 
λ repressor 
association of proteins with 
nucleic acid, 316 
covalent modification, 547 
kinetics of folding, 702-03 
quaternary structure, 476 
electrospray mass spectrometry, 417 
met repressor 
association of proteins with 
nucleic acid, 316 
rotational axes of symmetry, 468 
trp repressor 
association of proteins with 
nucleic acid, 319, 321 
repressor protein CI 
association of proteins with 
nucleic acid, 321 
resolution 
chromatography, 4 
crystallography, 158 
resonance 
nuclear magnetic resonance, 613 
resonance Raman infrared spectrum, 
596 
cytochrome d ubiquinol oxidase, 
596 
halocyanin, 596 
hemocyanin, 596 
resonance structures, 57 
restriction fragments 
sequencing of DNA, 96 


restriction mapping 
sequencing of DNA, 101 
restriction sites 
expressing DNA, 109 
sequencing of DNA, 96 
retardation coefficient 
definition, 41 
dodecyl sulfate gel electrophoresis, 
422 
electrophoresis, 426 
retinol-binding protein 
β barrel, 287 
domains, 384 
reverse-phase chromatography, 8 
reversibility 
folding, 663 
reversible dissociation 
interfaces, 471 
L-rhamnose 
structure, 129 
rhinovirus 
epitopes, 561 
rhodopsin 
boundary layer of phospholipid, 786 
covalent modification, 605 
crystallization, 772 
diffusion, 823 
fluorescence resonance energy 
transfer, 605 
membrane-spanning helices, 772 
rotational diffusion coefficient, 813 
size, 777 
topography of membrane- 
spanning proteins, 807 
translational diffusion coefficient, 
813 
Rho-GDP dissociation inhibitor 
fluorescence resonance energy 
transfer, 609 
rhombohedral lattice 
crystallography, 151 
ribokinase 
aligning crystallographic molecular 
models, 362 
molecular taxonomy, 396 
ribonuclease 
acid-base titration curve, 33 
aligning amino acid sequences, 361 
amino acid sequence, 85 
covalent modification, 550 
cross-linking, 548 
cystine, 124 
cystines, formation of, 708 
electrophoresis, 45 
equilibrium constant for folding, 687 
fluorescence, 601 
folding, 668, 678-679, 685 


free energy of folding, 676 
heterologous interfaces, 513 
hydration of a protein, 298-299 
hydrogen bonds in crystallographic 
molecular models, 310, 313 
infrared spectrum, 595 
kinetics of folding, 695 
molar mass, 418-419 
molecular charge, 35 
nuclear magnetic resonance, 635, 
687 
proline isomerization, 699-701 
proton exchange, 645 
Raman spectrum, 596 
sieving, 424 
thermodynamics of folding, 665, 
673, 682 
ribonuclease H 
folding, 678 
free energy of folding, 675-77 
kinetics of folding, 688-90, 693, 
697-98 
local conformational changes, 678 
ribonuclease inhibitor 
heterologous interfaces, 513 
ribonuclease T1 
crystallography, 184 
effect of pH on folding, 663 
fluorescence, 603 
free energy of folding, 675, 677 
hydropathy side chains, 275 
kinetics of folding, 694 
nuclear magnetic resonance, 
636-37 
proline isomerization, 700-01 
stereochemistry of side chains, 267 
water in crystallographic molecular 
models, 293 
ribonuclease U2 
water in crystallographic molecular 
models, 292 
ribonucleoproteins 
association of proteins with 
nucleic acid, 323 
ribonucleoside-diphosphate 
reductase 
electron nuclear double resonance, 
650 
electron paramagnetic resonance, 
645, 649 
molecular taxonomy, 397 
packing of α helices, 285 
packing of β structure, 286 
quaternary structure, 475 
ribose-phosphate diphosphokinase 
assay, 18 


ribosomal protein S17 
nuclear magnetic resonance, 629 
ribosome 
assembly of oligomers, 715-17 
cross-linking, 549 
crystallography, 324 
electron microscopy, 588 
immunoelectron microscopy, 
567-69 
neutron scattering, 583-84 
nucleic acid, association of 
proteins with, 324 
30S subunit, 324 
50S subunit, 324 
ribulose-bisphosphate carboxylase 
covalent modification, 536, 546 
heterologous interfaces, 510 
hydrogen bonds in 
crystallographic molecular 
models, 306-7 
molecular rotational axes of 
symmetry, 465 
sieving, 427 
ribulose-phosphate 3-epimerase 
interfaces, 480 
point group, 472-73, 475 
L-ribulose-phosphate 4-epimerase 
point group, 469 
ring current 
nuclear magnetic resonance, 617 
RNA recognition motif, 386 
RNA-directed DNA polymerase 
cloning of DNA, 97 
sedimentation equilibrium, 413 
root mean square deviation 
aligning crystallographic molecular 
models, 362 
rose bengal 
reagent for covalent modification, 
536 
rotamer 
definition, 272 
stereochemistry of side chains, 272 
rotary shadowing 
electron microscopy, 585 
rotational axes of pseudosymmetry 
acetylcholine receptor, 787-88 
aquaporin, 788 
axes of symmetry, 452 
cytidine deaminase, 484 
double-helical DNA, 467 
hemocyanin, 485 
integral membrane-bound 
proteins, 789 
photosynthetic reaction center, 
777, 787 
rotational axes of symmetry, 452 
bacteriorhodopsin, 787 
E2 DNA-binding domain, 468 
exo-α-sialidase, 787 
fibrin, 719 
gap junction connexon, 787 
glutathione synthase, 483 
integral membrane-bound 
proteins, 787 
methionine adenosyltransferase, 
480 
nitric-oxide synthase, 480 
arc repressor, 468 
met repressor, 468 
viral hemagglutinin, 787 
rotational conformation 
stereochemistry of side chains, 267 
rotational correlation time 
water, 195 
rotational diffusion coefficient 
bacteriorhodopsin, 812 
decay in the anisotropy, 812 
in membranes, 812 
rhodopsin, 813 
rotational relaxation time 
molten globule, 684 
rough endoplasmic reticulum, 743 
cell fractionation, 744 
running gel 
electrophoresis, 44 
rusticyanin 
electrospray mass spectrometry, 417 
ryanodine receptor 
integral membrane-bound protein, 
766 


S 
salting in, 22 
salting out, 22 
SAND domain, 386 
domains, 386 
saponins 
detergents, 769 
saturated fatty acids 
phospholipid, 745 
saturated fatty acyl groups 
in rafts, 811 
saturation 
chromatography, 2-3 
nuclear magnetic resonance, 614 
SBC 
nuclear magnetic resonance, 621 
scales of hydropathy, 244 
from free energy of transfer, 274 
scanning calorimetry 
folding, 664 
thermodynamics of folding, 664 
scattering at small angles 
hydration of a protein, 299 
scattering length 
hydrogen, 583 
neutron scattering, 583 
scattering of electromagnetic 
radiation, 579-85 
forward scattering, 579 
intramolecular interference, 579 
optical constant, 579 
radius of gyration, 580 
Rayleigh ratio, 579 
refractive index, 579 
scattering of X-radiation 
kinetics of folding, 691 
water, 193 
schistoside 
glycosphingolipid, 748 
screen libraries 
cloning of DNA, 99 
immunoglobulins, 567 
screw axis of symmetry, 452 
searching data banks of amino acid 
sequences 
aligning amino acid sequences, 
353-54 
secondary structure, 251-67 
α helix, 256-59 
amide I band, 596 
β-pleated sheet, 261 
β structure, 260-61 
β turn, 261-64 
cis peptide bonds, 251-52 
crystallography, 165-67 
deoxyribonuclease, 263 
dihedral angles φ and ψ, 252-54 
γ turn, 264 
infrared spectrum, 596 
peptide bond, 251 
π helix, 260 
polyproline helix, 259 
Ramachandran plot, 254 
Raman infrared spectrum, 597 
random meander, 264 
sedimentation coefficient 
cell fractionation, 743 
protein coat of Hepatitis B virus, 
611 
sedimentation velocity, 576 
sedimentation equilibrium 
and sedimentation velocity, 413 
centrifugal potential, 411 
chemical potential, 411 
concentration of protein, 412 
equilibrium between oligomers, 
414 
gradient of concentration, 411 


molar mass, 411-14 
partial specific volume, 412 
protein coat of Hepatitis B virus, 610 
RNA-directed DNA polymerase, 413 
serum albumin, 412 
virial coefficients, 412 
sedimentation velocity, 576-77 
and sedimentation equilibrium, 413 
aspartate carbamoyltransferase, 576 
buoyant force, 576 
buoyant mass, 576 
desmin, 577 
frictional coefficient, 576-78 
frictional ratio, 577-78 
sedimentation coefficient, 576 
terminal velocity, 576 
selection rules 
infrared spectroscopy, 595 
selective adsorption 
chromatography, 3 
selenocysteine lyase 
assay, 19 
self cleavage 
posttranslational modification, 114 
self-charging energy 
ion, 200 
self-diffusion coefficient 
water, 194 
self-diffusion of water 
hydration of a protein, 297 
self-rotation function 
molecular rotational axes of 
symmetry, 465 
seminal ribonuclease 
covalent modification, 545 
semipermeable membrane 
osmotic pressure, 408 
separately unfolding domains, 388-89 
sequence of DNA 
acetylcholine receptor, 106-7 
Factor VIII, 106 
frameshift, 106 
genomic sequences, 108 
initiation codon, 106 
open reading frame, 106 
termination codon, 106 
sequence of monosaccharides 
oligosaccharides on glycoproteins, 
133 
sequencing of DNA 
acetylcholine receptor, 101-2 
antisense strand, 98 
blunt ends, 96 
chemical method, 103, 105 
cloning of DNA, 99 
2’,3’-dideoxynucleotide, 104 
electrophoresis, 101-3 


3’-end, 95 
5’-end, 95 
end-labeled fluorescent fragments, 
104 
end-labeled fragments, 103 
enzymatic method, 103-5 
ladder, 102 
nucleoside 5’-monophosphates, 
95 
polynucleotide 5’-hydroxyl-kinase, 
96 
polynucleotides, 96 
restriction fragments, 96 
restriction mapping, 101 
restriction sites, 96 
site-specific deoxyribonucleases, 
95-96 
sticky ends, 96 
sequencing of polypeptides, 85-95 
carboxypeptidase A, 91 
carboxypeptidase B, 91 
denaturing proteins, 87 
Edman degradation, 86 
endopeptidases, 87-88 
exopeptidases, 91 
leucyl aminopeptidase, 91 
mass spectrometry, 91-93 
serine-type carboxypeptidase, 91 
sequencing oligosaccharides 
β-N-acetylglucosamidase, 134
anion exchange chromatography,
134
β-elimination, 133
endo-1,4-β-galactosidase, 135
endo-α-sialidase, 135
endoglycosidases, 133
exo-α-2,3-sialidase, 134
exo-α-sialidase, 134
exoglycosidases, 134
α-L-fucosidase, 134
β-galactosidase, 134
mass spectrometry sequencing 
oligosaccharides, 136 
methylation, 135 
nuclear magnetic resonance 
spectroscopy, 136 
periodic acid, 134, 136 
Smith degradation, 134, 136 
sodium borohydride, 135 
serine 
acid dissociation constant, 75 
electronic structure, 77 
stereochemistry of side chains, 
269 
serine peptidases, 49 
serine-type carboxypeptidase 
sequencing of polypeptides, 91 


serum albumin 
collisional quenching, 603 
diffusion coefficient, 578 
dodecyl sulfate gel electrophoresis, 
422 
domains, 384 
Donnan effect, 419 
electrophoresis, 42, 45 
frictional coefficient, 578 
frictional ratio, 578 
hydration of a protein, 298-99 
light scattering, 416 
mean molar mass of an amino 
acid, 418 
molar mass, 412, 418 
osmotic pressure, 410 
preferential solvation, 661 
sedimentation coefficient, 578 
sedimentation equilibrium, 412 
sieving, 424, 427-28 
unfolding, 661 
serum amyloid P component 
point group, 469 
quaternary structure, 475 
sex-lethal protein 
domains, 390 
SH2 domain, 386 
SH3 domain, 386 
Shaker S4 K⁺ channel
immunoadsorbent, 566 
shape 
of a protein, 573-92 
shape of a protein 
electron microscopy, 585-88 
SHC transforming protein 
heterologous associations, 517 
shear 
viscosity, 578 
shell of hydration 
water in crystallographic molecular 
models, 296 
shikimate kinase 
domains, 380 
sialic acids 
oligosaccharides of glycoproteins, 
128 
structure, 129 
side chains of the amino acids, 74 
shape, 164 
sieving, 423-31 
α-amylase, 427-28
β-amylase, 427
apoferritin, 424, 427 
apparent surface area of a protein, 
423 
aspartate kinase-homoserine 
dehydrogenase, 427 


catalase, 424 
chymotrypsinogen, 424
complexes between dodecyl sulfate
and polypeptides, 429
cytochrome c, 424, 428 
definition, 423 
electrophoresis, 426-30 
extended polymers, 428 
frictional ratio, 425 
fructose-bisphosphate aldolase,
424, 427-28
fumarate hydratase, 424
β-galactosidase, 424
glyceraldehyde-3-phosphate
dehydrogenase, 424
hemoglobin, 424, 428 
hexokinase, 427 
immunoglobulin G, 424, 428 
L-lactate dehydrogenase, 424, 427 
β-lactoglobulin, 428
lactoperoxidase, 424 
lysozyme, 424 
malate dehydrogenase, 424, 428 
molecular exclusion
chromatography, 423
myoglobin, 424
number of amino acids in a
protein, 424
ovalbumin, 424, 427-28 
ovotransferrin, 427 
pepsin, 424 
pyruvate kinase, 427 
randomly coiled polypeptides, 428 
ribonuclease, 424 
ribulose-bisphosphate
carboxylase, 427
serum albumin, 424, 427-28
single-stranded nucleic acids,
428-29
standard proteins, 426 
Stokes radius, 426 
transferrin, 424, 427-28 
urease, 424, 427 
xanthine oxidase, 427 

σ bonds, 59
sigma factor rpoD 
covalent modification, 548 
σ lone pair of electrons
electronic structure, 59 
σ-π stereochemical representation
electronic structure, 56 
σ structure, 59
signal sequences
anchored membrane-bound
proteins, 764
posttranslational modification,
114
Simha factor
viscosity, 579 
simulated annealing 
refinement, 177 
single-stranded DNA 
nucleic acid, association of 
proteins with, 321-22 
single-stranded DNA binding protein 
association of proteins with 
nucleic acid, 322 
single-stranded nucleic acids 
sieving, 428-29 
site-directed mutation, 110-11, 544 
cassettes, 111 
detection of heterologous 
associations, 519 
effect on free energy of folding, 675 
ionic interactions in crystallo- 
graphic molecular models, 303 
kinetics of folding, 703 
membrane-bound proteins, 773 
mutagenic oligonucleotide, 110 
topography of membrane- 
spanning proteins, 799 
tyrosyl-tRNA synthetase, 110 
unnatural amino acids, 111 
site-specific deoxyribonuclease 
association of proteins with 
nucleic acid, 318 
site-specific deoxyribonucleases 
sequencing DNA, 95-96 
skeletal representation 
crystallographic molecular model,
167 
sliding fluctuations 
bilayer of phospholipid, 752 
small heat shock proteins 
quaternary structure, 490 
small nuclear ribonucleoprotein 
point group, 470 
Smith degradation 
sequencing oligosaccharides, 134, 
136 
sn-glycerol 3-phosphate 
phospholipid, 745 
snorkeling 
membrane-bound proteins, 780 
sodium 
metalloproteins, 328 
sodium borohydride 
sequencing oligosaccharides, 135 
sodium/proton antiporter NhaA 
image reconstruction, 793 
softness 
metal ion, 327 
solution scattering 
X-ray scattering, 582 
solution scattering curves, complete
X-ray scattering, 582 
solution scattering curves, theoretical 
X-ray scattering, 582 
solvation 
definition, 189 
hydrogen bond, 208 
solvent flattening 
crystallography, 161 
somatotropin 
heterologous associations, 519 
somatotropin receptor 
heterologous associations, 519 
space group 
definition, 456 
space group C2, 458 
space group P2, 457 
space group P2₁2₁2₁, 460
space group P3₂21, 462
space groups, 456-65 
alcohol dehydrogenase, 463 
chloramphenicol O- 
acetyltransferase, 463 
crystallographic asymmetric unit, 
457 
crystallographic axis of symmetry, 
461 
cystathionine, 464 
deoxyribonuclease, 458 
designation of, 459 
dihydrolipoyllysine-residue 
acetyltransferase, 463 
dihydrolipoyllysine-residue 
succinyltransferase, 463 
exact rotational axis of symmetry, 
461 
ferredoxin, 464 
general control protein GCN4, 464 
glutathione peroxidase, 463 
glyceraldehyde-3-phosphate 
dehydrogenase 
(phosphorylating), 463 
(S)-2-hydroxy-acid oxidase, 463 
L-lactate dehydrogenase, 463 
lectin, 460 
malate dehydrogenase, 461 
mitochondrial H⁺-transporting
two-sector ATPase, 461 
molecular axis of symmetry, 461 
packing in, 457 
phosphorylase b, 463 
porin, 461, 463 
protein HPr, 465 
purine-nucleoside phosphorylase, 
463 
sets of axes of symmetry, 457 
telokin, 462 


triose-phosphate isomerase, 463 
unit cell, 459 
space-filling representation 
crystallographic molecular model,
170 
speciation of organisms 
aligning amino acid sequences, 
354 
speciation of proteins 
molecular taxonomy, 393 
species of domains 
molecular taxonomy, 393 
specific acid catalysis 
proton exchange, 641 
specific activity 
definition, 21 
specific base catalysis 
proton exchange, 641 
specific viscosity 
definition, 578 
specificity 
iodoacetamide, 532 
spectrin 
cytoskeleton, 820 
domains, 384-85, 390 
peripheral membrane-bound 
proteins, 764 
spermadhesin 
X-ray scattering, 583 
spheroplasts, 744 
sphingomyelin 
in rafts, 811 
phospholipid, 748 
sphingosine, 748 
spider dragline silk 
amino acid sequence, 106 
spin diffusion 
nuclear magnetic resonance, 617 
spin quantum number 
electron paramagnetic resonance, 
646 
nuclear magnetic resonance, 
613-14 
spin states 
nuclear magnetic resonance, 613 
spin-labeled phospholipids 
integral membrane bound 
proteins, 795 
spin-labeled probes 
biological membranes, 808 
spin-spin coupling 
electron paramagnetic resonance, 
647 
nuclear magnetic resonance, 615 
splicing of messenger RNA 
evolution of proteins, 350 
spongiform encephalopathy, 508 


SSEARCH, 354 
evaluation of, 368 
stability of a protein 
hydrogen bonds in crystallographic 
molecular models, 311 
stable intermediates 
molten globules, 683 
stable moving boundaries 
electrophoresis, 42 
stacking 
dodecyl sulfate gel electrophoresis, 
422 
electrophoresis, 43 
stacking gel 
electrophoresis, 43 
staggered conformation 
stereochemistry of side chains, 267 
stain for enzymatic activity 
electrophoresis, 46 
standard enthalpy of formation 
ion pair, 200 
standard free energy of solvation, 198 
standard free energy of transfer 
concentration, units of, 198 
definition, 198 
standard proteins 
sieving, 426 
standard states, 196-99 
entropy of mixing, 196 
START domain, 386 
start site 
evolution of proteins, 350 
stationary phase 
definition, 2 
statistical significance 
aligning amino acid sequences, 353 
stearic acid, 745 
stereochemistry of side chains, 267-72 
alternative conformations, 267 
α-lytic endopeptidase, 268
aromatic amino acids, 269 
asparagine, 270 
aspartate, 270 
carboxypeptidase C, 271 
cystine, 271 
dihedral angle χ₁, 267
dihedral angle χ₂, 269
glutamate, 270 
glutamine, 270 
glutathione reductase, 267 
hemoglobin, 267 
isoleucine, 268 
methionines, 270 
methyl group, 270 
ovomucoid inhibitor, 268 
penicillopepsin, 268 
proline, 270 


ribonuclease T₁, 267
rotamer, 272 
rotational conformation, 267 
serine, 269 
staggered conformation, 267 
streptogrisin A, 268 
streptogrisin B, 268 
threonine, 268 
valine, 267 
steric crowding 
assembly of oligomers, 715 
steric effects 
hydrogen bonds in crystallographic 
molecular models, 309 
steric exclusion 
definition, 510 
heterooligomers, 510 
immunoglobulin ε receptor, 510
transthyretin, 510-11 
steric repulsion 
bilayers of phospholipid, 751 
steroid Δ-isomerase
assembly of oligomers, 712 
fluorescence resonance energy 
transfer, 607 
sterol carrier protein 
membrane-bound proteins, 766 
sticky ends 
sequencing of DNA, 96 
stoichiometric ratio of subunits 
cross-linking, 445 
stoichiometry of the subunits 
definition, 407 
Stokes’ radius, 38 
definition, 37 
sieving, 426 
stop site 
evolution of proteins, 350 
stopped-flow apparatus 
dead time, 688 
kinetics of folding, 688 
streptococcal protein G 
hydrogen bonds in crystallographic 
molecular models, 312 
streptogrisin A 
stereochemistry of side chains, 268 
streptogrisin B 
stereochemistry of side chains, 268 
streptomycin sulfate precipitation 
purification, 23 
stretching frequency 
hydrogen bond, 209 
water, 195 
string of spherical beads 
frictional coefficient, 576 
strongest possible hydrogen bond, 212 
structural alignment, 366 


structural domain 
domains, 387 
structural swapping, 480-81 
structure factor 
crystallography, 155 
definition, 155 
subtilisin 
aligning amino acid sequence, 360 
circular dichroism, 600 
folding, 679 
ionic interactions, 300 
molecular taxonomy, 395 
nuclear magnetic resonance, 635 
subunit 
definition, 407 
succinate dehydrogenase 
crystallization, 772 
membrane-spanning helices, 772 
membrane-spanning segments, 777 
succinate-CoA ligase 
molecular taxonomy, 395 
succinate-CoA ligase (ADP-forming) 
cross-linking, 445 
succinate-propionate CoA- 
transferase 
aligning amino acid sequences, 
360 
succinic anhydride 
reagent for covalent modification, 
535 
succinyldiaminopimelate 
transaminase 
assay, 20 
sucrose a-glucosidase/oligo-1,6- 
glucosidase 
embedded anchor, 774 
purification, 773 
sucrose porin 
crystallization, 772 
sulfate 
electronic structure, 82 
preferential solvation, 22 
sulfate adenylyltransferase 
point group, 474 
sulfate-binding protein 
hydrogen bonds in 
crystallographic molecular 
models, 309 
sulfenic acid 
cysteine, 81-82 
metalloproteins, 330 
sulfenyl halides 
reagents for covalent modification, 
538 
sulfhydryl peptidases, 49 
sulfinic acid 
cysteine, 81-82 
metalloproteins, 330
sulfite oxidase 
domains, 377 
sulfite reductase 
molecular rotational axes of 
pseudosymmetry, 476 
N-(2-sulfoethyl)cyclohexylamine
buffer, 68 
N-(2-sulfoethyl)morpholine 
buffer, 68 
sulfonate 
cysteine, 81-82 
sulfone 
methionine, 81-82 
N-(3-sulfopropyl)-2-amino-1,3-di- 
hydroxy-2-hydroxymethyl- 
propane 
buffer, 68 
N-(3-sulfopropyl)morpholine
buffer, 68 
sulfoxide 
methionine, 81-82 
sulfur 
hydrogen bond, 207 
hydropathy, 241 
superfamily 
molecular taxonomy, 396 
superoxide dismutase 
aligning crystallographic molecular 
models, 363 
convergent evolution, 373 
hydrogen bonds in 
crystallographic molecular 
models, 306 
interfaces, 480 
packing of side chains, 281 
point group, 475 
X-ray scattering, 583 
superposed α carbons
molecular rotational axes of 
symmetry, 465 
superposition 
aligning crystallographic molecular 
models, 362 
definition, 362 
surface area 
hydrophobic effect, 238 
surface area at zero pressure 
monolayer of lipids, 761 
surface potential 
bilayer of phospholipid, 754 
surface pressure 
monolayer of lipids, 761 
surface tension 
water, 190 
Swiss-Prot Sequence Database 
aligning amino acid sequences, 354 
symmetric hydrogen bond, 212
symmetry operations
axes of symmetry, 451
synapsin Ia
molecular taxonomy, 397
synaptobrevin-II
heterologous interfaces, 513
synaptotagmin
heterologous associations, 516
synchrotron
crystallography, 156
syn lone pair, 69
syntaxin-1A
heterologous interfaces, 513
synthetic hydrophobic peptide
membrane-spanning α helices, 774
synthetic peptide
antigen, 562-64
synthetic peptides
coiled coil of α helices, 284
systemic amyloidosis, 508 


T 
T = 3 icosahedral symmetry, 496
T = 4 icosahedral symmetry, 496 
protein coat of Nudaurelia
ω Capensis virus, 496
protein coat of Sindbis virus, 496 
T = 7 icosahedral symmetry, 497
protein coat of bacteriophage P22, 
497 
protein coat of polyoma virus, 497 
protein coat of simian virus 40, 497 
T = 13 icosahedral symmetry, 498
protein coat of Bluetongue virus, 
498 
protein coat of reovirus, 498 
T = 16 icosahedral symmetry, 498
protein coat of herpes simplex 
virus, 498 
T = 25 icosahedral symmetry, 498 
protein coat of adenovirus, 498 
tandem mass spectrometer 
mass spectrometry, 93 
tartronate-semialdehyde synthase 
aligning amino acid sequence, 360 
TATA-box-binding protein 
nucleic acid, association of proteins 
with, 320 
tautomer 
definition, 69 
tautomeric equilibrium constants, 71 
tautomeric interactions 
ionic interactions in 
crystallographic molecular 
models, 300 
tautomers, 69-74 


β-xylanase, 73
conformational isomers, 69 
equilibrium constants, 71 
ethyl acetoacetate, 70 
heterocycle, 73 
histidine, 78 
keto-enol, 70 
macroscopic acid dissociation 
constant, 71 
thioredoxin, 72-73 
titration curve, 73 
uridine, 69 
taxonomic system of the proteins, 393 
T-cell receptor 
heterologous associations, 513 
telokin 
space groups, 462 
telomere end-binding protein 
association of proteins with 
nucleic acid, 322 
temperature 
effect on folding, 662 
temperature jump 
kinetics of folding, 695 
temperature of maximum stability 
thermodynamics of folding, 672 
template 
polymerase, 97 
terminal velocity 
electrophoresis, 37 
sedimentation velocity, 576 
termination codon 
sequence of DNA, 106 
tertiary structure 
nucleic acid structure, 323 
tetracycline repressor 
covalent modification, 544 
tetragonal lattice 
crystallography, 151 
tetrahedral point group 23, 486-87 
tetrahedral symmetry 
bromoperoxidase, 487 
3-dehydroquinate dehydratase, 487 
hexamers of dimers, 487 
ornithine carbamoyltransferase, 
487 
phaseolin, 487 
tetramers of trimers, 487 
3,4,5,6-tetrahydrophthalic anhydride 
reagent for covalent modification, 
536 
2,3,4,5-tetrahydropyridine-2,6- 
dicarboxylate N- 
succinyltransferase 
β helix, 263
tetraloop 
nucleic acid structure, 322 


tetramers of trimers 
tetrahedral symmetry, 487 
tetranitromethane, 547 
reagent for covalent modification, 
538 
theoretical plate 
chromatography, 4-5 
thermitase 
metalloproteins, 329 
thermodynamics of folding, 659-88 
calorimeter, 671 
chymotrypsinogen, 673 
compressibility of folding, 673 
condensation model, 683 
configurational entropy, 681-82 
cystine, 681 
enthalpy of folding, 671 
equilibrium constant for folding, 
664 
excluded volume, 681 
free energy of folding, 673 
heat capacity change of folding, 671 
immunoglobulin G, 682 
α-lactalbumin, 682
β-lactoglobulin, 671
lysozyme, 682 
myoglobin, 673 
oligomer, 670 
ribonuclease, 665, 673, 682 
scanning calorimetry, 664 
temperature of maximum stability, 
672 
two-state assumption, 665 
volume change of folding, 673 
thermolysin 
cleavage of polypeptide, 88 
domains, 389 
hydrogen bonds in crystallographic 
molecular models, 311 
thiamin pyridinylase 
aligning crystallographic molecular 
models, 363 
molecular taxonomy, 396 
thiazolines 
posttranslational modification, 114 
thick filament of myosin 
bare zone, 731-32 
helical surface lattice, 731-32 
length, 732 
structure, 731 
thickness of the double layer, 39 
thin filament 
actin, 506 
assembly of actin, 729 
thin-layer chromatography, 4 
thiocyanate 
mild denaturant, 713 


thiol:disulfide interchange protein 
cystines, formation of, 708 
reduction potential, 709 
thiol:disulfide interchange protein 
dsbA 
circular permutation, 680 
thioredoxin 
cystine, 125 
free energy of folding, 675, 677 
molecular taxonomy, 395 
sequencing by mass spectroscopy,
92 
tautomers, 72 
water in crystallographic molecular 
models, 293 
thioredoxin-disulfide reductase 
domains, 388 
molecular taxonomy, 393 
thiosulfate sulfurtransferase 
domains, 383 
frictional coefficient, 588 
molecular rotational axes of 
pseudosymmetry, 476-77 
molecular taxonomy, 395 
2-S-[¹⁴C]thiuroniumethanesulfonate
impermeant reagent for covalent 
modification, 799 
three-dimensional spectroscopy 
nuclear magnetic resonance, 
617-38 
threonine 
acid dissociation constant, 75 
electronic structure, 77 
nuclear magnetic resonance, 636 
stereochemistry of side chains, 268 
water in crystallographic molecular 
models, 296 
α-thrombin
heterologous associations, 513 
metalloproteins, 328 
thrombomodulin 
cystine, 125 
heterologous associations, 513 
thrombospondin I 
domains, 386 
thymidine, 95 
thymidylate synthase 
domains, 380 
thyroglobulin 
oligosaccharides of glycoproteins, 
138 
time of mixing 
nuclear magnetic resonance, 626 
time-of-flight mass spectrometer 
mass spectrometry, 91 
titin 
domains, 385
heterologous associations, 513
length, 85 
titration curve 
acetic acid, 67 
acids and bases, 66 
histidine, 79 
ionic interactions in crystallographic 
molecular models, 300 
tautomers, 73 
tobacco mosaic virus 
helical surface lattice, 499-500 
X-ray diffraction, 502 
TOCSY 
nuclear magnetic resonance, 621 
tonofilaments 
intermediate filaments, 506 
topography of membrane-spanning 
proteins, 798-803 
acetylcholine receptor, 802 
ADP, ATP carrier, 799 
alkaline phosphatase, 800 
band 3 anion transport protein, 
800, 801-2 
Ca²⁺-transporting ATPase, 802-3
chloramphenicol O-
acetyltransferase, 800
cytochrome-c oxidase, 802
definition, 798
endopeptidases, 800
glutamine γ-glutamyltransferase, 799
glycosylation, 800
H⁺/K⁺-exchanging ATPase, 802-3
H⁺-exchanging ATPase, 802-3
immunoglobulins G, 800
impermeant reagents, 798
intact cells, 798
K⁺-transporting ATPase, 802-3
β-lactamase, 800
lactoperoxidase, 799
lactose permease, 800
MalG protein, 800
mitochondria, 798
MotA protein, 799
Na⁺/K⁺-exchanging ATPase, 799, 802
oriented, sealed vesicles, 798 
rhodopsin, 807 
site-directed mutation, 799 
topoisomerase I
association of proteins with 
nucleic acid, 316, 321 
total activity, 21 
transcription factor IIIA 
nucleic acid, association of 
proteins with, 324 
transcription factor AP-1 
association of proteins with 
nucleic acid, 316-17 
fluorescence resonance energy
transfer, 609 
heterologous associations, 519 
transcription factor AREA 
nucleic acid, association of 
proteins with, 319 
transcription factor Rob 
nucleic acid, association of 
proteins with, 317 
transcription initiation factor TFIID 
heterologous associations, 517 
transcriptional activator NTRC1 
point group, 470 
transfer between water and the gas 
hydropathy, 241 
transfer of saturation 
nuclear magnetic resonance, 616 
transfer RNA 
nucleic acid structure, 322-23 
transferrin 
hydrogen bonds in crystallographic 
molecular models, 312 
sieving, 424, 427-28 
transforming growth factor β1
nuclear magnetic resonance, 
627 
transitional endoplasmic reticulum 
ATPase 
point group, 469 
transketolase 
purification, 26 
quaternary structure, 482 
translational diffusion coefficient, 
811 
acetylcholine receptor, 813 
bacteriorhodopsin, 813 
cytochrome-c reductase, 814 
endoplasmic reticulum Ca²⁺-
transporting ATPase, 813
in membranes, 812 
phospholipids, 814 
recovery of fluorescence following 
photobleaching, 813 
rhodopsin, 813 
ubiquinol-cytochrome-c 
reductase, 814 
transthyretin 
steric exclusion, 510-11 
trc promoter 
expressing DNA, 109 
triacylglycerol lipase 
assay, 16 
water in crystallographic molecular 
models, 294 
trifluoroacetic anhydride 
reagent for covalent modification, 
535 


3-(trifluoromethyl)-3-(m-
[¹²⁵I]iodophenyl)diazirine
hydrophobic reagent for covalent 
modification, 797 
trimethylamine oxide precipitation 
purification, 23 
trimethylamine-N-oxide reductase 
(cytochrome c) 
multiple isomorphous 
replacement, 160 
2,4,6-trinitrobenzenesulfonate, 546 
reagent for covalent modification, 
536 
triose-phosphate isomerase 
domains, 382 
molecular rotational axes of 
symmetry, 465 
molecular taxonomy, 394 
space groups, 463 
tripeptidyl-peptidase II 
aligning amino acid sequence, 360 
triple helix 
collagen, 504 
tris(2-carboxyethyl) phosphine 
cystine, 125 
1-tritiospiro[adamantane-4,3’- 
diazirine] 
hydrophobic reagent for covalent 
modification, 797 
Triton X-100 
detergent, 770 
Tritons 
detergents, 769 
tRNA-intron endonuclease 
helical polymer, 455 
tropomyosin 
viscosity, 589 
troponin 
fluorescence resonance energy 
transfer, 609 
troponin C 
aligning amino acid sequences, 360 
heterologous associations, 514 
hydrogen bonds in 
crystallographic molecular 
models, 312 
X-ray scattering, 583 
troponin I 
heterologous associations, 514 
TROSY 
nuclear magnetic resonance, 621 
trypsin 
cleavage of polypeptide, 88 
electrophoresis, 41, 45 
hydrogen bonds in 
crystallographic molecular 
models, 309 


molecular taxonomy, 396 
proton exchange, 645 
trypsinogen 
aligning amino acid sequences, 360 
tryptase 
aligning crystallographic molecular 
models, 362-63 
tryptic digests 
peptide map, 432 
tryptophan 
acid dissociation constant, 75 
covalent modification, 538-39 
electronic structure, 76 
fluorescence, 601 
free energy of transfer, 276 
hydrogen bonds in crystallographic 
molecular models, 308 
membrane-spanning α helices, 774
nuclear magnetic resonance, 636 
proton exchange, 640 
ultraviolet absorption spectra, 601 
tryptophan synthase 
assembly of oligomers, 714 
folding, 685 
tryptophanase 
molecular charge, 34 
tryptophans 
circular dichroism, 598 
tryptophan-tRNA ligase 
assay, 14 
tube of electron density 
crystallography, 162 
tubulin 
assembly of microtubules, 722 
evolution of proteins, 351 
image reconstruction, 502 
protomer of microtubule, 722 
structure, 722 
X-ray scattering, 583 
tumor necrosis factor 
molecular taxonomy, 393 
tumor necrosis factor receptor- 
associated factor 2 
heterologous associations, 514 
tungsten 
metalloproteins, 330 
two-dimensional crystalline array 
image reconstruction, 790 
two-dimensional gel electrophoresis, 
567-68 
two-dimensional solution 
biological membranes, 811 
two-dimensional spectroscopy 
nuclear magnetic resonance, 
617-38 
two-state assumption 
thermodynamics of folding, 665 


tyrosine 
acid dissociation constant, 75 
circular dichroism, 598 
covalent modification, 536-38 
electronic structure, 77 
free energy of transfer, 276 
membrane-spanning α helices,
774
peptide map, 433 
ultraviolet absorption spectra, 601 
water in crystallographic molecular 
models, 296 
tyrosine phenol-lyase 
molecular taxonomy, 396 
tyrosyl-tRNA synthetase 
site-directed mutation, 110 


U 
U1 small nuclear ribonucleoprotein
nucleic acid, association of proteins 
with, 324 
ubiquinol-cytochrome-c reductase 
bound phospholipid, 784 
crystallization, 772, 775 
crystallographic molecular model, 
781 
electrospray mass spectrometry, 417 
membrane-spanning helices, 772, 
777 
short subunits, 777 
size, 777 
translational diffusion coefficient, 
814 
ubiquitin 
evolution of proteins, 351 
expressing DNA, 109 
kinetics of folding, 702 
UDP glucose 4-epimerase 
purification, 29 
UDP-glucose 6-dehydrogenase 
domains, 384 
UDP-N-acetylglucosamine 
1-carboxyvinyltransferase 
covalent modification, 546 
ultrasound 
molten globule, 684 
ultraviolet absorption spectra 
absorption of light, 601 
cystine, 601 
nitrotyrosine, 601 
phenylalanine, 601 
random coil, 660 
reporter group, 601 
tryptophan, 601 
tyrosine, 601 
ultraviolet light 
absorption of, 594 


unfolded polypeptides 
dodecyl sulfate gel electrophoresis, 
421 
unfolded state 
definition, 659 
unfrozen water 
hydration of a protein, 297 
unit cell 
crystallography, 150 
definition, 150 
space groups, 459 
unnatural amino acids 
site-directed mutation, 111 
unrefined map of electron density 
crystallography, 161 
unsaturated fatty acids 
phospholipid, 745 
unspecific monooxygenase 
expression, 776 
membrane-spanning α helices, 796
purification, 773 
uracil 
electronic structure, 65 
urate oxidase 
interfaces, 480 
urea 
denaturant, 660 
free energies of transfer, 660 
preferential solvation, 661 
urease 
frictional ratio, 426 
metalloproteins, 330 
sieving, 424 
uridine 
tautomers, 69 
UTP-hexose-1-phosphate 
uridylyltransferase 
metalloproteins, 330-31 


V 
vacuolar H⁺-transporting two-sector
ATPase 
aligning amino acid sequence, 360 
posttranslational modification, 115 
valence electrons 
electronic structure, 56 
valine 
electronic structure, 76 
stereochemistry of side chains, 267 
van der Waals forces 
hydrophobic effect, 235-37 
van der Waals radius 
definition, 277 
values for, 277 
vanadate 
reagent for covalent modification, 
544 


vanadium 
metalloproteins, 330 
vapor 
water, 190 
variant surface glycoprotein 
glycosylphosphatidylinositol- 
linked proteins, 765 
interfaces, 480 
refinement, 179 
vector equations 
multiple isomorphous replacement, 
159 
vectorial insertion 
band 3 anion transport protein, 
767 
membrane-bound proteins, 767 
of glycolipids into membranes, 767 
of glycoproteins into membranes, 
767 
very low density lipoprotein 
lipoproteins, 804 
vesicles of phospholipid 
large, unilamellar, 749 
multibilayer, 749-50 
small, unilamellar, 749 
vibrational energy levels 
absorption of light, 592-95 
villin 
assembly of actin, 730 
vimentin 
aligning amino acid sequences, 360 
vimentin filaments 
distribution in cell, 507 
intermediate filaments, 506 
vinculin 
assembly of actin, 730 
frictional ratio, 577 
2-vinylpyridine 
reagent for covalent modification, 
537 
viral hemagglutinin 
embedded anchor, 774 
purification, 773 
rotational axes of symmetry, 787 
viral protein coats 
electron microscopy, 588 
virial coefficients 
light scattering, 416 
lysozyme, 419 
osmotic pressure, 409 
sedimentation equilibrium, 412 
viruses 
immune complexes, 561 
viscosity, 578-79 
effect on kinetics of folding, 697, 703 
laminar flow, 578 
of membrane, 813 
shear, 578
Simha factor, 579
tropomyosin, 589
water, 193, 194
visible light
absorption of, 594
vitellogenin
amino acid sequence, 106
lipoproteins, 804
vitronectin
heterologous associations, 516
void volume
definition, 4
voltage-gated chloride channel
immunoadsorbent, 566
volume
of hydrodynamic particle, 573
volume change of folding
thermodynamics of folding, 673
volume fraction
units of concentration, 196
volume of a molecule of protein
packing of side chains, 278


W 
water, 190-96 
α helix, 256
boiling point, 190
coating molecule of protein, 573
compressibility, isothermal, 192
configurational heat capacity, 192
cubic expansion coefficient, 193
dielectric relaxation, 195
dimers of water, 190
effect on hydrogen bond, 216-20
enthalpy of fusion, 190
enthalpy of vaporization, 190
heat capacity, 191
hydrogen bond, 190
hydrogen-bonded nearest
neighbors, 193
ice Ih, 190
infrared absorption, 195
interstitial molecules of water, 194
liquid water, 190
melting point, 190
molar volume, 192
nuclear magnetic resonance, 632
nucleic acid, association of
proteins with, 317
radial molecular correlation
function, 193-94
refinement, 178
relative permittivity, 190
rotational correlation time, 195
scattering of X-radiation, 193
self-diffusion coefficient, 194


stretching frequency, 195 
surface tension, 190 
vapor, 190 
viscosity, 193-94 
water constant 
acids and bases, 66 
water in crystallographic molecular 
models, 190-96 
alcohol dehydrogenase, 294 
arginines, 296 
asparagine, 296 
aspartic acid, 296 
Bence-Jones protein, 294 
buried in the interior, 293 
chitinase B, 294 
chloramphenicol O- 
acetyltransferase, 294 
cholesterol oxidase, 295 
conservation, 293 
cytochrome b;g, 293 
cytochrome f, 293, 294 
deoxyribonuclease, 292 
dihydrofolate reductase, 299 
disordered side chains, 296 
disordered water, 290 
fatty-acid-binding protein, 294 
ferredoxin-NADP⁺ reductase, 293
glutamic acid, 296 
glutamine, 296 
hydrogen bonds, 295 
hydrogen bonds in 
crystallographic molecular 
models, 308 
immune complex, 560 
interleukin 18, 294 
in the interior of integral 
membrane-bound proteins, 
781 
β-lactamase, 293
location for a molecule of water, 
292 
lysine, 296 
lysozyme, 296, 299 
α-lytic endopeptidase, 294
networks of water, 295 
nitrate reductase, 293 
peaks of positive electron density, 
291 


penicillopepsin, 293, 295 
phthalate-dioxygenase reductase, 
293 
ribonuclease U₂, 292
ribonuclease T₁, 293
shell of hydration, 296 
thioredoxin, 293 
threonine, 296 
triacylglycerol lipase, 294 
tyrosine, 296 
weighting schemes 
aligning amino acid sequences, 351 
wells of potential energy 
hydrogen bond, 208 
WU-BLAST2 
evaluation of, 368 
WW domains 
folding, 683 
kinetics of folding, 703 


X 
xanthine oxidase 
metalloproteins, 330 
sieving, 427 
X-ray diffraction 
bilayer of phospholipid, 750-51 
oriented helical polymeric 
proteins, 502 
tobacco mosaic virus, 502 
X-ray scattering, 581-83 
aspartate carbamoyltransferase, 
584 
catalase, 583 
chymotrypsinogen, 583 
cyclic AMP-dependent protein 
kinase, 582, 584 
distance distribution function, 581 
fibronectin, 583 
Guinier plot, 581 
hydration, 583 
immunoglobulin M, 584 
myoglobin, 583 
nitrite reductase, 583 
ovalbumin, 583 
small-angle, 581-83 
solution scattering, 582 
solution scattering curve, 
theoretical, 582 


solution scattering curves, 
complete, 582 
spermadhesin, 583 
superoxide dismutase, 583 
troponin C, 583 
tubulin, 583 
β-xylanase
tautomers, 73 
D-xylose 
structure, 129 
xylose isomerase 
ionic interactions in 
crystallographic molecular 
models, 302 
m-xylylene diisocyanate 
cross-linking, 440 


Y 
yeast 
expression of DNA, 109 
yeast two-hybrid assay 
detection of heterologous 
associations, 518-19 
large screenings, 519 
yield of activity, 21 


Z 
zero net proton charge, point of, 
33 
zero-point energy 
hydrogen bond, 208 
Zimm plot 
light scattering, 416 
zinc 
metalloproteins, 331-32 
zinc finger 
metalloproteins, 326, 331 
nucleic acid, association of 
proteins with, 324-25 
zinc finger protein GLI1
nucleic acid, association of 
proteins with, 324 
zinc-binding protein TroA 
metalloproteins, 331-32 
zonal chromatography 
definition, 4 


